Custom Input Format in MapReduce

Custom Input Format: Before implementing a custom InputFormat, let's first answer what an InputFormat is. InputFormat describes the input-specification for a Map-Reduce job. (wiki) The Map-Reduce framework relies on the InputFormat of the job to: (1) validate the input-specification of the job, and (2) split up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper. …
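To make this concrete, here is a minimal sketch of a custom InputFormat, assuming the newer org.apache.hadoop.mapreduce API; the class name NonSplittableTextInputFormat is illustrative and not taken from the post. It reuses the stock LineRecordReader but marks files as non-splittable, so each Mapper receives one whole file.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Illustrative custom InputFormat (name is an assumption, not from the post).
public class NonSplittableTextInputFormat
        extends FileInputFormat<LongWritable, Text> {

    // Input validation and split generation are inherited from FileInputFormat;
    // overriding isSplitable() keeps each file as a single InputSplit.
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    // The RecordReader turns an InputSplit into (key, value) pairs for the
    // Mapper; here the stock LineRecordReader (byte offset, line text) is reused.
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new LineRecordReader();
    }
}
```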

Hadoop Installation: 2.6.0 Part II

This post is a continuation of Part I. Please check Part I here. We have already downloaded Hadoop and configured SSH. Now we are going to work through the Hadoop configuration files. 3. /usr/local/hadoop/etc/hadoop/core-site.xml: The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up. This file can be used to override the default …
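For illustration, below is a minimal core-site.xml sketch for a single-node Hadoop 2.x setup; the hdfs://localhost:9000 URI is a commonly used value assumed here, not one taken from the post.

```xml
<!-- Minimal core-site.xml sketch; values are assumptions, not from the post. -->
<configuration>
  <property>
    <!-- fs.defaultFS names the default filesystem used by Hadoop clients. -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```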

Moving Big Data from Mainframe to Hadoop

A blog post from Cloudera. Apache Sqoop provides a framework to move data between HDFS and relational databases in a parallel fashion using Hadoop's MR framework. As Hadoop becomes more popular in enterprises, there is a growing need to move data from non-relational sources like mainframe datasets to Hadoop. The following are possible reasons for this: HDFS …
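As a rough illustration of the parallel transfer described above, here is a hedged sketch of a Sqoop import; the JDBC URL, table name, and directory are placeholders, not values from the Cloudera post.

```sh
# Illustrative Sqoop import; connection string, table, and paths are placeholders.
sqoop import \
  --connect jdbc:db2://db-gateway.example.com:446/SAMPLE \
  --username dbuser -P \
  --table ACCOUNTS \
  --target-dir /data/accounts \
  --num-mappers 4   # run 4 parallel map tasks for the transfer
```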

Hadoop vs Spark

Listen in on any conversation about big data, and you’ll probably hear mention of Hadoop or Apache Spark. Here’s a brief look at what they do and how they compare. 1: They do different things. Hadoop and Apache Spark are both big-data frameworks, but they don’t really serve the same purposes. Hadoop is essentially a …

How does Hadoop process records split across block boundaries?

The logical records that FileInputFormats define do not usually fit neatly into HDFS blocks. For example, a TextInputFormat’s logical records are lines, which will cross HDFS block boundaries more often than not. This has no bearing on the functioning of your program—lines are not missed or broken, for example—but it’s worth knowing about, as it does …
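The following is a simplified sketch (not the actual LineRecordReader source; class and method names are illustrative) of how a line-oriented reader can honor split boundaries: every split except the first backs up one byte and discards everything up to the next newline, and every reader keeps reading until the line that starts before its split's end is complete, so no line is lost or read twice.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class SplitBoundaryDemo {

    // Read only the lines that "belong" to the given split.
    public static void readLines(FileSplit split, Configuration conf)
            throws IOException {
        Path file = split.getPath();
        long start = split.getStart();
        long end = start + split.getLength();

        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = fs.open(file);

        long pos = start;
        boolean skipFirstLine = false;
        if (start != 0) {
            // Not the first split: the partial line that spills into this
            // split belongs to the previous split's reader. Back up one byte
            // so that, if the split happens to begin exactly at a line start,
            // only the preceding newline is consumed and no real line is lost.
            skipFirstLine = true;
            pos = start - 1;
        }
        in.seek(pos);

        LineReader reader = new LineReader(in, conf);
        Text line = new Text();
        if (skipFirstLine) {
            pos += reader.readLine(line);   // discard up to the next newline
        }

        // Keep reading while the next line *starts* before `end`; the final
        // read may run past the split boundary into the next HDFS block.
        while (pos < end) {
            int bytesRead = reader.readLine(line);
            if (bytesRead == 0) {
                break;                      // end of file
            }
            // `line` is one complete logical record for the Mapper.
            pos += bytesRead;
        }
        in.close();
    }
}
```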