Hadoop Installation: 2.6.0 Part II

This post is a continuation of Part I. Please check out Part I here.

We have downloaded Hadoop and configured SSH. Now we are going to work through the Hadoop configuration files.

3. /usr/local/hadoop/etc/hadoop/core-site.xml:

The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up.
This file can be used to override the default settings that Hadoop starts with.

hduser@laptop:~$ sudo mkdir -p /app/hadoop/tmp

hduser@laptop:~$ sudo chown hduser:hadoop /app/hadoop/tmp

Open the file and enter the following between the <configuration> and </configuration> tags:

hduser@laptop:~$ nano /usr/local/hadoop/etc/hadoop/core-site.xml

 

<configuration>

 <property>

  <name>hadoop.tmp.dir</name>

  <value>/app/hadoop/tmp</value>

  <description>A base for other temporary directories.</description>

 </property>

 

 <property>

  <name>fs.default.name</name>

  <value>hdfs://localhost:54310</value>

  <description>The name of the default file system.  A URI whose

  scheme and authority determine the FileSystem implementation.  The

  uri’s scheme determines the config property (fs.SCHEME.impl) naming

  the FileSystem implementation class.  The uri’s authority is used to

  determine the host, port, etc. for a filesystem.</description>

 </property>

</configuration>
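Note: in Hadoop 2.x the property fs.default.name is deprecated in favour of fs.defaultFS; the old name used above still works, but you will see a deprecation warning in the logs. Once the configuration is in place, one quick sanity check (assuming /usr/local/hadoop/bin is on hduser's PATH, as the later commands in this post also assume) is to ask Hadoop which value it actually picks up:

hduser@laptop:~$ hdfs getconf -confKey fs.defaultFS

This should print hdfs://localhost:54310.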

4. /usr/local/hadoop/etc/hadoop/mapred-site.xml

By default, the /usr/local/hadoop/etc/hadoop/ folder contains a
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
file, which has to be copied (or renamed) to mapred-site.xml:

hduser@laptop:~$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

The mapred-site.xml file is used to specify which framework is being used for MapReduce.
We need to enter the following content between the <configuration> and </configuration> tags:

<configuration>

 <property>

  <name>mapred.job.tracker</name>

  <value>localhost:54311</value>

  <description>The host and port that the MapReduce job tracker runs

  at. If “local”, then jobs are run in-process as a single map

  and reduce task.

  </description>

 </property>

</configuration>
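Optionally, since we will be starting YARN with start-yarn.sh later on, MapReduce can also be pointed at YARN by adding one more property inside the same <configuration> tags. This is only a sketch of the standard Hadoop 2.x property for that; the rest of this walkthrough does not depend on it:

 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>Run MapReduce jobs on YARN instead of the local runner.</description>
 </property>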

5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster that is being used.
It is used to specify the directories which will be used as the namenode and the datanode storage on that host.

Before editing this file, we need to create two directories which will contain the namenode and the datanode for this Hadoop installation.
This can be done using the following commands:

hduser@laptop:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode

hduser@laptop:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode

hduser@laptop:~$ sudo chown -R hduser:hadoop /usr/local/hadoop_store

Open the file and enter the following content between the <configuration> and </configuration> tags:

hduser@laptop:~$ nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

 

<configuration>

 <property>

  <name>dfs.replication</name>

  <value>1</value>

  <description>Default block replication.

  The actual number of replications can be specified when the file is created.

  The default is used if replication is not specified in create time.

  </description>

 </property>

 <property>

   <name>dfs.namenode.name.dir</name>

   <value>file:/usr/local/hadoop_store/hdfs/namenode</value>

 </property>

 <property>

   <name>dfs.datanode.data.dir</name>

   <value>file:/usr/local/hadoop_store/hdfs/datanode</value>

 </property>

</configuration>
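Before moving on, it is worth double checking that both directories exist and are owned by hduser, since the namenode and datanode will not be able to start if they cannot write to them:

hduser@laptop:~$ ls -ld /usr/local/hadoop_store/hdfs/namenode /usr/local/hadoop_store/hdfs/datanode

Both lines of output should show hduser hadoop as the owner and group, matching the chown -R we ran above.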

Format the New Hadoop Filesystem

Now the Hadoop file system needs to be formatted so that we can start using it. The format command should be run as a user with write permission, since it creates a current directory
under the /usr/local/hadoop_store/hdfs/namenode folder:

hduser@laptop:~$ hadoop namenode -format

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

 

15/04/18 14:43:03 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = laptop/192.168.1.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 2.6.0

STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop

STARTUP_MSG:   java = 1.7.0_65

************************************************************/

15/04/18 14:43:03 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/04/18 14:43:03 INFO namenode.NameNode: createNameNode [-format]

15/04/18 14:43:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

Formatting using clusterid: CID-e2f515ac-33da-45bc-8466-5b1100a2bf7f

15/04/18 14:43:09 INFO namenode.FSNamesystem: No KeyProvider found.

15/04/18 14:43:09 INFO namenode.FSNamesystem: fsLock is fair:true

15/04/18 14:43:10 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000

15/04/18 14:43:10 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true

15/04/18 14:43:10 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000

15/04/18 14:43:10 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Apr 18 14:43:10

15/04/18 14:43:10 INFO util.GSet: Computing capacity for map BlocksMap

15/04/18 14:43:10 INFO util.GSet: VM type       = 64-bit

15/04/18 14:43:10 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB

15/04/18 14:43:10 INFO util.GSet: capacity      = 2^21 = 2097152 entries

15/04/18 14:43:10 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false

15/04/18 14:43:10 INFO blockmanagement.BlockManager: defaultReplication         = 1

15/04/18 14:43:10 INFO blockmanagement.BlockManager: maxReplication             = 512

15/04/18 14:43:10 INFO blockmanagement.BlockManager: minReplication             = 1

15/04/18 14:43:10 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2

15/04/18 14:43:10 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false

15/04/18 14:43:10 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000

15/04/18 14:43:10 INFO blockmanagement.BlockManager: encryptDataTransfer        = false

15/04/18 14:43:10 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000

15/04/18 14:43:10 INFO namenode.FSNamesystem: fsOwner             = hduser (auth:SIMPLE)

15/04/18 14:43:10 INFO namenode.FSNamesystem: supergroup          = supergroup

15/04/18 14:43:10 INFO namenode.FSNamesystem: isPermissionEnabled = true

15/04/18 14:43:10 INFO namenode.FSNamesystem: HA Enabled: false

15/04/18 14:43:10 INFO namenode.FSNamesystem: Append Enabled: true

15/04/18 14:43:11 INFO util.GSet: Computing capacity for map INodeMap

15/04/18 14:43:11 INFO util.GSet: VM type       = 64-bit

15/04/18 14:43:11 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB

15/04/18 14:43:11 INFO util.GSet: capacity      = 2^20 = 1048576 entries

15/04/18 14:43:11 INFO namenode.NameNode: Caching file names occuring more than 10 times

15/04/18 14:43:11 INFO util.GSet: Computing capacity for map cachedBlocks

15/04/18 14:43:11 INFO util.GSet: VM type       = 64-bit

15/04/18 14:43:11 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB

15/04/18 14:43:11 INFO util.GSet: capacity      = 2^18 = 262144 entries

15/04/18 14:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033

15/04/18 14:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0

15/04/18 14:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000

15/04/18 14:43:11 INFO namenode.FSNamesystem: Retry cache on namenode is enabled

15/04/18 14:43:11 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis

15/04/18 14:43:11 INFO util.GSet: Computing capacity for map NameNodeRetryCache

15/04/18 14:43:11 INFO util.GSet: VM type       = 64-bit

15/04/18 14:43:11 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB

15/04/18 14:43:11 INFO util.GSet: capacity      = 2^15 = 32768 entries

15/04/18 14:43:11 INFO namenode.NNConf: ACLs enabled? false

15/04/18 14:43:11 INFO namenode.NNConf: XAttrs enabled? true

15/04/18 14:43:11 INFO namenode.NNConf: Maximum size of an xattr: 16384

15/04/18 14:43:12 INFO namenode.FSImage: Allocated new BlockPoolId: BP-130729900-192.168.1.1-1429393391595

15/04/18 14:43:12 INFO common.Storage: Storage directory /usr/local/hadoop_store/hdfs/namenode has been successfully formatted.

15/04/18 14:43:12 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

15/04/18 14:43:12 INFO util.ExitUtil: Exiting with status 0

15/04/18 14:43:12 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at laptop/192.168.1.1

************************************************************/
Note that the hadoop namenode -format command should be executed only once, before we start using Hadoop.
If this command is executed again after Hadoop has been used, it will DESTROY all the data on the Hadoop file system.
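As the DEPRECATED warning at the top of the output suggests, the same formatting can also be done through the hdfs command directly, which is the preferred form in Hadoop 2.x:

hduser@laptop:~$ hdfs namenode -format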

Starting Hadoop

Now it’s time to start the newly installed single-node cluster.
We can use start-all.sh, or start-dfs.sh and start-yarn.sh separately.

k@laptop:~$ cd /usr/local/hadoop/sbin

 

k@laptop:/usr/local/hadoop/sbin$ ls

distribute-exclude.sh    start-all.cmd        stop-balancer.sh

hadoop-daemon.sh         start-all.sh         stop-dfs.cmd

hadoop-daemons.sh        start-balancer.sh    stop-dfs.sh

hdfs-config.cmd          start-dfs.cmd        stop-secure-dns.sh

hdfs-config.sh           start-dfs.sh         stop-yarn.cmd

httpfs.sh                start-secure-dns.sh  stop-yarn.sh

kms.sh                   start-yarn.cmd       yarn-daemon.sh

mr-jobhistory-daemon.sh  start-yarn.sh        yarn-daemons.sh

refresh-namenodes.sh     stop-all.cmd

slaves.sh                stop-all.sh

 

k@laptop:/usr/local/hadoop/sbin$ sudo su hduser

 

hduser@laptop:/usr/local/hadoop/sbin$ start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

15/04/18 16:43:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

Starting namenodes on [localhost]

localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-laptop.out

localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-laptop.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-laptop.out

15/04/18 16:43:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-laptop.out

localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-laptop.out
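As the first line of the output says, start-all.sh is deprecated in Hadoop 2.x; the equivalent is to start HDFS and YARN separately:

hduser@laptop:/usr/local/hadoop/sbin$ start-dfs.sh
hduser@laptop:/usr/local/hadoop/sbin$ start-yarn.sh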

We can check if it’s really up and running:

hduser@laptop:/usr/local/hadoop/sbin$ jps

9026 NodeManager

7348 NameNode

9766 Jps

8887 ResourceManager

7507 DataNode

7350 SecondaryNameNode

The output means that we now have a functional instance of Hadoop running on our machine.

Another way to check is using netstat:

hduser@laptop:~$ netstat -plten | grep java

(Not all processes could be identified, non-owned process info

 will not be shown, you would have to be root to see it all.)

tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      1001       1843372     10605/java     

tcp        0      0 127.0.0.1:54310         0.0.0.0:*               LISTEN      1001       1841277     10447/java     

tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      1001       1841130     10895/java     

tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      1001       1840196     10447/java     

tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      1001       1841320     10605/java     

tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      1001       1841646     10605/java     

tcp6       0      0 :::8040                 :::*                    LISTEN      1001       1845543     11383/java     

tcp6       0      0 :::8042                 :::*                    LISTEN      1001       1845551     11383/java     

tcp6       0      0 :::8088                 :::*                    LISTEN      1001       1842110     11252/java     

tcp6       0      0 :::49630                :::*                    LISTEN      1001       1845534     11383/java     

tcp6       0      0 :::8030                 :::*                    LISTEN      1001       1842036     11252/java     

tcp6       0      0 :::8031                 :::*                    LISTEN      1001       1842005     11252/java     

tcp6       0      0 :::8032                 :::*                    LISTEN      1001       1842100     11252/java      

tcp6       0      0 :::8033                 :::*                    LISTEN      1001       1842162     11252/java     
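Two of the ports listed above are also web interfaces: 50070 is the NameNode web UI and 8088 is the ResourceManager web UI. You can open them in a browser, or do a rough reachability check from the shell (replace localhost with the machine's address if you are connecting from elsewhere):

hduser@laptop:~$ curl -s http://localhost:50070 > /dev/null && echo "NameNode UI is reachable"
hduser@laptop:~$ curl -s http://localhost:8088 > /dev/null && echo "ResourceManager UI is reachable"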

Stopping Hadoop


We run stop-all.sh, or stop-dfs.sh and stop-yarn.sh, to stop all the daemons running on our machine:

hduser@laptop:/usr/local/hadoop/sbin$ pwd

/usr/local/hadoop/sbin

hduser@laptop:/usr/local/hadoop/sbin$ ls

distribute-exclude.sh  httpfs.sh                start-all.cmd      start-secure-dns.sh  stop-balancer.sh    stop-yarn.sh

hadoop-daemon.sh       kms.sh                   start-all.sh       start-yarn.cmd       stop-dfs.cmd        yarn-daemon.sh

hadoop-daemons.sh      mr-jobhistory-daemon.sh  start-balancer.sh  start-yarn.sh        stop-dfs.sh         yarn-daemons.sh

hdfs-config.cmd        refresh-namenodes.sh     start-dfs.cmd      stop-all.cmd         stop-secure-dns.sh

hdfs-config.sh         slaves.sh                start-dfs.sh       stop-all.sh          stop-yarn.cmd

hduser@laptop:/usr/local/hadoop/sbin$

hduser@laptop:/usr/local/hadoop/sbin$ stop-all.sh

This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh

15/04/18 15:46:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

Stopping namenodes on [localhost]

localhost: stopping namenode

localhost: stopping datanode

Stopping secondary namenodes [0.0.0.0]

0.0.0.0: no secondarynamenode to stop

15/04/18 15:46:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

stopping yarn daemons

stopping resourcemanager

localhost: stopping nodemanager

no proxyserver to stop
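As with starting, stop-all.sh is deprecated; the equivalent is to stop HDFS and YARN separately:

hduser@laptop:/usr/local/hadoop/sbin$ stop-dfs.sh
hduser@laptop:/usr/local/hadoop/sbin$ stop-yarn.sh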

 

 
