Hadoop Code

How do you install Hadoop on your system? This guide walks through the 13 steps involved in installing Hadoop on a single machine.

Hadoop Solutions offers Hadoop and MapReduce projects, project ideas, and sample big data projects for students and researchers.

This guide installs Hadoop version 2.0.4 on a single machine (server, workstation, or hefty laptop).
The Apache Hadoop YARN system has two major core components:
▪ HDFS, the Hadoop Distributed File System, which is used for storing data.
▪ Hadoop YARN, which is used for running and managing data processing applications.
Other Apache Hadoop components, such as Hive or Pig, can be added after these
two components are installed and operating properly.

Steps involved in installing a single-node YARN server:

Minimum system requirements: dual-core processor with 2 GB of RAM and 2 GB of available hard drive space.
Operating system: Linux (e.g., Fedora, SUSE Linux Enterprise, Red Hat Enterprise Linux, or openSUSE).

Red Hat Enterprise Linux version 6.3 is used for this installation example. A bash shell
environment is also assumed. The first step is to download Apache Hadoop.

YARN Quick Start
Step 1: Download Apache Hadoop
Download the latest distribution from the Hadoop web site (http://hadoop.apache.org/).
For example, as root do the following:
# cd /root
# wget http://mirrors.ibiblio.org/apache/hadoop/common/hadoop-2.0.4-alpha/hadoop-2.0.4-alpha.tar.gz
Next create and extract the package in /opt/yarn:
# mkdir /opt/yarn
# cd /opt/yarn
# tar xvzf /root/hadoop-2.0.4-alpha.tar.gz
If the archive was extracted correctly, the following directory structure should be under
/opt/yarn/hadoop-2.0.4-alpha.
(Note: your directory name may differ depending on which version of Apache Hadoop you downloaded.)
etc/
+ hadoop
include/
lib/
+ native
libexec/
sbin/
share/
+ doc
+ . . .
+ hadoop
+ . . .
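As a quick sanity check, you can list the installation root and confirm the directories above appear (path as created in this guide):
# ls /opt/yarn/hadoop-2.0.4-alpha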
The rest of these steps will create a basic single machine YARN installation.

Step 2: Set JAVA_HOME

For Hadoop 2, the recommended version of Java can be found at
http://wiki.apache.org/hadoop/HadoopJavaVersions. As mentioned, Red Hat Enterprise Linux
6.3 is the base installation which includes Open Java 1.6.0_24. Make sure the java-1.6.0-
openjdk RPM is installed. In order to include JAVA_HOME for all bash users (others shells
must be set in a similar fashion) make an entry in /etc/profile.d:
# echo "export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/" > /etc/profile.d/java.sh
To make sure JAVA_HOME is defined for this session, source the new script:
# source /etc/profile.d/java.sh
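To verify the setting took effect, a quick check (assuming the OpenJDK path used above) is:
# echo $JAVA_HOME
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/
# $JAVA_HOME/bin/java -version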

Other Linux distributions may differ, and the steps that follow will need to be adjusted.
Step 3: Create Users and Groups
It is best to run the various daemons under separate accounts.
Three accounts (hdfs, yarn, mapred) in group hadoop can be created as follows:
# groupadd hadoop
# useradd -g hadoop yarn
# useradd -g hadoop hdfs
# useradd -g hadoop mapred
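To confirm the accounts and group were created, standard Linux tools can be used:
# id hdfs
# getent group hadoop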

Step 4: Make Data and Log Directories
Hadoop needs various data and log directories with appropriate permissions.
To create these directories, enter the following (note that the log directory must be created before its ownership can be set):
# mkdir -p /var/data/hadoop/hdfs/nn
# mkdir -p /var/data/hadoop/hdfs/snn
# mkdir -p /var/data/hadoop/hdfs/dn
# chown hdfs:hadoop /var/data/hadoop/hdfs -R
# mkdir -p /var/log/hadoop/yarn
# chown yarn:hadoop /var/log/hadoop/yarn -R
Next, move to the YARN installation root, create the log directory, and set its owner and
group as follows:
# cd /opt/yarn/hadoop-2.0.4-alpha
# mkdir logs
# chmod g+w logs
# chown yarn:hadoop . -R
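A quick check that ownership was set correctly:
# ls -ld /var/data/hadoop/hdfs /var/log/hadoop/yarn /opt/yarn/hadoop-2.0.4-alpha/logs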

Step 5: Configure core-site.xml

From the base of the Hadoop installation path (e.g., /opt/yarn/hadoop-2.0.4-alpha/), edit the
etc/hadoop/core-site.xml file. The original installed file will have no entries other than the
<configuration> </configuration> tags. There are two properties that need to be set. The
first is the fs.default.name property, which sets the host and port for the NameNode
(the metadata server for HDFS). The second is hadoop.http.staticuser.user, which will set the
default user name to hdfs. Copy the following lines to the Hadoop etc/hadoop/core-site.xml file
and remove the original empty <configuration> </configuration> tags.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hdfs</value>
  </property>
</configuration>
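To confirm a property is being picked up, recent Hadoop 2 releases include an hdfs getconf tool (if your release lacks it, simply inspect the XML file):
$ cd /opt/yarn/hadoop-2.0.4-alpha/bin
$ ./hdfs getconf -confKey fs.default.name
hdfs://localhost:9000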

Step 6: Configure hdfs-site.xml
From the base of the Hadoop installation path, edit the etc/hadoop/hdfs-site.xml file. In the
single-node pseudo-distributed mode, we don't need or want HDFS to replicate file blocks.
By default, HDFS keeps three copies of each file in the filesystem. There is no need for
replication on a single machine, so the dfs.replication value will be set to one.
In hdfs-site.xml, we specify the NameNode, Secondary NameNode, and DataNode data
directories that we created in Step 4. These are the directories used by the various components
of HDFS to store data. Copy the following into the Hadoop etc/hadoop/hdfs-site.xml file and remove
the original empty <configuration> </configuration> tags.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:/var/data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:/var/data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>
</configuration>
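As in Step 5, the getconf check (where available) can confirm the value:
$ ./hdfs getconf -confKey dfs.replication
1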

Step 7: Configure mapred-site.xml
From the base of the Hadoop installation, edit the etc/hadoop/mapred-site.xml file. A new
configuration option for Hadoop 2 is the capability to specify a framework name for
MapReduce, setting the mapreduce.framework.name property. In this install we will use the
value of "yarn" to tell MapReduce that it will run as a YARN application. First, from the
etc/hadoop directory, copy the template file to mapred-site.xml:
# cp mapred-site.xml.template mapred-site.xml
Next, copy the following into the Hadoop etc/hadoop/mapred-site.xml file and remove the original
empty <configuration> </configuration> tags.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
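A plain grep (no Hadoop tooling required) confirms the property is in place:
$ grep -A1 mapreduce.framework.name etc/hadoop/mapred-site.xml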
Step 8: Configure yarn-site.xml
From the base of the Hadoop installation, edit the etc/hadoop/yarn-site.xml file. The
yarn.nodemanager.aux-services property tells NodeManagers that there will be an auxiliary
service called mapreduce.shuffle that they need to implement. After we tell the NodeManagers
to implement that service, we give it a class name as the means to implement the service: in
this case, yarn.nodemanager.aux-services.mapreduce.shuffle.class. Specifically,
this configuration tells MapReduce how to do its shuffle. Because
NodeManagers won't shuffle data for a non-MapReduce job by default, we need to configure
such a service for MapReduce. Copy the following into the Hadoop etc/hadoop/yarn-site.xml file
and remove the original empty <configuration> </configuration> tags.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
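One caveat worth noting: the mapreduce.shuffle value above is correct for Hadoop 2.0.x, but later Hadoop 2 releases renamed the auxiliary service to mapreduce_shuffle. If the NodeManager fails to start on a newer release, change the first value accordingly:
<value>mapreduce_shuffle</value>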
Step 9: Modify Java Heap Sizes

The Hadoop installation has various environment variables that determine the heap sizes for
each Hadoop process. These are defined in the etc/hadoop/*-env.sh files used by Hadoop.
The default for most of the processes is a 1 GB heap size, but since we're running on a
workstation that will probably have limited resources compared to a standard server, we need to
adjust the heap size settings. The values that follow are adequate for a small
workstation or server. They can be adjusted to fit your machine.
Edit the etc/hadoop/hadoop-env.sh file to reflect the following (don't forget to remove the "#" at
the beginning of the line):
HADOOP_HEAPSIZE=500
HADOOP_NAMENODE_INIT_HEAPSIZE="500"
Next, edit the mapred-env.sh to reflect the following:
HADOOP_JOB_HISTORYSERVER_HEAPSIZE=250
Finally, edit yarn-env.sh to reflect the following:
JAVA_HEAP_MAX=-Xmx500m
The following will also need to be added to yarn-env.sh:
YARN_HEAPSIZE=500
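A simple grep confirms all four settings are present and uncommented (file names as above):
$ grep -E 'HEAPSIZE|JAVA_HEAP_MAX' etc/hadoop/hadoop-env.sh etc/hadoop/mapred-env.sh etc/hadoop/yarn-env.sh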

Step 10: Format HDFS

In order for the HDFS NameNode to start, it needs to initialize the directory where it will hold
its data. The NameNode service tracks all the meta-data for the filesystem. The format process
will use the value assigned to dfs.namenode.name.dir in etc/hadoop/hdfs-site.xml earlier (i.e.,
/var/data/hadoop/hdfs/nn). Formatting destroys everything in the directory and sets up a new
file system. Format the NameNode directory as the HDFS superuser, which is typically the
'hdfs' user account.
From the base of the Hadoop distribution, change directories to the ‘bin’ directory and execute
the following commands.
# su - hdfs
$ cd /opt/yarn/hadoop-2.0.4-alpha/bin
$ ./hdfs namenode -format
If the command worked, you should see the following near the end of a long list of messages:
INFO common.Storage: Storage directory /var/data/hadoop/hdfs/nn has been successfully
formatted.
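Formatting creates the NameNode's on-disk layout under the directory configured in Step 6; a quick look confirms it:
$ ls /var/data/hadoop/hdfs/nn/current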

Step 11: Start the HDFS Services

Once formatting is successful, the HDFS services must be started. There is one for the
NameNode (metadata server), a single DataNode (where the actual data is stored), and the
SecondaryNameNode (checkpoint data for the NameNode). The Hadoop distribution includes
scripts that set up these commands as well as set various other values such as PID directories, log
directories, and other standard process configurations. From the bin directory used in Step 10,
execute the following as user hdfs:
$ cd ../sbin
$ ./hadoop-daemon.sh start namenode
starting namenode, logging to /opt/yarn/hadoop-2.0.4-alpha/logs/hadoop-hdfs-namenode-limulus.out
$ ./hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /opt/yarn/hadoop-2.0.4-alpha/logs/hadoop-hdfs-secondarynamenode-limulus.out
$ ./hadoop-daemon.sh start datanode
starting datanode, logging to /opt/yarn/hadoop-2.0.4-alpha/logs/hadoop-hdfs-datanode-limulus.out
If the daemons started, you should see responses above that point to the log files. (Note that
the actual log file is appended with ".log", not ".out".) As a sanity check, issue a jps command
to see that all the services are running. The actual PID values will be different than shown in
this listing:
$ jps
15140 SecondaryNameNode
15015 NameNode
15335 Jps
15214 DataNode
If the process did not start, it may be helpful to inspect the log files. For instance, examine the
log file for the NameNode. (Note that the path is taken from the command above.)
$ vi /opt/yarn/hadoop-2.0.4-alpha/logs/hadoop-hdfs-namenode-limulus.log
All Hadoop services can be stopped using the hadoop-daemon.sh script. For example, to stop
the datanode service enter the following:
$ ./hadoop-daemon.sh stop datanode
The same can be done for the NameNode and SecondaryNameNode.
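With the three HDFS daemons running, a quick functional test (run as the hdfs user, relative to the sbin directory used above) is to list the root of the new filesystem:
$ ../bin/hdfs dfs -ls /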

Step 12: Start YARN Services

As with HDFS services, the YARN services need to be started. One ResourceManager and one
NodeManager must be started as user yarn:
# su - yarn
$ cd /opt/yarn/hadoop-2.0.4-alpha/sbin
$ ./yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/yarn/hadoop-2.0.4-alpha/logs/yarn-yarn-resourcemanager-limulus.out
$ ./yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/yarn/hadoop-2.0.4-alpha/logs/yarn-yarn-nodemanager-limulus.out
As with starting the HDFS daemons in the previous step, the status of running daemons is sent
to the respective log files. To check whether the services are running, issue a jps command.
The following shows all the necessary services to run YARN on a single server:
$ jps
15933 Jps
15567 ResourceManager
15785 NodeManager
If any services are missing, check the log file for the specific service. Similar to HDFS, the
services can be stopped by issuing a stop argument to the daemon script:
$ ./yarn-daemon.sh stop nodemanager
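To verify that the NodeManager has registered with the ResourceManager, the yarn CLI can list cluster nodes (assuming your release includes the node subcommand; run from the bin directory):
$ ../bin/yarn node -list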

Step 13: Verify the Running Services Using the Web Interface

Both HDFS and the YARN Resource Manager have a web interface. These interfaces are a
convenient way to browse many of the aspects of your Hadoop installation. To monitor HDFS
enter the following:
$ firefox http://localhost:50070
Connecting to port 50070 will bring up the web interface similar to what is shown in Figure
1.1.
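Similarly, the YARN ResourceManager web interface listens on port 8088 by default:
$ firefox http://localhost:8088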
YARN Components:
Resource Manager
ApplicationMaster
Resource Model
ResourceRequest and Containers


