Hadoop 2.7 installation on Ubuntu 14.04 (Single Node)

Hadoop installation on Ubuntu

Installing Ubuntu on a virtual machine

Here I am showing a Hadoop installation on Ubuntu running in VMware Player. Here are the VM settings for the Ubuntu guest:

[Screenshot: VMware Player settings for the Ubuntu VM]

VM memory should be 25% – 35% of the host machine's RAM.

Prerequisites

Before starting the installation, let's update the package lists:

sudo apt-get update

Installing Java

If Java is not installed, use the following command to install it:

sudo apt-get install openjdk-7-jdk

Or use the Ubuntu Software Center to install JDK 6 or 7. Now check that it is installed correctly:

java -version

Adding a dedicated Hadoop system user

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser

Installing SSH

sudo apt-get install ssh

Generate an SSH key for the hduser user

ssh-keygen -t rsa -P ""

Here -P "" indicates an empty passphrase. Now we need to enable SSH access to the local machine with this newly created key:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
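If SSH still prompts for a password after this, the permissions on the key file are often the culprit; a common fix (not shown in the original steps) is to restrict access to the authorized_keys file:

```shell
# SSH refuses keys whose authorized_keys file is group/world writable
chmod 600 $HOME/.ssh/authorized_keys
```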

Test SSH:

ssh localhost

Disabling IPv6

We need to disable IPv6 because Hadoop binds to 0.0.0.0, which on Ubuntu can resolve to IPv6 addresses and break various Hadoop configurations. To do that, open the /etc/sysctl.conf file with gedit or nano and add the following three lines at the end:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Reboot the system for the change to take effect.
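Alternatively, the settings can be applied without a full reboot by reloading sysctl; either way, you can verify the result by reading the flag back (a value of 1 means IPv6 is disabled):

```shell
# Re-read /etc/sysctl.conf and apply the settings immediately
sudo sysctl -p

# Should print 1 once IPv6 is disabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```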

Install Hadoop

wget http://apache.arvixe.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar xvzf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1 hadoop
sudo mv hadoop /usr/local/
sudo chown -R hduser:hadoop /usr/local/hadoop

The following files need to be updated for a single-node Hadoop cluster:

  1. core-site.xml
  2. mapred-site.xml
  3. hdfs-site.xml
  4. Update $HOME/.bashrc
  5. hadoop-env.sh

1. core-site.xml

The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties used by Hadoop. Add the following properties to this file:

<configuration>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/tmp</value>
  <description>temporary directories.</description>
 </property>

 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.</description>
 </property>
</configuration>

Now let's create the directory and set the required ownership and permissions:

sudo mkdir -p /usr/local/hadoop/tmp
sudo chown hduser:hadoop /usr/local/hadoop/tmp

2. mapred-site.xml

By default, the /usr/local/hadoop/etc/hadoop/ folder contains
mapred-site.xml.template, which needs to be renamed to mapred-site.xml and given the following content:

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
 </property>
</configuration>
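The rename itself is not shown above; it can be done with a simple copy, assuming the layout used throughout this tutorial:

```shell
# Keep the template and create the active config from it
cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
```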

3. hdfs-site.xml

Here we will specify the namenode and datanode directories. First we need to create the folders:

sudo mkdir -p /usr/local/hdfs/namenode
sudo mkdir -p /usr/local/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hdfs

To add the properties we can use vi or nano:

nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hdfs/datanode</value>
 </property>
</configuration>

4. Update $HOME/.bashrc

nano $HOME/.bashrc

Add the following lines at the end of the file:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
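After saving the file, the new variables only apply to new shells; to pick them up in the current session and confirm that the hadoop binary is on the PATH, you can run:

```shell
# Reload .bashrc in the current shell
source $HOME/.bashrc

# Should print the Hadoop version (2.7.1) if PATH is set correctly
hadoop version
```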

5. hadoop-env.sh

Set JAVA_HOME by modifying /usr/local/hadoop/etc/hadoop/hadoop-env.sh file.

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
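Note that the path above is for the 32-bit OpenJDK 7 package; on a 64-bit system it is typically /usr/lib/jvm/java-7-openjdk-amd64. One way to find the correct value on your machine, assuming java is on the PATH and follows the usual OpenJDK layout:

```shell
# Resolve the java symlink and strip the trailing /jre/bin/java
readlink -f $(which java) | sed "s:/jre/bin/java::"
```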

Formatting the HDFS filesystem

Switch to hduser

su - hduser

Format command:

hdfs namenode -format

Start Hadoop

start-dfs.sh
start-yarn.sh
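Once Hadoop is up, the running Java daemons can be listed with jps; on a default single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

```shell
# List running JVM processes (part of the JDK)
jps
```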

Hadoop Web Interfaces

http://localhost:50070/ – NameNode web UI
http://localhost:8088/ – ResourceManager (YARN) web UI


Stop Hadoop

stop-yarn.sh
stop-dfs.sh

Enjoy big data!
