Overview

Bootstrapping Hadoop on Fedora 22+.

Installation and Setup (as root)

Install Hadoop:

# dnf install hadoop-common hadoop-hdfs hadoop-mapreduce hadoop-mapreduce-examples hadoop-yarn maven-* xmvn*
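
To confirm the installation, you can print the packaged version (hadoop version is a standard Hadoop subcommand):

# hadoop version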

Set the JAVA_HOME environment variable in both the Hadoop and YARN configuration files (the default setting does not seem to work):

# vi /etc/hadoop/hadoop-env.sh; vi /etc/hadoop/yarn-env.sh

For instance, with OpenJDK, the line should read something like:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-15.b17.fc23.x86_64 # on Fedora 23

Or with Oracle Java JDK 8 (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html), the line would become:

export JAVA_HOME=/usr/java/jdk1.8.0_51
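
If you are unsure of the exact JDK path on your machine (the OpenJDK version suffix changes with every update), listing the installed JVM directories is one way to find it:

# ls -d /usr/lib/jvm/java-1.8.0-openjdk-*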

You may want to adjust the amount of memory and the number of CPU cores available to the YARN cluster by adding the following properties to /etc/hadoop/yarn-site.xml (derived from yarn-default.xml: http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml):

 <property>
   <description>Number of CPU cores that can be allocated for containers.</description>
   <name>yarn.nodemanager.resource.cpu-vcores</name>
   <value>2</value>
 </property>
 <property>
   <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
   <name>yarn.nodemanager.resource.memory-mb</name>
   <value>2048</value>
 </property>
 <property>
   <description>The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value.</description>
   <name>yarn.scheduler.maximum-allocation-mb</name>
   <value>2048</value>
 </property>
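
The values above are only examples; a sensible ceiling depends on the host. Comparing them against the machine's actual core count and free memory is a quick sanity check:

# nproc
# free -m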

Format the NameNode:

# runuser hdfs -s /bin/bash /bin/bash -c "hdfs namenode -format"

which should produce something like:

15/08/16 19:09:15 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = myhost.mydomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.1
STARTUP_MSG:   classpath = /etc/hadoop:/usr/share/hadoop/common/lib/asm-tree-5.0.3.jar:[...]
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'mockbuild' on 2015-04-21T22:21Z
STARTUP_MSG:   java = 1.8.0_51
[...]
************************************************************/
15/08/16 19:09:16 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/08/16 19:09:16 INFO namenode.NameNode: createNameNode [-format]
15/08/16 19:09:16 INFO namenode.AclConfigFlag: ACLs enabled? false
15/08/16 19:09:16 INFO namenode.FSImage: Allocated new BlockPoolId: BP-393991083-127.0.0.1-1439744956758
15/08/16 19:09:16 INFO common.Storage: Storage directory /var/lib/hadoop-hdfs/hdfs/dfs/namenode has been successfully formatted.
15/08/16 19:09:16 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/08/16 19:09:16 INFO util.ExitUtil: Exiting with status 0
15/08/16 19:09:16 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at myhost.mydomain/127.0.0.1
************************************************************/

Start the Hadoop services:

# systemctl start hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs

Check that the Hadoop services have been started:

# systemctl status hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs
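
If any of the units failed, the journal usually shows why (a wrong JAVA_HOME is a common culprit), for example:

# journalctl -u hadoop-namenode -e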

If everything went smoothly, enable the Hadoop services permanently:

# systemctl enable hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs

Create the default HDFS directories:

# hdfs-create-dirs
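
To confirm the directories were created, list the HDFS root as the hdfs user (the exact layout may vary with the packaging):

# runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -ls /"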

Web UI:

* Node: http://localhost:8042
* Resource Manager (RM): http://localhost:8088
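
As an alternative to the browser, the Resource Manager exposes the standard YARN REST API, which any user can query to confirm the cluster is up:

$ curl http://localhost:8088/ws/v1/cluster/info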

Setting Up a User's Sandbox (as root)

In the following commands, build is the Unix user name:

# runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir /user/build"
# runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown build /user/build"
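
To verify the sandbox, list /user and check that the build directory is owned by the build user:

# runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -ls /user"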

Running WordCount (as user)

For simplicity, a WordCount example is available on GitHub, which you can clone:

$ git clone https://github.com/timothysc/hadoop-tests.git

Once it has downloaded, you can put the example .txt file into your user location:

$ cd hadoop-tests/WordCount
$ hadoop fs -put constitution.txt /user/build
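
You can check that the file actually landed in HDFS:

$ hadoop fs -ls /user/build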

Now you can build WordCount against the system-installed JARs:

$ mvn-rpmbuild package 
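
If the jar is not in the current directory after the build, look under target/, Maven's conventional output directory (the exact jar name depends on the project's pom.xml):

$ ls target/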

Finally, you can run:

$ hadoop jar wordcount.jar org.myorg.WordCount /user/build /user/build/output

Feel free to cat the part-* output files to see the results.
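
For instance, reading the job output directly from HDFS (the path matches the output directory passed to the job above):

$ hadoop fs -cat /user/build/output/part-*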
