From Fedora Project Wiki

Revision as of 17:55, 16 August 2015

== Overview ==

Bootstrapping Hadoop on Fedora for Fedora 22+.

== See Also ==

== Installation and Setup (as root) ==

Install Hadoop:

 # dnf install hadoop-common hadoop-hdfs hadoop-mapreduce hadoop-mapreduce-examples hadoop-yarn maven-* xmvn*

Set the JAVA_HOME environment variable in the Hadoop configuration file (the default does not seem to work):

 # vi /etc/hadoop/hadoop-env.sh

For instance, with Oracle Java JDK 8, the line should read something like:

 export JAVA_HOME=/usr/java/jdk1.8.0_51
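The exact JAVA_HOME value depends on which JDK is installed. As a minimal sketch (the sample path below matches the Oracle JDK example above and is hard-coded for illustration, not probed from your system), the value can be derived by stripping the trailing /bin/java from the resolved location of the java binary:

```shell
# Sample resolved path of the java binary (hard-coded for illustration).
# On a live system you would obtain it with:
#   java_bin=$(readlink -f "$(command -v java)")
java_bin=/usr/java/jdk1.8.0_51/bin/java

# JAVA_HOME is that path with the trailing /bin/java stripped off.
echo "export JAVA_HOME=${java_bin%/bin/java}"
```

With Fedora's packaged OpenJDK, the resolved path would instead sit under /usr/lib/jvm.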

Format the name-node:

 # runuser hdfs -s /bin/bash /bin/bash -c "hadoop namenode -format"

which should produce something like:

 DEPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 
 15/08/16 19:09:15 INFO namenode.NameNode: STARTUP_MSG: 
 /************************************************************
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = myhost.mydomain/127.0.0.1
 STARTUP_MSG:   args = [-format]
 STARTUP_MSG:   version = 2.4.1
 STARTUP_MSG:   classpath = /etc/hadoop:/usr/share/hadoop/common/lib/asm-tree-5.0.3.jar:[...]
 STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'mockbuild' on 2015-04-21T22:21Z
 STARTUP_MSG:   java = 1.8.0_51
 [...]
 ************************************************************/
 15/08/16 19:09:16 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
 15/08/16 19:09:16 INFO namenode.NameNode: createNameNode [-format]
 15/08/16 19:09:16 INFO namenode.AclConfigFlag: ACLs enabled? false
 15/08/16 19:09:16 INFO namenode.FSImage: Allocated new BlockPoolId: BP-393991083-127.0.0.1-1439744956758
 15/08/16 19:09:16 INFO common.Storage: Storage directory /var/lib/hadoop-hdfs/hdfs/dfs/namenode has been successfully formatted.
 15/08/16 19:09:16 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
 15/08/16 19:09:16 INFO util.ExitUtil: Exiting with status 0
 15/08/16 19:09:16 INFO namenode.NameNode: SHUTDOWN_MSG: 
 /************************************************************
 SHUTDOWN_MSG: Shutting down NameNode at myhost.mydomain/127.0.0.1
 ************************************************************/

Start the Hadoop services:

 # systemctl start hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager

Check that the Hadoop services have been started:

 # systemctl status hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager

If everything went smoothly, enable the Hadoop services permanently:

 # systemctl enable hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager

Create the default HDFS directories:

 # hdfs-create-dirs

== Setting Up a User's Sandbox (as root) ==

In the following commands, build is the Unix user name; substitute your own:

 # runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir /user/build"
 # runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown build /user/build"

== Running WordCount (as user) ==

For simplicity, a WordCount example is available on GitHub that you can copy:

 $ git clone https://github.com/timothysc/hadoop-tests.git

Once it has downloaded, you can put the example .txt file into your HDFS user directory:

 $ cd hadoop-tests/WordCount
 $ hadoop fs -put constitution.txt /user/build

Now you can build WordCount against the system-installed .jars:

 $ mvn-rpmbuild package

Finally, you can run:

 $ hadoop jar wordcount.jar org.myorg.WordCount /user/build /user/build/output

Feel free to cat the output files (hadoop fs -cat /user/build/output/part-*) to see the results.
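The MapReduce job is doing at scale what a simple pipeline does locally. As an illustrative sketch (not part of the Hadoop workflow above, and using a made-up input line rather than constitution.txt), the same word counting can be reproduced in plain shell on a small input:

```shell
# Count word occurrences the way WordCount does:
# split the input into one word per line, group identical words, count each group.
printf 'we the people we the\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

Each output line holds a count and a word (e.g. "2 we"), which mirrors the key/count pairs found in the job's part files.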

== References ==

Denis Arnaud's page