WordCount Example Hadoop 3.0.3

Installing Hadoop

To install Hadoop and set the required system variables, follow: https://github.com/AdityaSinghRathore/HadoopLinuxMint/blob/master/main.pdf

WordCount example on Hadoop 3.0.3 using Python code with MapReduce streaming

  1. First, start the Hadoop cluster.
    $ start-all.sh

    Or, equivalently:
    $ start-dfs.sh
    $ start-yarn.sh
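
To verify that the daemons came up, jps should list processes such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:
    $ jps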

  2. Code the mapper in Python (code files above); a minimal sketch follows this step.
    $ touch mapper.py
    $ vim mapper.py
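
As a minimal sketch (the actual mapper.py in this repo may differ), a streaming mapper reads lines from stdin and emits one tab-separated "word 1" pair per word:

    #!/usr/bin/env python
    # mapper.py: emit "<word>\t1" for every word read from stdin
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print('%s\t1' % word)

Make the script executable (chmod +x mapper.py) so Hadoop Streaming can launch it through its shebang line.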


  3. Write a sample text file; an example follows this step.
    $ touch sample.txt
    $ vim sample.txt
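
For example, sample.txt might contain a few repeated words:

    hello world
    hello hadoop
    world count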

  4. Test the mapper code on the sample file above.
    $ cat sample.txt | python mapper.py
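
With the sketch mapper and the hypothetical sample above, this prints one tab-separated pair per input word (e.g. "hello 1" appears twice); the per-word totals are computed later by the reducer.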

  5. Create and code the reducer; a sketch and a local test follow this step.
    $ touch reducer.py
    $ vim reducer.py
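
A minimal streaming reducer (again a sketch; the repo's reducer.py may differ) relies on Hadoop sorting the mapper output by key, so equal words arrive on consecutive lines and can be summed with a running counter:

    #!/usr/bin/env python
    # reducer.py: sum the counts for each word; input arrives sorted by word
    import sys

    current_word = None
    current_count = 0

    for line in sys.stdin:
        word, count = line.strip().split('\t', 1)
        count = int(count)
        if word == current_word:
            current_count += count
        else:
            # A new word begins: flush the previous word's total first
            if current_word is not None:
                print('%s\t%d' % (current_word, current_count))
            current_word = word
            current_count = count

    # Flush the final word
    if current_word is not None:
        print('%s\t%d' % (current_word, current_count))

Because the shuffle phase amounts to a sort by key here, the whole pipeline can be simulated locally:
    $ cat sample.txt | python mapper.py | sort | python reducer.py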

  6. SSH to localhost and create the /user/wce/input directory in HDFS (-p also creates any missing parent directories).
    $ ssh localhost
    $ hadoop fs -mkdir -p /user/wce/input

  7. Copy the sample.txt file to the HDFS /user/wce/input/ directory.
    $ hadoop fs -put sample.txt /user/wce/input

  8. Run the MapReduce job with Hadoop Streaming.
    $ mapred streaming -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py -input /user/wce/input/sample.txt -output /user/wce/output
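
Here -file ships each local script to the cluster nodes, while -mapper and -reducer name the commands to execute. Between the two phases Hadoop sorts the mapper output by key, which is exactly the grouping the reducer sketch above relies on. (Note that -file is deprecated in favor of the generic -files option in recent Hadoop releases, though it still works here.)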

  9. The job is now running.

  10. View the results.
    $ hadoop fs -ls /user/wce/output
    $ hadoop fs -cat /user/wce/output/*
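
On success, the output directory contains an empty _SUCCESS marker and one part-* file per reducer; the part files hold the final tab-separated word counts. For the hypothetical sample above, the output would be:

    count	1
    hadoop	1
    hello	2
    world	2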
