Pseudo Distributed Hadoop Setup
Make sure you can ssh to the system
Create ssh key and add it to authorized keys
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Try connecting to localhost with ssh (you should not be prompted for a password)
ssh localhost
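If ssh still asks for a password, permissions are worth a look: with its default StrictModes setting, sshd ignores an authorized_keys file (or a home or ~/.ssh directory) that group or others can write to. The following is a minimal sketch of such a check; the function names `not_group_or_world_writable` and `check_ssh_modes` are made up for this example.

```shell
# Hypothetical helper: succeeds only if the path exists and is NOT
# writable by group or others. Returns 2 if the path does not exist.
not_group_or_world_writable() {
    [ -e "$1" ] || return 2
    # find prints the path only if a group- or other-write bit is set
    [ -z "$(find "$1" -prune \( -perm -020 -o -perm -002 \) -print)" ]
}

# Hypothetical helper: report on the paths that matter for key-based login
check_ssh_modes() {
    for p in "$HOME" "$HOME/.ssh" "$HOME/.ssh/authorized_keys"; do
        if not_group_or_world_writable "$p"; then
            echo "ok: $p"
        else
            echo "check permissions: $p"
        fi
    done
}
```

After pasting these definitions into a shell, running `check_ssh_modes` prints one line per path.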
Navigate to http://hadoop.apache.org/releases.html
Click the Download link, then click the "Download release now!" link
Pick a download mirror
Click the hadoop-1.0.4/ directory link
Download "hadoop-1.0.4-bin.tar.gz"
I've unpacked it in a directory named '~/Hadoop'
cd ~/Hadoop
tar xvzf ~/Downloads/hadoop-1.0.4-bin.tar.gz
Create a hadoop-1.0.4-env file with the following content (modify the JAVA_HOME and HADOOP_PREFIX to match your system):
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.19.x86_64
export HADOOP_PREFIX="/home/trisberg/Hadoop/hadoop-1.0.4"
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=$HADOOP_PREFIX/conf
export HADOOP_LIBEXEC_DIR=$HADOOP_PREFIX/libexec
export PATH=$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$PATH
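Since JAVA_HOME and HADOOP_PREFIX differ from system to system, it can help to confirm that they point at real directories once the file has been sourced. This is only a sketch; `check_hadoop_env` is a hypothetical helper name, and it checks just three of the variables set above.

```shell
# Hypothetical helper: verify that the key variables from the env file
# are set and point to existing directories. Prints one line per problem
# and returns non-zero if any problem was found.
check_hadoop_env() {
    rc=0
    for name in JAVA_HOME HADOOP_PREFIX HADOOP_CONF_DIR; do
        # Indirect lookup of the variable whose name is in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name is not a directory: $value"
            rc=1
        fi
    done
    return "$rc"
}
```

With the function defined, `check_hadoop_env && echo "environment looks fine"` prints the confirmation only when all three directories exist.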
In the conf directory, core-site.xml should have this content (you can change the 'hadoop.tmp.dir' directory to your liking):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/${user.name}/Hadoop/hadoop-1.0.4-store</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
    </property>
</configuration>
hdfs-site.xml should have this content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.support.broken.append</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
mapred-site.xml should have this content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
    </property>
</configuration>
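The addresses in fs.default.name and mapred.job.tracker are the same ones passed to the spring-hadoop tests later on, so it can be handy to read them back from the files. The sketch below is not a real XML parser: it assumes each `<name>` and `<value>` element sits on its own line, as in the listings above, and `get_hadoop_property` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of a property from a Hadoop
# *-site.xml file. Assumes one <name> or <value> element per line.
# Usage: get_hadoop_property FILE PROPERTY_NAME
get_hadoop_property() {
    awk -v name="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)
            sub(/<\/name>.*/, "", n)
            found = (n == name)      # remember whether this is our property
            next
        }
        found && /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$1"
}
```

For example, `get_hadoop_property "$HADOOP_CONF_DIR/core-site.xml" fs.default.name` should print the HDFS address configured above.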
In conf/hadoop-env.sh, add the same JAVA_HOME setting that we put in the hadoop-1.0.4-env file above (again, adjust it to match your system):
...
# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.19.x86_64
...
Start by sourcing the environment settings
source hadoop-1.0.4-env
Format the hadoop file system
You only do this step once for a new cluster!
hadoop namenode -format
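Because formatting wipes the file system metadata, a small guard can make an accidental second run less likely. This sketch rests on two assumptions: that dfs.name.dir is left at its default of `${hadoop.tmp.dir}/dfs/name`, and that a formatted name directory contains a `current/VERSION` file. The function name `format_namenode_once` is made up for this example.

```shell
# Hypothetical helper: format the namenode only if the given name
# directory does not already look formatted.
# Assumption: a formatted name directory contains current/VERSION.
# Usage: format_namenode_once NAME_DIR
format_namenode_once() {
    name_dir="$1"
    if [ -e "$name_dir/current/VERSION" ]; then
        echo "refusing to format: $name_dir already looks formatted" >&2
        return 1
    fi
    hadoop namenode -format
}
```

With the hadoop.tmp.dir value shown earlier, the call would look something like `format_namenode_once "$HOME/Hadoop/hadoop-1.0.4-store/dfs/name"`; adjust the path if you changed that setting.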
Start Hadoop namenode, datanode and secondary-namenode
start-dfs.sh
Check that you have the dfs daemons running
jps
You should see something like:
[trisberg@localhost ~]$ jps
27932 SecondaryNameNode
27827 DataNode
26384 NameNode
27988 Jps
Start Hadoop job-tracker and task-tracker
start-mapred.sh
Check that you have the dfs and mapred daemons running
jps
You should see something like:
[trisberg@localhost ~]$ jps
28170 TaskTracker
27932 SecondaryNameNode
28053 JobTracker
27827 DataNode
26384 NameNode
28259 Jps
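Comparing the jps listing against the expected daemons by eye works, but the comparison can also be scripted. The sketch below reads jps output on standard input and prints the name of every expected daemon that is absent; `missing_daemons` is a hypothetical helper name.

```shell
# Hypothetical helper: read jps output on stdin and print each expected
# daemon (given as arguments) that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # Compare the second column exactly, so that "NameNode" does not
        # match the "SecondaryNameNode" line.
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}
```

A typical call would be `jps | missing_daemons NameNode DataNode SecondaryNameNode JobTracker TaskTracker`; no output means all five daemons were found.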
Once the cluster is up and running, you can access the web interfaces at these addresses:
- NameNode: http://localhost:50070/dfshealth.jsp
- JobTracker: http://localhost:50030/jobtracker.jsp
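The same two pages can be probed from the command line, for instance on a machine without a browser. This is a rough sketch that assumes curl is installed and that a healthy page answers with HTTP status 200; `web_ui_up` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given URL answers with HTTP 200
# within five seconds.
web_ui_up() {
    # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1") || code=000
    [ "$code" = "200" ]
}
```

For example, `web_ui_up http://localhost:50070/dfshealth.jsp && echo "NameNode UI is up"`, and likewise for the JobTracker address.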
This would be a good time to run the tests for the spring-hadoop project.
- source is available here: https://github.com/SpringSource/spring-hadoop
- run tests using the command:
./gradlew -Phd.fs=hdfs://localhost:8020 -Phd.jt=localhost:8021 clean build
When you are done testing you can use these commands to shut the cluster down:
stop-mapred.sh
stop-dfs.sh