
Setup Hadoop 2.2.0 Yarn on single node cluster (Only for the Mac User)


Reference: Hadoop 2.x Yarn on single node cluster
  • Download the stable release hadoop-2.2.0 (hadoop-2.2.0.tar.gz) and unpack it under ~/workspace/Hadoop/ (the path used below)
  • $ vim ~/.bash_profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Hadoop-2.2.0
export HADOOP_HOME=~/workspace/Hadoop/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
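To pick up the new variables in the current shell, reload the profile and run a quick sanity check (a sketch, assuming the tarball was unpacked to the path above):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ source ~/.bash_profile
$ echo $HADOOP_HOME      # should print the absolute path of the hadoop-2.2.0 directory
$ hadoop version         # should report Hadoop 2.2.0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~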
  • $ vim hadoop-env.sh (in $HADOOP_HOME/etc/hadoop): set JAVA_HOME (around line 28) and add the krb5 options (around line 50) to silence the "Unable to load realm info from SCDynamicStore" warning on OS X
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 27 #export JAVA_HOME=${JAVA_HOME}
 28 export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
...
 49 #export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
 50 export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
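The JavaVM.framework path above typically points to the Apple-supplied Java 6. If a newer Oracle/OpenJDK JDK is installed instead, /usr/libexec/java_home reports the active JDK home on OS X, so an alternative (not what the original file uses) is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Check which JDK the system considers current
$ /usr/libexec/java_home
# hadoop-env.sh is sourced by bash, so command substitution works there:
export JAVA_HOME=$(/usr/libexec/java_home)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~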
    • $ vim yarn-env.sh 
      • Add `YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"` after line 106 (the same SCDynamicStore workaround as above)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
106 YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
107 YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
108 if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  • $ cd $HADOOP_HOME/etc/hadoop 
    • $ mkdir -p $HADOOP_HOME/mydata/tmp
    • $ mkdir -p $HADOOP_HOME/mydata/mapred/temp
    • $ mkdir -p $HADOOP_HOME/mydata/mapred/local
    • $ mkdir -p $HADOOP_HOME/mydata/hdfs/namenode
    • $ mkdir -p $HADOOP_HOME/mydata/hdfs/datanode
    • $ vim core-site.xml
Before  
-------------------
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>$HADOOP_HOME/hadooptmp/hadoop-${user.name}</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8010</value>
        <description>The name of the default file system.  A URI whose
            scheme and authority determine the FileSystem implementation.  The
            uri's scheme determines the config property (fs.SCHEME.impl) naming
            the FileSystem implementation class. The uri's authority is used to
            determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>
-------------------
Now (Use the following version instead of the version above)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<configuration>
  <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:8020</value>
     <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HADOOP_HOME/mydata/tmp</value>
  </property>
</configuration>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      • Note -> fs.default.name is the deprecated name of fs.defaultFS in Hadoop 2.x; either key works here
      • Note -> Hadoop does not expand shell variables inside *-site.xml values, so replace $HADOOP_HOME with the real absolute path (a substitution sketch follows the hdfs-site.xml section below)
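Once the file is saved (with the real absolute path substituted), a quick sanity check with the stock getconf tool, as a sketch:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ hdfs getconf -confKey fs.defaultFS    # should print hdfs://localhost:8020
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~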
    • $ vim yarn-site.xml
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
  </property>
  <property>
     <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      • Use mapreduce_shuffle (not the older mapreduce.shuffle) as the value of yarn.nodemanager.aux-services; the dotted form is rejected by Hadoop 2.2.0
      • The shuffle service runs inside the NodeManager as an embedded Netty server (MRv1 used Jetty)
      • If the YARN cluster has more than one node, yarn.resourcemanager.address must also be set
    • $ mv mapred-site.xml.template mapred-site.xml
    • $ vim mapred-site.xml
Before (Hadoop 2.2.0 no longer has a JobTracker, so `mapred.job.tracker` below does not need to be set)
-------------------
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
        <description>The host and port that the MapReduce job tracker runs
            at.  If "local", then jobs are run in-process as a single map
            and reduce task.
        </description>
    </property>

    <property>
        <name>mapred.map.tasks</name>
        <value>10</value>
        <description>As a rule of thumb, use 10x the number of slaves (i.e., numbe
        </description>
    </property>

    <property>
        <name>mapred.reduce.tasks</name>
        <value>2</value>
        <description>As a rule of thumb, use 2x the number of slave processors (i.
        </description>
    </property>
</configuration>
-------------------
Now (Use the following version instead of the version above)   
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<configuration>
  <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value>$HADOOP_HOME/mydata/mapred/temp</value>
    <description>The temp dir for map reduce</description>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>$HADOOP_HOME/mydata/mapred/local</value>
    <description>The local dir for map reduce</description>
    <final>true</final>
  </property>
</configuration>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    •  $ vim hdfs-site.xml 
Before  
-------------------
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication.
            The actual number of replications can be specified when the file
            The default is used if replication is not specified in create ti
        </description>
    </property>
</configuration>
-------------------
Now (Use the following version instead of the version above)  
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<configuration>
  <property>
     <name>dfs.replication</name>
     <value>1</value>
  </property>
  <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:$HADOOP_HOME/mydata/hdfs/namenode</value>
  </property>
  <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:$HADOOP_HOME/mydata/hdfs/datanode</value>
  </property>
  <property>
     <name>dfs.permissions</name>
     <value>false</value>
  </property>
</configuration>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
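As noted under core-site.xml, the *-site.xml values are read literally, so $HADOOP_HOME above has to become a real absolute path (and dfs.namenode.name.dir / dfs.datanode.data.dir expect absolute file: URIs). One way to substitute it in place, a sketch assuming the BSD sed that ships with OS X:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ cd $HADOOP_HOME/etc/hadoop
# Replace the literal string $HADOOP_HOME with the value of the shell variable
$ sed -i '' "s#\$HADOOP_HOME#${HADOOP_HOME}#g" core-site.xml hdfs-site.xml mapred-site.xml
$ grep -n mydata core-site.xml hdfs-site.xml mapred-site.xml   # verify the expanded paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~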
    • $ vim slaves
      • One hostname or IP per line; the start scripts launch a DataNode and a NodeManager on each listed host
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
localhost
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
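The sbin/start-*.sh scripts ssh into every host listed in slaves (here only localhost), so passwordless SSH to localhost must work; on a Mac, Remote Login also has to be enabled under System Preferences » Sharing. A minimal key setup, assuming no key exists yet:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost        # should log in without asking for a password
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~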


Start and Stop HDFS and YARN
  • $ cd $HADOOP_HOME
    • (If the DataNode refuses to start after a re-format, the usual cause is a mismatched cluster ID; remove the old data with $ rm -rf $HADOOP_HOME/mydata, re-create the directories above, and format again)
  • $ `bin/hdfs namenode -format`  
  • Start hdfs -> `sbin/start-dfs.sh`
    • $ `sbin/hadoop-daemon.sh start namenode`  
    • $ `sbin/hadoop-daemon.sh start datanode`   
    • $ `sbin/hadoop-daemon.sh start secondarynamenode` (optional for a quick test setup; it only performs checkpointing)
    • Note -> If you have more than one datanode, `hadoop-daemons.sh` (plural) is needed; or you can simply use `sbin/start-dfs.sh`.
    • The namenode should be started before the datanodes.
  • Start YARN  -> `sbin/start-yarn.sh`
    • $ `sbin/yarn-daemon.sh start resourcemanager`  
    • $ `sbin/yarn-daemon.sh start nodemanager`  
    • Note -> If you have more than one node, `yarn-daemons.sh` (plural) is needed; or you can simply use `sbin/start-yarn.sh`.
  • $ `sbin/mr-jobhistory-daemon.sh start historyserver` (Optional)    
  • $ `jps`  
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15406 NameNode
15583 SecondaryNameNode
15487 DataNode
15692 ResourceManager
15776 NodeManager
16833 JobHistoryServer 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  • Then check the ResourceManager web UI at http://localhost:8088 (the NameNode UI is at http://localhost:50070 and the JobHistory UI at http://localhost:19888)
  •  To stop 
    • run $ `sbin/stop-yarn.sh` and $ `sbin/stop-dfs.sh`
    • Or $ `sbin/stop-all.sh`  
    • Or 
      • $ `sbin/hadoop-daemon.sh stop namenode` 
      • $ `sbin/hadoop-daemon.sh stop datanode` 
      • $ `sbin/hadoop-daemon.sh stop secondarynamenode` 
      • $ `sbin/yarn-daemon.sh stop resourcemanager` 
      • $ `sbin/yarn-daemon.sh stop nodemanager`
      • $ `sbin/mr-jobhistory-daemon.sh stop historyserver`
  • `stop-yarn.sh` also tries to stop the WebAppProxy server, which was never started here, so it prints `no proxyserver to stop`; the message is harmless on this setup

Run Example  
  • cd $HADOOP_HOME   
    • $ cd mydata  
    • $ wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt
    • $ hdfs dfs -mkdir /input
    • $ hdfs dfs -copyFromLocal pg20417.txt /input
      • or $ hdfs dfs -put pg20417.txt /input 
  • cd $HADOOP_HOME   
    •  $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
    • (the same jar also contains the other examples, e.g. grep: $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar grep /input /output-grep 'dfs[a-z.]+', where the output directory is any HDFS path that does not exist yet)
  • Read the output
    • $ hdfs dfs -cat /output/part-r-00000
    • or copy from HDFS to the local filesystem: $ `hdfs dfs -get /output ../hadoopLocalData/output`

While the job runs, a batch of deprecation warnings like the following will appear; they are harmless:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
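Besides the web UI on port 8088, the job can also be watched from the shell; a few stock commands (the log command only returns container logs if yarn.log-aggregation-enable is turned on, which this setup does not do):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ yarn application -list              # running YARN applications and their states
$ mapred job -list                    # the MapReduce view of the same jobs
$ yarn logs -applicationId <appId>    # aggregated container logs, if log aggregation is enabled
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~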
  • Don't forget the daemon logs under $HADOOP_HOME/logs; that is the first place to look when something fails

