
**Hadoop: having more than one reducer in a pseudo-distributed environment?**

I am a newbie to Hadoop. I have successfully configured a Hadoop setup in pseudo-distributed mode. I want to run with multiple reducers using the option `-D mapred.reduce.tasks=2` (with hadoop-streaming). However, there is still only one reducer.

From what I have found via Google, I am fairly sure that `mapred.LocalJobRunner` limits the number of reducers to 1. Is there any workaround to get more reducers?

**My Hadoop configuration files:**

```
[admin@localhost string-count-hadoop]$ cat ~/hadoop-1.1.2/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/admin/hadoop-data/tmp</value>
    </property>
</configuration>

[admin@localhost string-count-hadoop]$ cat ~/hadoop-1.1.2/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>

[admin@localhost string-count-hadoop]$ cat ~/hadoop-1.1.2/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/admin/hadoop-data/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/admin/hadoop-data/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```

**The way I start the job:**

```
[admin@localhost string-count-hadoop]$ cat hadoop-startjob.sh
#!/bin/sh

~/hadoop-1.1.2/bin/hadoop jar ~/hadoop-1.1.2/contrib/streaming/hadoop-streaming-1.1.2.jar \
    -D mapred.job.name=string-count \
    -D mapred.reduce.tasks=2 \
    -mapper  mapper \
    -file    mapper \
    -reducer reducer \
    -file    reducer \
    -input   $1 \
    -output  $2

[admin@localhost string-count-hadoop]$ ./hadoop-startjob.sh /z/programming/testdata/items_sequence /z/output
packageJobJar: [mapper, reducer] [] /tmp/streamjob837249979139287589.jar tmpDir=null
13/07/17 20:21:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/17 20:21:10 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/17 20:21:10 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/17 20:21:11 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir.
...
...
```
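For context on why `-D mapred.reduce.tasks=2` appears to be ignored: in Hadoop 1.x, a job handled by `LocalJobRunner` (local mode) has its reduce-task count clamped to at most one. Below is a minimal sketch of that clamping behavior, paraphrased from the Hadoop 1.x sources rather than quoted verbatim; the class `LocalJobRunnerSketch` and method `effectiveReduceTasks` are illustrative names, while `JobConf.getNumReduceTasks()` and `setNumReduceTasks()` are the real Hadoop 1.x mapred API.

```java
import org.apache.hadoop.mapred.JobConf;

// Paraphrased sketch of the reduce-task clamp inside Hadoop 1.x's
// org.apache.hadoop.mapred.LocalJobRunner (not the verbatim source).
public class LocalJobRunnerSketch {
    static int effectiveReduceTasks(JobConf job) {
        int numReduceTasks = job.getNumReduceTasks();
        // The local runner only supports 0 or 1 reducers: any other value
        // is silently forced back to 1, which is why a streaming job run
        // in local mode ignores -D mapred.reduce.tasks=2.
        if (numReduceTasks > 1 || numReduceTasks < 0) {
            numReduceTasks = 1;
            job.setNumReduceTasks(1);
        }
        return numReduceTasks;
    }
}
```

Consistent with this, the `WARN mapred.LocalJobRunner` line in the job output above shows that the job is being executed by the local runner rather than by the JobTracker configured at `localhost:9001`. So a plausible workaround is not to fight the local runner's one-reducer cap, but to make sure the submission actually reaches that JobTracker, for example by verifying that the `conf` directory containing this `mapred-site.xml` is the one the client loads.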
 
