<p>Assuming you are using <strong>Hadoop Streaming</strong>, you need to use the <strong>KeyFieldBasedComparator</strong> class.</p>
<ol>
<li><p>Add <code>-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator</code> to the streaming command.</p></li>
<li><p>Specify the type of sort you need with <code>mapred.text.key.comparator.options</code>. Some useful options are <code>-n</code> (numeric sort) and <code>-r</code> (reverse sort).</p></li>
</ol>
<p><strong>EXAMPLE</strong>:</p>
<p>Create an identity mapper and an identity reducer with the following code.</p>
<p>This is both <strong>mapper.py</strong> and <strong>reducer.py</strong>:</p>
<pre><code>#!/usr/bin/env python
# Identity mapper/reducer: echo every input line unchanged.
import sys

for line in sys.stdin:
    print(line.strip())
</code></pre>
<p>This is <strong>input.txt</strong>:</p>
<pre><code>1
11
2
20
7
3
40
</code></pre>
<p>This is the <strong>Streaming</strong> command:</p>
<pre><code>$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
  -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapred.text.key.comparator.options=-n \
  -input /user/input.txt \
  -output /user/output.txt \
  -file ~/mapper.py \
  -mapper ~/mapper.py \
  -file ~/reducer.py \
  -reducer ~/reducer.py
</code></pre>
<p>This produces the required output:</p>
<pre><code>1
2
3
7
11
20
40
</code></pre>
<p><strong>NOTE</strong>:</p>
<ol>
<li><p>I have used a simple single-key input. If you have multiple keys and/or partitions, you will have to adjust <code>mapred.text.key.comparator.options</code> accordingly. Since I do not know your use case, my example is limited to this.</p></li>
<li><p>An identity mapper is needed because an MR job needs at least one mapper to run.</p></li>
<li><p>An identity reducer is needed because the shuffle/sort phase does not run in a pure map-only job.</p></li>
</ol>
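<p>To see why the <code>-n</code> option matters, the job above can be approximated locally in plain Python. This is only a sketch, not Hadoop's actual code: the helper names (<code>identity_map</code>, <code>shuffle_sort</code>, <code>identity_reduce</code>, <code>run_job</code>) are hypothetical, and it assumes every input line is a single key with no value, as in <code>input.txt</code>. Without <code>-n</code>, Streaming keys are compared byte by byte as text, so <code>11</code> sorts before <code>2</code>.</p>

```python
# Local approximation of the streaming job: map -> shuffle/sort -> reduce.
# Assumption: each input line is a single key with no value, as in input.txt.
# All function names are hypothetical; this is not Hadoop's implementation.


def identity_map(lines):
    """Like mapper.py: emit each non-empty stripped line unchanged."""
    return [line.strip() for line in lines if line.strip()]


def shuffle_sort(keys, numeric):
    """Approximate the framework's sort of map output keys.

    numeric=False: byte-wise order, like the default comparison of Text keys.
    numeric=True:  numeric order, roughly what KeyFieldBasedComparator does with -n.
    """
    if numeric:
        return sorted(keys, key=float)
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def identity_reduce(keys):
    """Like reducer.py: pass every record through unchanged."""
    return list(keys)


def run_job(lines, numeric):
    """Chain the three phases, as the streaming command does."""
    return identity_reduce(shuffle_sort(identity_map(lines), numeric))


if __name__ == "__main__":
    input_txt = ["1", "11", "2", "20", "7", "3", "40"]
    print(run_job(input_txt, numeric=False))  # text (byte-wise) order
    print(run_job(input_txt, numeric=True))   # numeric order, as with -n
```

<p>Running it prints the text ordering <code>['1', '11', '2', '20', '3', '40', '7']</code> first and then the numeric ordering, which matches the required output of the streaming job.</p>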
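<p>For the multiple-key case, <code>KeyFieldBasedComparator</code> also accepts Unix <code>sort</code>-style field specs such as <code>-k2,2nr</code> (field 2 only, numeric, reversed). The sketch below emulates that ordering for tab-separated records, assuming the job also sets <code>-D stream.num.map.output.key.fields=2</code> so that both fields form the key. The function names are hypothetical, and <code>float()</code> only approximates Hadoop's own numeric parsing.</p>

```python
# Hypothetical emulation of the ordering produced by
#   -D stream.num.map.output.key.fields=2
#   -D mapred.text.key.comparator.options=-k2,2nr
# i.e. sort tab-separated records by field 2, numerically, in descending order.
# float() only approximates Hadoop's numeric parsing. Records that tie on
# field 2 keep their input order here; Hadoop does not guarantee that.


def key_fields(record, sep="\t"):
    """Split one record into its tab-separated fields."""
    return record.rstrip("\n").split(sep)


def sort_k2_numeric_reverse(records):
    """Order records by the numeric value of field 2, largest first."""
    return sorted(records, key=lambda r: float(key_fields(r)[1]), reverse=True)


if __name__ == "__main__":
    records = ["a\t3", "b\t40", "c\t7", "d\t11"]
    for record in sort_k2_numeric_reverse(records):
        print(record)
```

<p>With the sample records, the order is <code>b 40</code>, <code>d 11</code>, <code>c 7</code>, <code>a 3</code>: field 1 is ignored by the comparison, and field 2 is compared as a number rather than as text.</p>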