Hadoop inverted-index without recurrence of file names
<p>What I have in the output is:</p>
<pre>word , file
----- ------
wordx Doc2, Doc1, Doc1, Doc1, Doc1, Doc1, Doc1, Doc1</pre>
<p>What I want is:</p>
<pre>word , file
----- ------
wordx Doc2, Doc1</pre>
<pre><code>public static class LineIndexMapper extends MapReduceBase
        implements Mapper&lt;LongWritable, Text, Text, Text&gt; {

    private final static Text word = new Text();
    private final static Text location = new Text();

    public void map(LongWritable key, Text val,
            OutputCollector&lt;Text, Text&gt; output, Reporter reporter)
            throws IOException {
        // The file name of the current input split becomes the emitted value.
        FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
        String fileName = fileSplit.getPath().getName();
        location.set(fileName);

        // Emit one (word, fileName) pair per token, so repeated words repeat the file name.
        String line = val.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, location);
        }
    }
}

public static class LineIndexReducer extends MapReduceBase
        implements Reducer&lt;Text, Text, Text, Text&gt; {

    public void reduce(Text key, Iterator&lt;Text&gt; values,
            OutputCollector&lt;Text, Text&gt; output, Reporter reporter)
            throws IOException {
        // Concatenate every value for this word, duplicates included.
        boolean first = true;
        StringBuilder toReturn = new StringBuilder();
        while (values.hasNext()) {
            if (!first) {
                toReturn.append(", ");
            }
            first = false;
            toReturn.append(values.next().toString());
        }
        output.collect(key, new Text(toReturn.toString()));
    }
}
</code></pre>
<p>For the best performance, where should I skip the recurring file name: in map, in reduce, or in both? PS: I am a beginner at writing MapReduce tasks and am also trying to work out the programming logic through this question.</p>
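<p>One possible way to get the wanted output is to drop duplicates in the reducer, since that is the only place where all file names for a word are guaranteed to meet. The following is a minimal, Hadoop-free sketch of just that de-duplication step. The class <code>DistinctLocations</code> and the method <code>joinDistinct</code> are hypothetical names used for illustration; in the reducer above, the loop would read <code>values.next().toString()</code> instead of plain strings.</p>

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; it is not part of the Hadoop API.
class DistinctLocations {

    // Joins the values with ", ", keeping only the first occurrence of each one.
    // LinkedHashSet drops duplicates while preserving first-seen order.
    static String joinDistinct(Iterator<String> values) {
        Set<String> seen = new LinkedHashSet<>();
        while (values.hasNext()) {
            seen.add(values.next());
        }
        return String.join(", ", seen);
    }

    public static void main(String[] args) {
        // Same values the reducer receives for "wordx" in the question (shortened).
        Iterator<String> values = Arrays.asList("Doc2", "Doc1", "Doc1", "Doc1").iterator();
        System.out.println(joinDistinct(values)); // prints "Doc2, Doc1"
    }
}
```

<p>De-duplicating in the mapper or in a combiner can reduce the data that is shuffled, but it only sees one task's output and a combiner is not guaranteed to run, so under this sketch's assumptions the reducer step is still needed for a correct result.</p>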