    <p>This is most probably because your combiner is running in both the map and reduce phases (a little-known 'feature').</p>
    <p>Basically you are amending the key in the combiner, which may or may not run as map outputs are merged together in the reducer. After the combiner has run (reduce side), the keys are fed through the grouping comparator to determine which values back the Iterable passed to the reduce method. (I'm skirting around the streaming aspect of the reduce phase here: the Iterable is not backed by a set or list of values. Rather, calls to <code>iterator().hasNext()</code> return true for as long as the grouping comparator determines that the current key and the previous key are the same.)</p>
    <p>You can try to detect which side the combiner is currently running on (map or reduce) by inspecting the Context (there is a <code>context.getTaskAttemptID().isMap()</code> method, but I have some memory of this being problematic too, and there might even be a JIRA ticket about it somewhere).</p>
    <p>Bottom line: don't amend the key in the combiner unless you can find a way to bypass this behaviour <em>if</em> the combiner is running reduce side.</p>
    <p><strong>EDIT</strong> Investigating @Amar's comment, I put together some code (<a href="http://pastebin.com/PH5TsN6x" rel="nofollow">pastebin link</a>) which adds some verbose comparators, combiners, reducers etc. If you run a job with a single mapper, then no combiner runs in the reduce phase, and the map output is not sorted again because it is assumed to be sorted already.</p>
    <p>It is assumed to be sorted because it is sorted before being passed to the combiner class, and it is assumed that the keys will come out untouched, and hence still sorted. Remember that a Combiner is meant to combine values for a given key.</p>
    <p>So with a single mapper and the given combiner, the reducer sees the keys in KeyOne, KeyTwo, KeyOne, KeyTwo, KeyOne order. The grouping comparator sees a transition between each of them, and hence you get 6 calls to the reduce function.</p>
    <p>If you use two mappers, then the reducer knows it has two sorted segments (one from each map), and so it still needs to sort them prior to reducing. But because the number of segments is below a threshold, the sort is done as an inline stream sort (again, the segments are assumed to be sorted). You still get the wrong output with two mappers (10 records output from the reduce phase).</p>
    <p>So again, don't amend the key in the combiner; this is not what the combiner is intended for.</p>
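<p>To make the grouping behaviour concrete, here is a minimal sketch in plain Java. It is not Hadoop code: the class <code>GroupingDemo</code> and the method <code>countReduceCalls</code> are hypothetical names, and the sketch only assumes that a new reduce call starts whenever the grouping comparator reports that the current key differs from the previous one.</p>

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of reduce-side grouping; not part of the Hadoop API.
class GroupingDemo {

    /**
     * Counts how many reduce() calls a stream of keys would trigger when a
     * group boundary is detected only by comparing each key with the
     * previous one (the stream is never re-sorted here).
     */
    static <K> int countReduceCalls(List<K> keys, Comparator<K> grouping) {
        int calls = 0;
        K previous = null;
        boolean first = true;
        for (K key : keys) {
            // A new group starts at the first key and at every transition.
            if (first || grouping.compare(previous, key) != 0) {
                calls++;
            }
            previous = key;
            first = false;
        }
        return calls;
    }

    public static void main(String[] args) {
        Comparator<String> grouping = Comparator.naturalOrder();

        // Keys left untouched by the combiner: still sorted, one group per key.
        List<String> sorted =
                List.of("KeyOne", "KeyOne", "KeyOne", "KeyTwo", "KeyTwo");

        // Keys rewritten by the combiner: the sort-order assumption is broken.
        List<String> interleaved =
                List.of("KeyOne", "KeyTwo", "KeyOne", "KeyTwo", "KeyOne");

        System.out.println(countReduceCalls(sorted, grouping));      // prints 2
        System.out.println(countReduceCalls(interleaved, grouping)); // prints 5
    }
}
```

<p>With the sorted input there is one reduce call per distinct key. With the interleaved input every key is a transition, so there are as many reduce calls as records, even though only two distinct keys exist.</p>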
 
