Note that there are some explanatory texts on larger screens.

plurals
  1. POExtending Hadoop's TableInputFormat to scan with a prefix used for distribution of timestamp keys
    primarykey
    data
    text
    <p>I have an hbase table who's key is a timestamp with a one byte random prefix to distribute the keys so scans don't hotspot. I'm trying to extend <code>TableInputFormat</code> so that I can run a single MapReduce on the table with a range, prefixing all 256 possible prefixes so that all ranges with the specified timestamp range are scanned. My solution isn't working though, as it always seems to scan the last prefix (127) 256 times. Something must be shared across all scans.</p> <p>My code is below. Any ideas?</p> <pre><code>public class PrefixedTableInputFormat extends TableInputFormat { @Override public List&lt;InputSplit&gt; getSplits(JobContext context) throws IOException { List&lt;InputSplit&gt; splits = new ArrayList&lt;InputSplit&gt;(); Scan scan = getScan(); byte startRow[] = scan.getStartRow(), stopRow[] = scan.getStopRow(); byte prefixedStartRow[] = new byte[startRow.length+1]; byte prefixedStopRow[] = new byte[stopRow.length+1]; System.arraycopy(startRow, 0, prefixedStartRow, 1, startRow.length); System.arraycopy(stopRow, 0, prefixedStopRow, 1, stopRow.length); for (int prefix = -128; prefix &lt; 128; prefix++) { prefixedStartRow[0] = (byte) prefix; prefixedStopRow[0] = (byte) prefix; scan.setStartRow(prefixedStartRow); scan.setStopRow(prefixedStopRow); setScan(scan); splits.addAll(super.getSplits(context)); } return splits; } } </code></pre> <p>and</p> <pre><code> Configuration config = HBaseConfiguration.create(); Job job = new Job(config, "Aggregate"); job.setJarByClass(Aggregate.class); Scan scan = new Scan(); scan.setStartRow("20120630".getBytes()); scan.setStopRow("20120701".getBytes()); scan.setCaching(500); scan.setCacheBlocks(false); TableMapReduceUtil.initTableMapperJob( "event", scan, Mapper.class, ImmutableBytesWritable.class, ImmutableBytesWritable.class, job, true, PrefixedTableInputFormat.class); TableMapReduceUtil.initTableReducerJob("event", Reducer.class, job); </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload