
Why is a job with mappers only so slow in a real cluster?
<p>I have a job with only a mapper (<code>PrepareDataMapper</code>), which I need to convert text data into a <em>SequenceFile</em> with <em>VLongWritable</em> as the <strong>key</strong> and <em>DoubleArrayWritable</em> as the <strong>value</strong>.</p>
<p>When I run it over 455000x90 (~384 MB) of data made of lines such as:</p>
<blockquote> <p>13.124,123.12,12.12,... 1.12</p> <p>23.12,1.5,12.6,... 6.123</p> <p>...</p> </blockquote>
<p>in <strong>local</strong> mode it takes, on average:</p>
<ol> <li>51 seconds on an Athlon 64 X2 Dual Core 5600+, 2.79 GHz;</li> <li>54 seconds on an Athlon 64 Processor 3700+, 1 GHz;</li> </ol>
<p>That is 52-53 seconds on average.</p>
<p>But when I run it on a real cluster made of these 2 machines (Athlon 64 X2 Dual Core 5600+ and 3700+), it takes 81 seconds in the best case.</p>
<p>The job runs with 4 mappers (block size ~96 MB) and 2 reducers.</p>
<p>The cluster runs <strong>Hadoop 0.21.0</strong> and is configured for JVM reuse.</p>
<p><strong>Mapper</strong>:</p>
<pre class="lang-java prettyprint-override"><code>import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.VLongWritable;
import org.apache.hadoop.mapreduce.Mapper;

public class PrepareDataMapper
        extends Mapper&lt;LongWritable, Text, VLongWritable, DoubleArrayWritable&gt; {

    private int size; // number of columns per row, read from "dataDimSize"
    private DoubleWritable[] doubleArray;
    private DoubleArrayWritable mapperOutArray = new DoubleArrayWritable();
    private VLongWritable mapOutKey = new VLongWritable();

    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        size = conf.getInt("dataDimSize", 0);
        // Allocate the output holders once and reuse them for every record.
        doubleArray = new DoubleWritable[size];
        for (int i = 0; i &lt; size; i++) {
            doubleArray[i] = new DoubleWritable();
        }
    }

    @Override
    public void map(LongWritable key, Text row, Context context)
            throws IOException, InterruptedException {
        // Parse one comma-separated line into the reused DoubleWritable array.
        String[] fields = row.toString().split(",");
        for (int i = 0; i &lt; size; i++) {
            doubleArray[i].set(Double.valueOf(fields[i]));
        }
        mapperOutArray.set(doubleArray);
        // The key is the byte offset of the line in the input file.
        mapOutKey.set(key.get());
        context.write(mapOutKey, mapperOutArray);
    }
}
</code></pre>
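<p>To check how much of the roughly 52 s local run is spent on parsing alone, the per-record work of <code>PrepareDataMapper.map</code> can be timed in plain Java, outside Hadoop. This is only a sketch under assumptions: the class <code>ParseBench</code>, the synthetic row generator and the pool of 1000 reused rows are hypothetical; it mirrors just the <code>split(",")</code> plus per-field parse loop, writing into a reused array the way the mapper reuses <code>doubleArray</code>.</p>

```java
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical stand-alone harness (not part of the job): times the same
 * per-record parsing that PrepareDataMapper.map does, without Hadoop.
 */
public class ParseBench {

    /** Splits one comma-separated row and parses the first out.length fields into out. */
    static void parseRow(String row, double[] out) {
        String[] fields = row.split(",");
        for (int i = 0; i < out.length; i++) {
            out[i] = Double.parseDouble(fields[i]); // same parsing as Double.valueOf, without boxing
        }
    }

    /** Builds one synthetic row of dim values with three decimals (assumed input format). */
    static String syntheticRow(Random rnd, int dim) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < dim; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(String.format(Locale.ROOT, "%.3f", rnd.nextDouble() * 1000));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        final int dim = 90;        // columns per row, as in the 455000x90 data
        final int rows = 455_000;  // row count from the question
        Random rnd = new Random(42);

        // A small pool of rows, reused cyclically, keeps memory low (the real input is ~384 MB).
        String[] pool = new String[1000];
        for (int i = 0; i < pool.length; i++) {
            pool[i] = syntheticRow(rnd, dim);
        }

        double[] buffer = new double[dim]; // reused for every row, like the mapper's doubleArray
        double checksum = 0;               // consumes the results so the JIT cannot drop the loop
        long start = System.nanoTime();
        for (int r = 0; r < rows; r++) {
            parseRow(pool[r % pool.length], buffer);
            checksum += buffer[0];
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf(Locale.ROOT, "parsed %d rows x %d columns in %.2f s (checksum %.1f)%n",
                rows, dim, seconds, checksum);
    }
}
```

<p>Running it on each machine gives a rough lower bound for the map() loop itself; if parsing turns out to be a small part of the 51-54 s local time, the rest of the time is spent elsewhere in the job rather than in the mapper code.</p>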
<p><strong>DoubleArrayWritable</strong>:</p>
<pre class="lang-java prettyprint-override"><code>import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;

public class DoubleArrayWritable extends ArrayWritable {

    // No-arg constructor, needed by Hadoop to instantiate the class when deserializing.
    public DoubleArrayWritable() {
        super(DoubleWritable.class);
    }

    public DoubleArrayWritable(DoubleWritable[] values) {
        super(DoubleWritable.class, values);
    }

    public void set(DoubleWritable[] values) {
        super.set(values);
    }

    public DoubleWritable get(int idx) {
        return (DoubleWritable) get()[idx];
    }

    // Returns elements from..to (both inclusive) as a primitive array.
    public double[] getVector(int from, int to) {
        int sz = to - from + 1;
        double[] vector = new double[sz];
        for (int i = from; i &lt;= to; i++) {
            vector[i - from] = get(i).get();
        }
        return vector;
    }
}
</code></pre>
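<p>To get a feel for how many bytes the mappers emit per record, the value layout can be reproduced with <code>java.io.DataOutputStream</code>. This is a sketch under assumptions: it assumes Hadoop's <code>ArrayWritable.write</code> writes an <code>int</code> element count followed by each element's <code>write</code>, and that <code>DoubleWritable.write</code> writes one 8-byte double. <code>WireSize</code> is a hypothetical helper that does not use Hadoop, and the <code>VLongWritable</code> key and SequenceFile record headers are not counted.</p>

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;

/**
 * Hypothetical byte counter (not Hadoop code): reproduces the layout that
 * DoubleArrayWritable is assumed to use on the wire, i.e. ArrayWritable's
 * int element count followed by one 8-byte double per DoubleWritable.
 */
public class WireSize {

    /** Serializes values as: writeInt(length), then writeDouble for each element. */
    static byte[] serialize(double[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);   // element count written by ArrayWritable
            for (double v : values) {
                out.writeDouble(v);        // payload of one DoubleWritable
            }
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        final int dim = 90;          // columns per row, from the question
        final long rows = 455_000L;  // rows, from the question
        int perRecord = serialize(new double[dim]).length; // 4 + 90 * 8 = 724 bytes
        double totalMiB = perRecord * rows / (1024.0 * 1024.0);
        System.out.printf(Locale.ROOT, "value bytes per record: %d, for all rows: %.1f MiB%n",
                perRecord, totalMiB);
    }
}
```

<p>Under these assumptions each value takes 724 bytes, so the values alone come to about 455000 x 724 = 329,420,000 bytes (roughly 314 MiB), the same order of magnitude as the ~384 MB text input.</p>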
 
