Note that there are some explanatory texts on larger screens.

plurals
  1. POCalculate Means and Standard Deviation by columns in Hadoop
    primarykey
    data
    text
    <p>I want to calculate means and standard deviation by columns in Hadoop.</p> <p>I simple adopt single pass Naïve algorithm to MapReduce. I tested it on multivariate data sets 455000x90 and 650000x120 and got speedup lower, more lower, then count of processors. For standalone and pseudo-distributed mode with 2 active cores I got speedup 0,4 = 20seconds / 53seconds for 455000x90.</p> <p>Why my programm is not effective ? Is it possible to improve it ?</p> <p>Mapper:</p> <pre class="lang-java prettyprint-override"><code>public class CalculateMeanAndSTDEVMapper extends Mapper &lt;LongWritable, DoubleArrayWritable, IntWritable, DoubleArrayWritable&gt; { private int dataDimFrom; private int dataDimTo; private long samplesCount; private int universeSize; @Override protected void setup(Context context) throws IOException { Configuration conf = context.getConfiguration(); dataDimFrom = conf.getInt("dataDimFrom", 0); dataDimTo = conf.getInt("dataDimTo", 0); samplesCount = conf.getLong("samplesCount", 0); universeSize = dataDimTo - dataDimFrom + 1; } @Override public void map( LongWritable key, DoubleArrayWritable array, Context context) throws IOException, InterruptedException { DoubleWritable[] outArray = new DoubleWritable[universeSize*2]; for (int c = 0; c &lt; universeSize; c++) { outArray[c] = new DoubleWritable( array.get(c+dataDimFrom).get() / samplesCount); } for (int c = universeSize; c &lt; universeSize*2; c++) { double val = array.get(c-universeSize+dataDimFrom).get(); outArray[c] = new DoubleWritable((val*val) / samplesCount); } context.write(new IntWritable(1), new DoubleArrayWritable(outArray)); } } </code></pre> <p>Combiner:</p> <pre class="lang-java prettyprint-override"><code>public class CalculateMeanAndSTDEVCombiner extends Reducer &lt;IntWritable, DoubleArrayWritable, IntWritable, DoubleArrayWritable&gt; { private int dataDimFrom; private int dataDimTo; private int universeSize; @Override protected void setup(Context context) throws IOException { Configuration conf = context.getConfiguration(); dataDimFrom = conf.getInt("dataDimFrom", 0); dataDimTo = conf.getInt("dataDimTo", 0); universeSize = dataDimTo - dataDimFrom + 1; } @Override public void reduce( IntWritable column, Iterable&lt;DoubleArrayWritable&gt; partialSums, Context context) throws IOException, InterruptedException { DoubleWritable[] outArray = new DoubleWritable[universeSize*2]; boolean isFirst = true; for (DoubleArrayWritable partialSum : partialSums) { for (int i = 0; i &lt; universeSize*2; i++) { if (!isFirst) { outArray[i].set(outArray[i].get() + partialSum.get(i).get()); } else { outArray[i] = new DoubleWritable(partialSum.get(i).get()); } } isFirst = false; } context.write(column, new DoubleArrayWritable(outArray)); } } </code></pre> <p>Reducer:</p> <pre class="lang-java prettyprint-override"><code>public class CalculateMeanAndSTDEVReducer extends Reducer &lt;IntWritable, DoubleArrayWritable, IntWritable, DoubleArrayWritable&gt; { private int dataDimFrom; private int dataDimTo; private int universeSize; @Override protected void setup(Context context) throws IOException { Configuration conf = context.getConfiguration(); dataDimFrom = conf.getInt("dataDimFrom", 0); dataDimTo = conf.getInt("dataDimTo", 0); universeSize = dataDimTo - dataDimFrom + 1; } @Override public void reduce( IntWritable column, Iterable&lt;DoubleArrayWritable&gt; partialSums, Context context) throws IOException, InterruptedException { DoubleWritable[] outArray = new DoubleWritable[universeSize*2]; boolean isFirst = true; for (DoubleArrayWritable partialSum : partialSums) { for (int i = 0; i &lt; universeSize; i++) { if (!isFirst) { outArray[i].set(outArray[i].get() + partialSum.get(i).get()); } else { outArray[i] = new DoubleWritable(partialSum.get(i).get()); } } isFirst = false; } for (int i = universeSize; i &lt; universeSize * 2; i++) { double mean = outArray[i-universeSize].get(); outArray[i].set(Math.sqrt(outArray[i].get() - mean*mean)); } context.write(column, new DoubleArrayWritable(outArray)); } } </code></pre> <p>Where DoubleArrayWritable is simple class which extends ArrayWritable:</p> <pre class="lang-java prettyprint-override"><code>public class DoubleArrayWritable extends ArrayWritable { public DoubleArrayWritable() { super(DoubleWritable.class); } public DoubleArrayWritable(DoubleWritable[] values) { super(DoubleWritable.class, values); } public DoubleWritable get(int idx) { return (DoubleWritable) get()[idx]; } } </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload