Parallelizing a for loop gives no performance gain
<p>I have an algorithm which converts a Bayer image channel to RGB. In my implementation I have a single nested <code>for</code> loop which iterates over the Bayer channel, calculates the RGB index from the Bayer index, and then sets that pixel's value from the Bayer channel. The main thing to notice here is that each pixel can be calculated independently of the other pixels (it doesn't rely on previous calculations), so the algorithm is a natural candidate for parallelization. The calculation does, however, rely on some preset arrays which all threads read at the same time but which never change.</p>
<p>However, when I tried parallelizing the main <code>for</code> with MS's <code>concurrency::parallel_for</code> I gained no boost in performance. In fact, for an input of size 3264x2540 on a 4-core CPU, the non-parallelized version ran in ~34ms and the parallelized version ran in ~69ms (averaged over 10 runs). I confirmed that the operation was indeed parallelized (3 new threads were created for the task).</p>
<p>Using Intel's compiler with <code>tbb::parallel_for</code> gave nearly identical results. For comparison, I started out with this algorithm implemented in <code>C#</code>, in which I also used <code>parallel_for</code> loops, and there I saw nearly 4x performance gains (I opted for <code>C++</code> because for this particular task <code>C++</code> was faster even with a single core).</p>
<p>Any ideas what is preventing my code from parallelizing well?</p>
<p>My code:</p>
<pre><code>template&lt;typename T&gt;
static void ConvertBayerToRgbImageAsIs(T* BayerChannel, T* RgbChannel, int Width, int Height, ColorSpace ColorSpace)
{
    //Translates index offset in Bayer image to channel offset in RGB image
    int offsets[4];

    //calculate offsets according to color space
    switch (ColorSpace)
    {
    case ColorSpace::BGGR:
        offsets[0] = 2;
        offsets[1] = 1;
        offsets[2] = 1;
        offsets[3] = 0;
        break;
    ...other color spaces
    }

    memset(RgbChannel, 0, Width * Height * 3 * sizeof(T));
    parallel_for(0, Height, [&amp;] (int row)
    {
        for (auto col = 0, bayerIndex = row * Width; col &lt; Width; col++, bayerIndex++)
        {
            auto offset = (row%2)*2 + (col%2); //0...3
            auto rgbIndex = bayerIndex * 3 + offsets[offset];
            RgbChannel[rgbIndex] = BayerChannel[bayerIndex];
        }
    });
}
</code></pre>
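<p>For reference, here is a minimal sketch of the same row-wise split written with <code>std::thread</code> instead of <code>concurrency::parallel_for</code>, so it builds without the PPL or TBB. It is not the code that was measured above. The names <code>ConvertRows</code> and <code>ConvertBayerChunked</code> are hypothetical, and handing each thread one contiguous block of rows is an assumption about how the work might be divided, not a description of what either library does internally. The index arithmetic is copied from the loop body above.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not in the original code): converts rows
// [rowBegin, rowEnd) using the same index arithmetic as the loop above.
template <typename T>
void ConvertRows(const T* bayer, T* rgb, int width, int rowBegin, int rowEnd,
                 const int (&offsets)[4]) {
    for (int row = rowBegin; row < rowEnd; ++row) {
        std::size_t bayerIndex = static_cast<std::size_t>(row) * width;
        for (int col = 0; col < width; ++col, ++bayerIndex) {
            const int offset = (row % 2) * 2 + (col % 2);  // 0...3
            rgb[bayerIndex * 3 + offsets[offset]] = bayer[bayerIndex];
        }
    }
}

// Hypothetical driver: gives each thread one contiguous block of rows.
// Assumption: the caller has already zeroed rgb (width * height * 3 elements),
// which is what the memset in the original function does.
template <typename T>
void ConvertBayerChunked(const T* bayer, T* rgb, int width, int height,
                         const int (&offsets)[4], unsigned threadCount) {
    assert(width > 0 && height > 0 && threadCount > 0);
    const int blocks = std::min<int>(static_cast<int>(threadCount), height);
    const int rowsPerBlock = (height + blocks - 1) / blocks;  // round up

    std::vector<std::thread> workers;
    for (int begin = 0; begin < height; begin += rowsPerBlock) {
        const int end = std::min(begin + rowsPerBlock, height);
        // Each block writes a disjoint range of rgb, so no locking is needed.
        workers.emplace_back([=, &offsets] {
            ConvertRows(bayer, rgb, width, begin, end, offsets);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```

<p>Calling <code>ConvertBayerChunked</code> with a thread count of 1 and then 4 on the same input, using the BGGR offsets <code>{2, 1, 1, 0}</code>, gives a serial and a parallel timing that do not depend on a particular parallel library's scheduling.</p>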