<h2>Synchronization overhead</h2>
<p>I would guess that <strong>the amount of work done per iteration of the loop is too small</strong>. Had you split the image into four parts and run the computation in parallel, you would have noticed a large gain. <strong>Try to design the loop so that it performs fewer iterations and does more work per iteration</strong>. The reasoning behind this is that too much synchronization is being done.</p>
<h2>Cache usage</h2>
<p>An important factor may be how the data is split (partitioned) for processing. <strong>If the processed rows are separated</strong>, as in the bad case below, then <strong>more rows will cause a cache miss</strong>. This effect becomes more important with each additional thread, because the distance between the rows handled by one thread grows. If you are certain that the parallelizing function already performs reasonable partitioning, then manual work-splitting will not give any improvement.</p>
<pre><code>  bad           good
****** t1     ****** t1
****** t2     ****** t1
****** t1     ****** t1
****** t2     ****** t1
****** t1     ****** t2
****** t2     ****** t2
****** t1     ****** t2
****** t2     ****** t2
</code></pre>
<p>Also make sure that you <strong>access your data in the same order in which it is laid out in memory</strong>; it is possible that each access to <code>offsets[]</code> and <code>BayerChannel[]</code> is a cache miss. Your algorithm is very memory intensive: almost all operations either read from a memory segment or write to one. Preventing cache misses and minimizing memory access is crucial.</p>
<h2>Code optimizations</h2>
<p>The optimizations shown below may already be done by the compiler and may not give better results. It is still worth knowing that they can be done.</p>
<pre><code>// is the memset really necessary?
//memset(RgbChannel, 0, Width * Height * 3 * sizeof(T));
parallel_for(0, Height, [&amp;] (int row)
{
    int rowMod = (row &amp; 1) &lt;&lt; 1;
    for (auto col = 0, bayerIndex = row * Width, tripleBayerIndex = row * Width * 3;
         col &lt; Width;
         col += 2, bayerIndex += 2, tripleBayerIndex += 6)
    {
        auto rgbIndex = tripleBayerIndex + offsets[rowMod];
        RgbChannel[rgbIndex] = BayerChannel[bayerIndex];

        // unrolled the loop to save the col &amp; 1 operation
        rgbIndex = tripleBayerIndex + 3 + offsets[rowMod + 1];
        RgbChannel[rgbIndex] = BayerChannel[bayerIndex + 1];
    }
});
</code></pre>
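<p>To illustrate the "good" partitioning from the cache-usage section, here is a minimal sketch that gives each thread one contiguous block of rows instead of scheduling rows individually. It uses plain <code>std::thread</code> rather than <code>parallel_for</code>; the function names <code>bayerRowsToRgb</code> and <code>bayerToRgbBlocked</code>, the <code>threadCount</code> parameter, and the assumption that <code>offsets</code> holds four channel indices (one per position in the 2x2 Bayer cell) are mine and not taken from the question. The output buffer is assumed to be zero-initialized by the caller, since only one channel per pixel is written.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: converts rows [rowBegin, rowEnd) of a Bayer mosaic
// into an interleaved RGB buffer. offsets[] is assumed to map each of the
// four positions in the 2x2 Bayer cell to a channel index (0, 1 or 2).
template <typename T>
void bayerRowsToRgb(const T* bayer, T* rgb, int width,
                    int rowBegin, int rowEnd, const int (&offsets)[4])
{
    for (int row = rowBegin; row < rowEnd; ++row) {
        const int rowMod = (row & 1) << 1;
        std::size_t index = static_cast<std::size_t>(row) * width;
        // Walk the row in memory order so reads and writes stay sequential.
        for (int col = 0; col < width; ++col, ++index) {
            rgb[index * 3 + offsets[rowMod + (col & 1)]] = bayer[index];
        }
    }
}

// Hypothetical driver: splits the image into at most threadCount contiguous
// row blocks, so there is one task per thread instead of one task per row.
template <typename T>
void bayerToRgbBlocked(const T* bayer, T* rgb, int width, int height,
                       const int (&offsets)[4], unsigned threadCount)
{
    if (width <= 0 || height <= 0) return;
    threadCount = std::max(1u, std::min(threadCount,
                                        static_cast<unsigned>(height)));
    const int threads = static_cast<int>(threadCount);
    const int rowsPerThread = (height + threads - 1) / threads;

    std::vector<std::thread> workers;
    for (int rowBegin = 0; rowBegin < height; rowBegin += rowsPerThread) {
        const int rowEnd = std::min(height, rowBegin + rowsPerThread);
        workers.emplace_back([=, &offsets] {
            bayerRowsToRgb(bayer, rgb, width, rowBegin, rowEnd, offsets);
        });
    }
    for (auto& worker : workers) worker.join();
}
```

<p>Each block touches a disjoint range of the output, so no locking is needed, and the only synchronization is the final <code>join</code>. Whether this beats the library's own scheduling depends on how <code>parallel_for</code> partitions the range, so it is worth measuring both.</p>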