<p>The following tests were run with the Visual C++ compiler as used by the default Qt Creator install (I guess with no optimization flag). When using GCC, there is no big difference between Mystical's version and my "optimized" code. So the conclusion is that compiler optimizations take care of micro-optimizations better than humans (me at least). I leave the rest of my answer for reference.</p>
<hr>
<p>It's not efficient to process images this way. It's better to use one-dimensional arrays. Processing all pixels is then done in one loop. Random access to a point can be done using:</p>
<pre><code>pointer + (x + y*width)*(sizeOfOnePixel)
</code></pre>
<p>In this particular case, it's better to compute and cache the horizontal sums of groups of three pixels, because each of them is used three times.</p>
<p>I've done some tests and I think the results are worth sharing. Each result is an average of five runs.</p>
<p>Original code by user1615209:</p>
<pre><code>8193: 4392 ms
8192: 9570 ms
</code></pre>
<p>Mystical's version:</p>
<pre><code>8193: 2393 ms
8192: 2190 ms
</code></pre>
<p>Two passes using a 1D array: the first pass computes the horizontal sums, the second the vertical sum and the average. Two passes, addressing with three pointers and only increments, like this:</p>
<pre><code>imgPointer1 = &amp;avg1[0][0];
imgPointer2 = &amp;avg1[0][SIZE];
imgPointer3 = &amp;avg1[0][SIZE+SIZE];

for(i=SIZE;i&lt;totalSize-SIZE;i++){
    resPointer[i]=(*(imgPointer1++)+*(imgPointer2++)+*(imgPointer3++))/9;
}

8193: 938 ms
8192: 974 ms
</code></pre>
<p>Two passes using a 1D array and indexed addressing, like this:</p>
<pre><code>for(i=SIZE;i&lt;totalSize-SIZE;i++){
    resPointer[i]=(hsumPointer[i-SIZE]+hsumPointer[i]+hsumPointer[i+SIZE])/9;
}

8193: 932 ms
8192: 925 ms
</code></pre>
<p>One pass, caching the horizontal sums just one row ahead so they stay in cache:</p>
<pre><code>// Horizontal sums for the first two lines
for(i=1;i&lt;SIZE*2;i++){
    hsumPointer[i]=imgPointer[i-1]+imgPointer[i]+imgPointer[i+1];
}
// Rest of the computation
// (stop one short of totalSize so imgPointer[i+1] stays in bounds)
for(;i&lt;totalSize-1;i++){
    // Compute horizontal sum for next line
    hsumPointer[i]=imgPointer[i-1]+imgPointer[i]+imgPointer[i+1];
    // Final result
    resPointer[i-SIZE]=(hsumPointer[i-SIZE-SIZE]+hsumPointer[i-SIZE]+hsumPointer[i])/9;
}

8193: 599 ms
8192: 652 ms
</code></pre>
<p>Conclusion:</p>
<ul>
<li>There is no benefit in using several pointers with only increments (I thought it would have been faster).</li>
<li>Caching the horizontal sums is better than computing them several times.</li>
<li>The two-pass version is only about two times faster, not three.</li>
<li>It's possible to be 3.6 times faster by using both a single pass and a cached intermediate result.</li>
</ul>
<p>I'm sure it's possible to do much better.</p>
<p><strong>NOTE</strong> I wrote this answer to target general performance issues rather than the cache problem explained in Mystical's excellent answer. At the beginning it was just pseudocode. I was asked in the comments to run tests, so here is a completely refactored version with tests.</p>
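<p>For reference, here is a minimal, self-contained sketch of the one-pass idea on a flat array, next to a naive 3x3 average to check it against. It is not the code that was timed above: the function names <code>boxBlurNaive</code> and <code>boxBlurCachedSums</code>, the use of <code>std::vector&lt;int&gt;</code>, and the choice to leave border pixels at zero are all assumptions made for illustration. It also fills one whole row of sums before writing the output row, rather than interleaving per pixel.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: average the 3x3 neighbourhood of every interior pixel.
// Pixels are addressed as img[x + y * width]; border pixels stay 0.
std::vector<int> boxBlurNaive(const std::vector<int>& img,
                              std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    std::vector<int> res(img.size(), 0);
    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            int sum = 0;
            for (std::size_t dy = 0; dy < 3; ++dy)
                for (std::size_t dx = 0; dx < 3; ++dx)
                    sum += img[(x + dx - 1) + (y + dy - 1) * width];
            res[x + y * width] = sum / 9;
        }
    }
    return res;
}

// One pass with cached horizontal sums: each 3-pixel horizontal sum is
// computed once and then reused by the three output rows that need it.
std::vector<int> boxBlurCachedSums(const std::vector<int>& img,
                                   std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    std::vector<int> res(img.size(), 0);
    if (width < 3 || height < 3) return res;  // no interior pixels

    std::vector<int> hsum(img.size(), 0);
    // Fill the horizontal sums of row y (interior columns only).
    auto fillRow = [&](std::size_t y) {
        const std::size_t base = y * width;
        for (std::size_t x = 1; x + 1 < width; ++x)
            hsum[base + x] = img[base + x - 1] + img[base + x] + img[base + x + 1];
    };

    // Horizontal sums for the first two rows.
    fillRow(0);
    fillRow(1);
    for (std::size_t y = 2; y < height; ++y) {
        fillRow(y);  // sums for the row just below the output row
        const std::size_t out = (y - 1) * width;  // start of the output row
        for (std::size_t x = 1; x + 1 < width; ++x)
            res[out + x] = (hsum[out - width + x] + hsum[out + x] +
                            hsum[out + width + x]) / 9;
    }
    return res;
}
```

<p>Both functions should return the same result for any input. Whether the cached version is actually faster depends on the compiler and flags, as noted at the top of this answer.</p>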
 
