Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>The difference is caused by the same super-alignment issue from the following related questions:</p> <ul> <li><a href="https://stackoverflow.com/q/11413855/922184">Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?</a></li> <li><a href="https://stackoverflow.com/q/7905760/922184">Matrix multiplication: Small difference in matrix size, large difference in timings</a></li> </ul> <p>But that's only because there's one other problem with the code.</p> <p>Starting from the original loop:</p> <pre><code>for(i=1;i&lt;SIZE-1;i++) for(j=1;j&lt;SIZE-1;j++) { res[j][i]=0; for(k=-1;k&lt;2;k++) for(l=-1;l&lt;2;l++) res[j][i] += img[j+l][i+k]; res[j][i] /= 9; } </code></pre> <p>First notice that the two inner loops are trivial. They can be unrolled as follows:</p> <pre><code>for(i=1;i&lt;SIZE-1;i++) { for(j=1;j&lt;SIZE-1;j++) { res[j][i]=0; res[j][i] += img[j-1][i-1]; res[j][i] += img[j ][i-1]; res[j][i] += img[j+1][i-1]; res[j][i] += img[j-1][i ]; res[j][i] += img[j ][i ]; res[j][i] += img[j+1][i ]; res[j][i] += img[j-1][i+1]; res[j][i] += img[j ][i+1]; res[j][i] += img[j+1][i+1]; res[j][i] /= 9; } } </code></pre> <p>So that leaves the two outer-loops that we're interested in.</p> <p>Now we can see the problem is the same in this question: <a href="https://stackoverflow.com/q/9936132/922184">Why does the order of the loops affect performance when iterating over a 2D array?</a></p> <p>You are iterating the matrix column-wise instead of row-wise.</p> <hr> <p>To solve this problem, you should interchange the two loops.</p> <pre><code>for(j=1;j&lt;SIZE-1;j++) { for(i=1;i&lt;SIZE-1;i++) { res[j][i]=0; res[j][i] += img[j-1][i-1]; res[j][i] += img[j ][i-1]; res[j][i] += img[j+1][i-1]; res[j][i] += img[j-1][i ]; res[j][i] += img[j ][i ]; res[j][i] += img[j+1][i ]; res[j][i] += img[j-1][i+1]; res[j][i] += img[j ][i+1]; res[j][i] += img[j+1][i+1]; res[j][i] /= 9; } } </code></pre> <p>This eliminates all the non-sequential access completely so you no longer get random slow-downs on large powers-of-two.</p> <hr> <p><strong>Core i7 920 @ 3.5 GHz</strong></p> <p>Original code:</p> <pre><code>8191: 1.499 seconds 8192: 2.122 seconds 8193: 1.582 seconds </code></pre> <p>Interchanged Outer-Loops:</p> <pre><code>8191: 0.376 seconds 8192: 0.357 seconds 8193: 0.351 seconds </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload