Understanding how to write cache-friendly code
<p>I have been trying to understand how to write cache-friendly code. As a first step, I wanted to understand the performance difference between row-major and column-major array access.</p>
<p>So I created an int array of size 512×512, for a total size of 1 MB. My L1 cache is 32 KB, my L2 cache is 256 KB, and my L3 cache is 3 MB, so the array fits in the L3 cache.</p>
<p>I simply calculated the sum of the array elements in row-major order and in column-major order and compared their speed. Every time, column-major order is slightly faster. I expected row-major order to be faster (maybe several times faster).</p>
<p>I thought the problem might be the small size of the array, so I made another array of size 8192×8192 (256 MB). Still the same result.</p>
<p>Below is the code snippet I used:</p>
<pre><code>#include "time.h"
#include &lt;stdio.h&gt;

#define S 512
#define M S
#define N S

int main()
{
    // Summing in the row major order
    int x = 0;
    int iter = 25000;
    int i, j;
    int k[M][N];
    int sum = 0;
    clock_t start, end;

    start = clock();
    while(x &lt; iter)
    {
        for (i = 0; i &lt; M; i++)
        {
            for(j = 0; j &lt; N; j++)
            {
                sum += k[i][j];
            }
        }
        x++;
    }
    end = clock();
    printf("%i\n", end-start);

    // Summing in the column major order
    x = 0;
    sum = 0;
    int h[M][N];

    start = clock();
    while(x &lt; iter)
    {
        for (j = 0; j &lt; N; j++)
        {
            for(i = 0; i &lt; M; i++){
                sum += k[i][j];
            }
        }
        x++;
    }
    end = clock();
    printf("%i\n", end-start);
}
</code></pre>
<p>Question: can someone tell me what my mistake is and why I am getting this result?</p>
 
