StackOverflow2013



Execution time for loops
<p>I'm analysing and measuring the code below, and my analysis and my measurements disagree. The code consists of two loop nests, run with a data cache of 512 bytes and a block size of 32 bytes:</p>
<pre><code>int SumByColRow (int matrix[M][M], int size)
{
    int i, j, Sum = 0;

    for (j = 0; j < size; j ++) {
        for (i = 0; i < size; i ++) {
            Sum += matrix[i][j];
        }
    }
    return Sum;
}

int SumByRowCol (int matrix[M][M], int size)
{
    int i, j, Sum = 0;

    for (i = 0; i < size; i ++) {
        for (j = 0; j < size; j ++) {
            Sum += matrix[i][j];
        }
    }
    return Sum;
}
</code></pre>
<p>Since C stores matrices by row, I expected SumByRowCol to be faster, because it does not switch rows in the inner loop: the inner loop reads consecutive elements, so the cache should benefit from spatial locality. In my measurement it is the other way round. Why does SumByColRow actually measure as faster?</p>
<pre><code>SumByColRow: Result: 31744 6415.29 us(641529 ticks)
SumByRowCol: Result: 31744 7336.47 us(733647 ticks)
</code></pre>
<h2>Update</h2>
<p>I ran the program again, making sure that I'm actually using the data cache, and this time the result is as expected, so the result above might have been a coincidence and the following is more representative:</p>
<pre><code>SumByColRow: Result: 31744 5961.13 us(596113 ticks)
SumByRowCol: Result: 31744 2328.89 us(232889 ticks)
</code></pre>
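The spatial-locality argument in the question can be checked by counting misses in a small cache model. The sketch below is only an illustration under several assumptions that the question does not state: the cache is direct-mapped, an `int` is 4 bytes, the matrix starts at a block-aligned address, and nothing else competes for the cache. The function name `count_misses` and its parameters are hypothetical; only the 512-byte cache size and the 32-byte block size come from the question.

```c
#include <stdbool.h>

#define CACHE_BYTES 512u                        /* from the question */
#define BLOCK_BYTES 32u                         /* from the question */
#define NUM_LINES   (CACHE_BYTES / BLOCK_BYTES) /* 16 cache lines */
#define ELEM_BYTES  4u                          /* assumption: 4-byte int */

/*
 * Count misses for one full pass over an m x m int matrix stored
 * row-major at address 0, in an assumed direct-mapped cache.
 * row_major_walk == true  models SumByRowCol (inner loop over j),
 * row_major_walk == false models SumByColRow (inner loop over i).
 */
unsigned long count_misses(unsigned m, bool row_major_walk)
{
    unsigned long block_in_line[NUM_LINES]; /* block currently held by each line */
    bool valid[NUM_LINES] = { false };      /* cache starts empty */
    unsigned long misses = 0;

    for (unsigned outer = 0; outer < m; outer++) {
        for (unsigned inner = 0; inner < m; inner++) {
            unsigned i = row_major_walk ? outer : inner;
            unsigned j = row_major_walk ? inner : outer;

            /* Byte address of matrix[i][j], then its memory block and cache line. */
            unsigned long addr  = ((unsigned long)i * m + j) * ELEM_BYTES;
            unsigned long block = addr / BLOCK_BYTES;
            unsigned line       = (unsigned)(block % NUM_LINES);

            if (!valid[line] || block_in_line[line] != block) {
                valid[line] = true;
                block_in_line[line] = block; /* load the block, evicting the old one */
                misses++;
            }
        }
    }
    return misses;
}
```

For a hypothetical 32 x 32 matrix (1024 accesses), this model gives 128 misses for the row-by-row walk, one per 32-byte block, and 1024 misses for the column-by-column walk, because consecutive accesses are 128 bytes apart and keep evicting each other from the same four lines. That matches the direction of the timings in the update, although real hardware with a different associativity or replacement policy may produce different counts.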
