Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>You're probably going to want four loops - two to iterate over the blocks, and then another two to perform the transpose-copy of a single block. Assuming for simplicity a block size that divides the size of the matrix, something like this I think, although I'd want to draw some pictures on the backs of envelopes to be sure:</p> <pre><code>for (int i = 0; i &lt; n; i += blocksize) { for (int j = 0; j &lt; n; j += blocksize) { // transpose the block beginning at [i,j] for (int k = i; k &lt; i + blocksize; ++k) { for (int l = j; l &lt; j + blocksize; ++l) { dst[k + l*n] = src[l + k*n]; } } } } </code></pre> <p>An important further insight is that there's actually a cache-oblivious algorithm for this (see <a href="http://en.wikipedia.org/wiki/Cache-oblivious_algorithm" rel="noreferrer">http://en.wikipedia.org/wiki/Cache-oblivious_algorithm</a>, which uses this exact problem as an example). The informal definition of "cache-oblivious" is that you don't need to experiment tweaking any parameters (in this case the blocksize) in order to hit good/optimal cache performance. The solution in this case is to transpose by recursively dividing the matrix in half, and transposing the halves into their correct position in the destination.</p> <p>Whatever the cache size actually is, this recursion takes advantage of it. I expect there's a bit of extra management overhead compared with your strategy, which is to use performance experiments to, in effect, jump straight to the point in the recursion at which the cache really kicks in, and go no further. On the other hand, your performance experiments might give you an answer that works on your machine but not on your customers' machines.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload