Note that there are some explanatory texts on larger screens.

plurals
  1. POLoop reordering across function boundaries in STL Algorithms
    primarykey
    data
    text
    <p>For simplicity lets assume that I have a vector of <code>N</code> matrices each of <code>M</code> rows. I am using STL <code>std::accumulate</code> to compute the sum of all the matrices. I pass a binary functor that accepts two matrices (by reference) and returns their sum (by reference). Full disclosure: I am using libstdc++ parallel mode. Inside the functor I loop over the rows individually to compute the sum.</p> <p>Though each matrix is too large to fit in the cache, a row fits in very nicely. So it would be advantageous to re-order the loops so that the outer loop indexes over the <code>M</code> rows and the inner one over the <code>N</code> matrices. In addition to defining the functor inline, is there anything else I can do to encourage such a cross-function-boundary loop re-ordering. I can of course restructure the code, but I would ideally like to keep the simple structure that the use of STL algorithms afford. If there is something that is gcc specific I wouldnt mind that either.</p> <p>I am not actually dealing with matrices, that was just an example, but the same problem structure applies. The main issue is that of performance. Explaining the actual scenario would be too cumbersome, but the core problem is this: STL's accumulate entails an ordering among the nested loops that isnt very cache friendly because it tries to complete the addition of two objects before moving on to the next object. A single object is too big to be held in cache, but parts of it can be. So the execution can be sped up if one computes the 'additions' one 'part' at a time (over all objects). Hand reordering the loops leads to substantial improvement in FLOPS. But I would ideally like the compiler to do the re-ordering so that I can code at the STL level (as far as possible). So I am looking for tricks to do this.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload