    <p>What kind of access pattern is using your 7-point stencil? If you're having cache coherence problems, this is the first question to ask -- if the access pattern of your central (x,y,z) coordinate is completely random, you may be out of luck.</p>
<p>If you have some control over the access pattern, you can try to adjust it to be more cache-friendly. If not, then you should consider what kind of access pattern to expect; you may be able to arrange the data so that this access pattern is more benign. A combination of these two can sometimes be very effective.</p>
<p>There is a particular data arrangement that is frequently useful for this kind of thing: bit-interleaved array layout. Assume (for simplicity) that the size of each coordinate is a power of two. Then, a "normal" layout will build the index by concatenating the bits for each coordinate. However, a bit-interleaved layout will allocate bits to each dimension in a round-robin fashion:</p>
<pre><code>3D index coords:  (xxxx, yyyy, zzzz)
normal index:     data[zzzzyyyyxxxx]  (x-coord has least-significant bits, then y)
bit-interleaved:  data[zyxzyxzyxzyx]  (lsb are now relatively local)
</code></pre>
<p>Practically speaking, there is a minor cost: instead of multiplying the coordinates by their step values, you will need to use a lookup table to find your offsets. But since you will probably only need very short lookup tables (especially for a 3D array!), they should all fit nicely into cache.</p>
<pre><code>3D coords:        (x,y,z)
normal index:     data[x + y*ystep + z*zstep]
  where: ystep = xsize (possibly aligned-up, if not a power of 2?)
         zstep = ysize * ystep
bit-interleaved:  data[xtab[x] + ytab[y] + ztab[z]]
  where: xtab = { 0, 1,  8,  9,  64,  65,  72,  73,  512...}  (x has bits 0,3,6,9...)
         ytab = { 0, 2, 16, 18, 128, 130, 144, 146, 1024...}  (y has bits 1,4,7,10...)
         ztab = { 0, 4, 32, 36, 256, 260, 288, 292, 2048...}  (z has bits 2,5,8,11...)
</code></pre>
<p>Ultimately, whether this is any use depends entirely on the requirements of your algorithm. But, again, please note that if your algorithm is too demanding of your cache, you may want to look into adjusting the algorithm, instead of just the layout.</p>
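To make the lookup-table idea concrete, here is a minimal C sketch of how the offset tables might be built. It assumes a cubic grid whose side is a power of two, so that the interleaved index range is dense. The function names spread_bits3, make_table and interleaved_index are hypothetical, chosen only for illustration; they are not from the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Spread the bits of v so that bit i of v lands at bit (3*i + shift).
   shift is 0 for x, 1 for y, 2 for z (the round-robin order above). */
static size_t spread_bits3(unsigned v, unsigned shift)
{
    size_t out = 0;
    for (unsigned i = 0; v != 0; ++i, v >>= 1) {
        if (v & 1u)
            out |= (size_t)1 << (3 * i + shift);
    }
    return out;
}

/* Build one offset table with n entries (assumption: n is a power of two).
   Returns NULL if allocation fails; the caller frees the table. */
static size_t *make_table(unsigned n, unsigned shift)
{
    size_t *tab = malloc(n * sizeof *tab);
    if (tab == NULL)
        return NULL;
    for (unsigned i = 0; i < n; ++i)
        tab[i] = spread_bits3(i, shift);
    return tab;
}

/* Index into the 1-D data[] array: three short lookups and two adds. */
static size_t interleaved_index(const size_t *xtab, const size_t *ytab,
                                const size_t *ztab,
                                unsigned x, unsigned y, unsigned z)
{
    return xtab[x] + ytab[y] + ztab[z];
}
```

With tables built as make_table(n, 0), make_table(n, 1) and make_table(n, 2), an element would be read as data[interleaved_index(xtab, ytab, ztab, x, y, z)]. For non-cubic or non-power-of-two grids, the tables and the allocation size would need adjusting, which this sketch does not attempt.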
    1. CO: You are right on with the normal indexing that you have listed there. That is exactly how my data is lined up. I am not sure of the possible values of xsize or ysize, though; possibly NOT a power of two. For the "small" test cases I am running, xsize seems to be 21. The data in the original array is accessed sequentially, not randomly. So when I update the n'th point, I can very well expect the n+1 point to use n and n+2 as its x coordinates, etc. In your example, what are those x/y/ztab values? The actual data values? I'm not quite getting how the interleaved data is accessed.
    2. CO: In my example, the data is assumed to be stored in a large 1-dimensional array. The x/y/ztab values are index offsets, used to calculate the location in data[] that the element is actually stored in. This is probably the fastest way to do bit-interleaving: rather than do manual bit-shifting every time you access an element, you precompute the index offsets for each coordinate, look them up (from very short tables that don't take much cache room), and add them together. (I will answer your other questions in subsequent comments...)
    3. CO: If your access pattern is strictly sequential, in the order that it is stored in memory, this is the best possible case. The 7-point stencil spreads the memory accesses out over 5 locations in the array -- but the data loaded from memory should be used efficiently, *if* your access pattern is in fact in memory-sequential order (i.e., your inner loop should be over your fastest-changing coordinate, the next-inner is over your next-fastest coordinate, etc.). If your data is larger than your cache, you will necessarily have *some* cache misses...
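
The memory-sequential loop order described in that last comment might look like the following C sketch, using the "normal" index from the answer. The function name stencil7 and the choice of a plain average as the stencil weights are assumptions made for illustration; only the loop nesting and the index arithmetic are the point.

```c
#include <stddef.h>

/* One sweep of a 7-point stencil over the interior of an
   xsize * ysize * zsize grid stored with x as the fastest-changing
   coordinate. Assumption: the update is a plain average of the centre
   point and its six neighbours. Boundary points are left untouched. */
static void stencil7(const double *in, double *out,
                     size_t xsize, size_t ysize, size_t zsize)
{
    const size_t ystep = xsize;          /* distance between y neighbours */
    const size_t zstep = xsize * ysize;  /* distance between z neighbours */

    for (size_t z = 1; z + 1 < zsize; ++z) {          /* slowest coordinate */
        for (size_t y = 1; y + 1 < ysize; ++y) {
            for (size_t x = 1; x + 1 < xsize; ++x) {  /* fastest coordinate */
                const size_t i = x + y * ystep + z * zstep;
                /* x-1, x, x+1 are contiguous; y and z neighbours add four
                   more streams, giving the 5 locations mentioned above. */
                out[i] = (in[i]
                          + in[i - 1]     + in[i + 1]
                          + in[i - ystep] + in[i + ystep]
                          + in[i - zstep] + in[i + zstep]) / 7.0;
            }
        }
    }
}
```

With this nesting, each of the five streams advances through memory one element at a time, so every cache line that is loaded gets fully used before the sweep moves on.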