Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>With some variation according to which level of cache you're talking about, cache works as follows:</p> <ul> <li>if the data is already in cache then it is fast to access</li> <li>if the data is not in cache then you incur a cost, but an entire cache line (or page, if we're talking RAM vs swap file rather than cache vs RAM) is brought into cache, so access close to the missed address will not miss.</li> <li>if you're lucky then the memory subsystem will detect sequential access and pre-fetch data that it thinks you're about to need.</li> </ul> <p>So naively the questions to ask are:</p> <ol> <li>how many cache misses occur? -- B wins, because in A you fetch some unused data per record, whereas in B you fetch none other than a small rounding error at the end of the iteration. So in order to visit all of the necessary data, B fetches fewer cache lines, assuming a significant number of records. If the number of records is insignificant, then cache performance may have little or nothing to do with the performance of your code, because a program that uses a small enough amount of data will find that it's all in cache all the time.</li> <li>is the access sequential? -- yes in both cases, although this might be harder to detect in case B because there are two interleaved sequences rather than just one.</li> </ol> <p>So, I would sort of expect B to be faster <em>for this code</em>. However:</p> <ul> <li>if this is the only access to the data, then you could speed up A by removing most of the data members from the <code>struct</code>. So do that. Presumably in fact it is not the only access to the data in your program, and the other accesses might affect performance in two ways: the time they actually take, and whether they populate the cache with the data you need.</li> <li>what I expect and what actually happens are frequently different things, and there is little point relying on speculation if you have any ability to test it. In the best case, the sequential access means that there are no cache misses in either code. Testing performance <em>requires</em> no special tool (although they can make it easier), just a clock with a second hand. At a pinch, fashion a pendulum from your phone charger.</li> <li>there are some complications I have ignored. Depending on hardware, if you're unlucky with B then at the lowest cache level you could find that the accesses to one vector are evicting the accesses to the other vector, because the corresponding memory just happens to use the same location in cache. This would cause two cache misses <em>per record</em>. This will only happen on what's called "direct-mapped cache". "Two-way cache" or better would save the day, by allowing chunks of both vectors to co-exist even if their first preference location in cache is the same. I don't think that PC hardware generally uses direct-mapped cache, but I don't know for sure and I don't know much about GPUs.</li> </ul>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload