Why would memory access on x86 be slower when aligned to first 4 bytes of the cache line?
    <p>In writing a blog post on <a href="http://psy-lob-saw.blogspot.co.uk/2013/01/direct-memory-alignment-in-java.html" rel="nofollow">unaligned/aligned direct memory access</a> I've hit a result I struggle to explain: if my memory access is aligned to the first 4 bytes of the cache line, I see measurably worse performance when the data structure fits into the L1 cache. In some cases other offsets are 20% faster.</p>
    <p>The article goes into a lot more detail about the experiment and method, but here is the summary:</p>
    <ol>
    <li>Allocate a block of memory which fits into L1 (32 KB on my laptop; use hwloc or check the spec of your CPU to find out). Align the block to the cache line size (usually 64 bytes; check your hardware). The allocation is done up front and is not measured.</li>
    <li>Iterate over the memory block and write a long (some value) into each cache line at a given offset (effectively causing an unaligned write if the offset is not a multiple of 8).</li>
    <li>Iterate over the memory block, read from the same offset, and verify the value is as expected.</li>
    </ol>
    <p>Why should there be any difference in performance when the offset is 0-3?</p>
    <p>The essence of the measured code (as requested in a comment):</p>
<pre><code>for (address = startingAddress; address &lt; limit; address += CACHE_LINE_SIZE) {
    Unsafe.putLong(address, value);
}
for (address = startingAddress; address &lt; limit; address += CACHE_LINE_SIZE) {
    if (Unsafe.getLong(address) != value)
        throw new RuntimeException();
}
</code></pre>
    <p>Where the starting address is cache-line aligned + offset. The full experiment is available <a href="https://github.com/nitsanw/psy-lob-saw/blob/master/experiments/alignment/UnalignedMemoryAccessCostBenchmark.java" rel="nofollow">here</a>.</p>
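For readers who want to try the three steps without `Unsafe`, here is a minimal sketch that assumes a direct `ByteBuffer` is an acceptable stand-in for raw addresses. It is not the code from the linked experiment: the class name `AlignmentSketch`, the helper names, and the 32 KB / 64 byte constants are assumptions for illustration, and the timing in `main` is a rough single pass with no warm-up, so it should not be read as a proper benchmark.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AlignmentSketch {
    // Assumed sizes; check your own hardware.
    static final int CACHE_LINE_SIZE = 64;
    static final int BLOCK_SIZE = 32 * 1024;

    // Step 1: allocate up front, then slice so index 0 sits on a cache line boundary.
    // Two spare cache lines cover the bytes lost to alignment plus the tail of a
    // long written at the largest offset in the last line.
    static ByteBuffer allocateAligned() {
        ByteBuffer raw = ByteBuffer.allocateDirect(BLOCK_SIZE + 2 * CACHE_LINE_SIZE);
        return raw.alignedSlice(CACHE_LINE_SIZE).order(ByteOrder.nativeOrder());
    }

    // Steps 2 and 3: write a long at the given offset in every cache line,
    // then read each one back and verify it. Returns the number of lines checked.
    static int writeAndVerify(ByteBuffer block, int offset, long value) {
        for (int line = 0; line < BLOCK_SIZE; line += CACHE_LINE_SIZE) {
            block.putLong(line + offset, value);
        }
        int verified = 0;
        for (int line = 0; line < BLOCK_SIZE; line += CACHE_LINE_SIZE) {
            if (block.getLong(line + offset) != value) {
                throw new RuntimeException("unexpected value at " + (line + offset));
            }
            verified++;
        }
        return verified;
    }

    public static void main(String[] args) {
        ByteBuffer block = allocateAligned();
        // Offsets above 56 make the 8-byte write straddle two cache lines.
        for (int offset = 0; offset < CACHE_LINE_SIZE; offset++) {
            long start = System.nanoTime();
            int lines = writeAndVerify(block, offset, 0xCAFEBABEL);
            long elapsed = System.nanoTime() - start;
            System.out.println("offset " + offset + ": " + lines + " lines, " + elapsed + " ns");
        }
    }
}
```

`ByteBuffer` accessors add bounds checks that `Unsafe` does not have, so absolute numbers from this sketch will differ from the original experiment; a harness with warm-up and repeated iterations would be needed to compare offsets meaningfully.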