
Fastest x86 assembly code to synchronize access to an array?
<p>What is the fastest x86 assembly code to synchronize access to an array in memory?</p>
<p>To be more precise: We have a malloc'ed contiguous region in memory that fits in a single page, and the OS will not page out this region for the duration of our experiment. One thread will write to the array, one thread will read from the array. The array is small, but larger than the atomic-write capability of your CPU (so that a separate lock is actually required).</p>
<p>"Fastest" means the effective speed: do not just assume the length of the machine code is significant, but take into account the caching behavior of the lock and the branching behavior with regard to the surrounding code.</p>
<p>It has to work on x86-32 and/or x86-64.</p>
<p>It has to work in user mode on Windows XP or later, Linux kernel 2.2 or later, or Mac OS X (or their descendants).</p>
<p>Please no "it depends" responses: if it depends on anything I have not specified here, just make up your own example(s) and state what is fastest in that case or those cases.</p>
<p>Post code! (This is to prevent vague descriptions.)</p>
<p>Post not only your 2-line <code>LOCK</code> + <code>CMPXCHG</code> compare-and-swap, but show us how you integrate it with the read instructions in the one thread and the write instructions in the other.</p>
<p>If you like, explain your tweaks for cache optimality and how to avoid branch mispredictions if the branch target is dependent on (1) whether you get the lock or not, and (2) what the first byte of a larger read is.</p>
<p>If you like, distinguish between multiprocessing and task switching: how will your code perform if the threads do not run on 2 CPUs but have to share a single one?</p>
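To make the scenario concrete, here is a minimal baseline sketch in C++ rather than hand-written assembly. It is not claimed to be the fastest solution, only an illustration of the setup described above: one lock word guarding an array that is wider than a single atomic store, with one writer and one reader. All names (`GuardedArray`, `kWords`, `write_all`, `read_all`, `run_demo`) and the array size are hypothetical. On x86, compilers typically emit `LOCK CMPXCHG` for the compare-exchange and a plain `MOV` for the release store, but that depends on the compiler and should be checked in the generated assembly.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumed payload size: small, but wider than any single atomic store.
constexpr std::size_t kWords = 16;

struct GuardedArray {
    std::atomic<bool> locked{false};             // the lock word
    std::array<std::uint32_t, kWords> data{};    // the shared array
};

inline void lock(GuardedArray& g) {
    for (;;) {
        bool expected = false;
        // Compare-and-swap: take the lock only if it is currently free.
        if (g.locked.compare_exchange_weak(expected, true,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed)) {
            return;
        }
        // While the lock is held, wait with plain loads so the cache line
        // is not bounced by repeated locked operations. Yielding helps
        // when both threads have to share a single CPU.
        while (g.locked.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
    }
}

inline void unlock(GuardedArray& g) {
    g.locked.store(false, std::memory_order_release);
}

// Writer side: update the whole array under the lock.
void write_all(GuardedArray& g, std::uint32_t value) {
    lock(g);
    g.data.fill(value);
    unlock(g);
}

// Reader side: copy the whole array out under the lock.
std::array<std::uint32_t, kWords> read_all(GuardedArray& g) {
    lock(g);
    std::array<std::uint32_t, kWords> copy = g.data;
    unlock(g);
    return copy;
}

// One writer thread, one reader (the calling thread).
// Returns true if the reader never observed a torn (mixed) snapshot.
bool run_demo(std::uint32_t rounds) {
    GuardedArray g;
    std::thread writer([&g, rounds] {
        for (std::uint32_t v = 1; v <= rounds; ++v) write_all(g, v);
    });
    bool consistent = true;
    for (std::uint32_t i = 0; i < rounds; ++i) {
        const auto snap = read_all(g);
        for (std::uint32_t w : snap) {
            if (w != snap[0]) consistent = false;
        }
    }
    writer.join();
    return consistent;
}
```

Because the writer fills every element with the same value, any snapshot containing two different values would indicate a torn read; `run_demo` uses that as its consistency check. A lock-free alternative such as a sequence counter may well be faster for this single-writer case, which is the kind of comparison the question is asking for.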
 
