Note that there are some explanatory texts on larger screens.

plurals
  1. POMemory barrier use in lock-free queues
    primarykey
    data
    text
    <p>I recently read Paul McKenney's 2010 white paper, <a href="http://www.liblfds.org/mediawiki/images/a/a7/Michael%2C_Scott_-_Simple%2C_Fast%2C_and_Practical_Non-Blocking_and_Blocking_Concurrent_Queue_Algorithms.pdf" rel="noreferrer">"Memory Barriers: a Hardware View for Software Hackers"</a>.</p> <p>I would very much appreciate some feedback / comments / guidance with regard to a small sections of C code, given below, which implement the M&amp;S queue enqueue function, in particular with regard to memory and compiler barriers.</p> <p>This code is using pointer-counter pairs to handle <a href="http://en.wikipedia.org/wiki/ABA_problem" rel="noreferrer">ABA</a> and for the sake of this post should be considered as written for and only for x86/x64.</p> <p>The inline comments are all written now, for this post, and are part of this post in that they express what I currently think I think.</p> <p>For brevity, I've stripped the code of asserts and cache line padding in the structs, etc.</p> <p>Currently, I think the code is quite broken, but I want to make sure I think so for the right reasons.</p> <hr> <pre><code>#define PAC_SIZE 2 struct lfds_queue_element { struct lfds_queue_element *volatile next[PAC_SIZE]; void *user_data; }; struct lfds_queue_state { struct lfds_queue_element *volatile enqueue[PAC_SIZE]; struct lfds_queue_element *volatile dequeue[PAC_SIZE]; atom_t volatile aba_counter; }; void lfds_queue_internal_dcas_pac_enqueue( struct lfds_queue_state *lqs, struct lfds_queue_element *lqe ) { ALIGN(ALIGN_DOUBLE_POINTER) struct lfds_queue_element *local_lqe[PAC_SIZE], *enqueue[PAC_SIZE], *next[PAC_SIZE]; unsigned char cas_result = 0; unsigned int backoff_iteration = 1; /* TRD : here we have been passed a new element to place into the queue; we initialize it and its next pointer/counter pair */ local_lqe[POINTER] = lqe; local_lqe[COUNTER] = (struct lfds_queue_element *) lfds_abstraction_atomic_increment( &amp;lqs-&gt;aba_counter ); local_lqe[POINTER]-&gt;next[POINTER] = NULL; local_lqe[POINTER]-&gt;next[COUNTER] = (struct lfds_queue_element *) lfds_abstraction_atomic_increment( &amp;lqs-&gt;aba_counter ); /* TRD : now, I think there is a issue here, in that these values are by no means yet necessarily visible to other cores however, they only need to be visible once the element has entered the queue, and for that to happen, the contigious double-word CAS must have occurred - and on x86/x64, this carries with it an mfence however, that mfence will only act to empty our local store buffer - it will not cause other cores to flush their invalidation queues, so surely it can all still go horribly wrong? ah, but if all other cores are only accessing these variables using atomic operations, they too will be issuing mfences and so at that point flushing their invalidate queues */ do { enqueue[COUNTER] = lqs-&gt;enqueue[COUNTER]; enqueue[POINTER] = lqs-&gt;enqueue[POINTER]; next[COUNTER] = enqueue[POINTER]-&gt;next[COUNTER]; next[POINTER] = enqueue[POINTER]-&gt;next[POINTER]; /* TRD : now, this is interesting we load the enqueue pointer and its next pointer we then (immediately below) check to see they're unchanged but this check is totally bogus! we could be reading old values from our cache, where our invalidate queue has not been processed, so the initial read contains old data *and* we then go ahead and check from our cache the same old values a second time what's worse is that I think we could also read the correct values for enqueue but an incorrect (old) value for its next pointer...! so, in either case, we easily mistakenly pass the if() and then enter into code which does things to the queue now, in both cases, the CAS will mfence, which will cause us to see from the data structure the true values, but how much will that help us - we need to look to see what is actually being done the if() checks next[POINTER] is NULL if we have read a NULL for next, then we think the enqueue pointer is correcly placed (it's not lagging behind) so we don't need to help; we then try to add our element to the end of the queue now, it may be we have read enqueue properly but next improperly and so we now try to add our element where it will in fact truncate the queue! the CAS however will mfence and so at this point we will actually see the correct value for enqueue-next, and this will prevent that occurring if we look now at the else clause, here we have seen that enqueue-&gt;next is not NULL, so the enqueue pointer is out of place and we need to help, which we do by moving it down the queue here though we could have read enqueue correctly but next incorrectly; the CAS will mfence, which will update the cache, but since we're only comparing the enqueue pointer with our copy of the enqueue pointer, the fact our next pointer is wrong won't change! so here, we move the enqueue pointer to some old element - which although it might be in the queue (so this would be an inefficiency, you'd have to do a bunch more queue walking to get the enqueue pointer to the final element) it might not be, too! it could in the meantime have been dequeued and that of course would be death */ if( lqs-&gt;enqueue[POINTER] == enqueue[POINTER] and lqs-&gt;enqueue[COUNTER] == enqueue[COUNTER] ) { if( next[POINTER] == NULL ) { local_lqe[COUNTER] = next[COUNTER] + 1; cas_result = lfds_abstraction_atomic_dcas_with_backoff( (atom_t volatile *) enqueue[POINTER]-&gt;next, (atom_t *) local_lqe, (atom_t *) next, &amp;backoff_iteration ); } else { next[COUNTER] = enqueue[COUNTER] + 1; lfds_abstraction_atomic_dcas( (atom_t volatile *) lqs-&gt;enqueue, (atom_t *) next, (atom_t *) enqueue ); } } } while( cas_result == 0 ); local_lqe[COUNTER] = enqueue[COUNTER] + 1; lfds_abstraction_atomic_dcas( (atom_t volatile *) lqs-&gt;enqueue, (atom_t *) local_lqe, (atom_t *) enqueue ); return; } </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload