Note that there are some explanatory texts on larger screens.

plurals
  1. POC++11 atomic memory ordering - is this a correct usage of relaxed (release-consume) ordering?
    primarykey
    data
    text
    <p>I have recently made a port to C++11 using std::atomic of a triple buffer to be used as a concurrency sync mechanism. The idea behind this thread sync approach is that for a producer-consumer situation where you have a producer that is running <em>faster</em> that the consumer, triple buffering can give some benefits since the producer thread won't be "slowed" down by having to wait for the consumer. In my case, I have a physics thread that is updated at ~120fps, and a render thread that it running at ~60fps. Obviously, I want the render thread to always gets the most recent state possible, but I also know that I will be skipping a lot of frames from the physics thread, because of the difference in rates. On the other hand, I want my physics thread to maintain its constant update rate and not be limited by the slower render thread locking my data.</p> <p>The original C code was made by remis-thoughts and the full explanation is in his <a href="http://remis-thoughts.blogspot.pt/2012/01/triple-buffering-as-concurrency_30.html" rel="nofollow">blog</a>. I encourage anyone interested in reading it for a further understanding of the original implementation. </p> <p>My implementation can be found <a href="https://github.com/p4checo/triplebuffer-sync" rel="nofollow">here</a>.</p> <p>The basic idea is to have an array with 3 positions (buffers) and an atomic flag that is compare-and-swapped to define which array elements correspond to what state, at any given time. This way, only one atomic variable is used to model all 3 indexes of the array and the logic behind the triple buffering. The buffer's 3 positions are named Dirty, Clean and Snap. The <em>producer</em> always writes to the Dirty index, and can flip the writer to swap the Dirty with the current Clean index. The <em>consumer</em> can request a new Snap, which swaps the current Snap index with the Clean index to get the most recent buffer. The <em>consumer</em> always reads the buffer in the Snap position. </p> <p>The flag consists of an 8 bit unsigned int and the bits correspond to: </p> <p><strong>(unused) (new write) (2x dirty) (2x clean) (2x snap)</strong></p> <p>The newWrite extra bit flag is set by the writer and cleared by the reader. The reader can use this to check if there have been any writes since the last snap, and if not it won't take another snap. The flag and indexes can be obtained using simple bitwise operations.</p> <p><strong>Ok now for the code:</strong></p> <pre><code>template &lt;typename T&gt; class TripleBuffer { public: TripleBuffer&lt;T&gt;(); TripleBuffer&lt;T&gt;(const T&amp; init); // non-copyable behavior TripleBuffer&lt;T&gt;(const TripleBuffer&lt;T&gt;&amp;) = delete; TripleBuffer&lt;T&gt;&amp; operator=(const TripleBuffer&lt;T&gt;&amp;) = delete; T snap() const; // get the current snap to read void write(const T newT); // write a new value bool newSnap(); // swap to the latest value, if any void flipWriter(); // flip writer positions dirty / clean T readLast(); // wrapper to read the last available element (newSnap + snap) void update(T newT); // wrapper to update with a new element (write + flipWriter) private: bool isNewWrite(uint_fast8_t flags); // check if the newWrite bit is 1 uint_fast8_t swapSnapWithClean(uint_fast8_t flags); // swap Snap and Clean indexes uint_fast8_t newWriteSwapCleanWithDirty(uint_fast8_t flags); // set newWrite to 1 and swap Clean and Dirty indexes // 8 bit flags are (unused) (new write) (2x dirty) (2x clean) (2x snap) // newWrite = (flags &amp; 0x40) // dirtyIndex = (flags &amp; 0x30) &gt;&gt; 4 // cleanIndex = (flags &amp; 0xC) &gt;&gt; 2 // snapIndex = (flags &amp; 0x3) mutable atomic_uint_fast8_t flags; T buffer[3]; }; </code></pre> <p><strong>implementation:</strong></p> <pre><code>template &lt;typename T&gt; TripleBuffer&lt;T&gt;::TripleBuffer(){ T dummy = T(); buffer[0] = dummy; buffer[1] = dummy; buffer[2] = dummy; flags.store(0x6, std::memory_order_relaxed); // initially dirty = 0, clean = 1 and snap = 2 } template &lt;typename T&gt; TripleBuffer&lt;T&gt;::TripleBuffer(const T&amp; init){ buffer[0] = init; buffer[1] = init; buffer[2] = init; flags.store(0x6, std::memory_order_relaxed); // initially dirty = 0, clean = 1 and snap = 2 } template &lt;typename T&gt; T TripleBuffer&lt;T&gt;::snap() const{ return buffer[flags.load(std::memory_order_consume) &amp; 0x3]; // read snap index } template &lt;typename T&gt; void TripleBuffer&lt;T&gt;::write(const T newT){ buffer[(flags.load(std::memory_order_consume) &amp; 0x30) &gt;&gt; 4] = newT; // write into dirty index } template &lt;typename T&gt; bool TripleBuffer&lt;T&gt;::newSnap(){ uint_fast8_t flagsNow(flags.load(std::memory_order_consume)); do { if( !isNewWrite(flagsNow) ) // nothing new, no need to swap return false; } while(!flags.compare_exchange_weak(flagsNow, swapSnapWithClean(flagsNow), memory_order_release, memory_order_consume)); return true; } template &lt;typename T&gt; void TripleBuffer&lt;T&gt;::flipWriter(){ uint_fast8_t flagsNow(flags.load(std::memory_order_consume)); while(!flags.compare_exchange_weak(flagsNow, newWriteSwapCleanWithDirty(flagsNow), memory_order_release, memory_order_consume)); } template &lt;typename T&gt; T TripleBuffer&lt;T&gt;::readLast(){ newSnap(); // get most recent value return snap(); // return it } template &lt;typename T&gt; void TripleBuffer&lt;T&gt;::update(T newT){ write(newT); // write new value flipWriter(); // change dirty/clean buffer positions for the next update } template &lt;typename T&gt; bool TripleBuffer&lt;T&gt;::isNewWrite(uint_fast8_t flags){ // check if the newWrite bit is 1 return ((flags &amp; 0x40) != 0); } template &lt;typename T&gt; uint_fast8_t TripleBuffer&lt;T&gt;::swapSnapWithClean(uint_fast8_t flags){ // swap snap with clean return (flags &amp; 0x30) | ((flags &amp; 0x3) &lt;&lt; 2) | ((flags &amp; 0xC) &gt;&gt; 2); } template &lt;typename T&gt; uint_fast8_t TripleBuffer&lt;T&gt;::newWriteSwapCleanWithDirty(uint_fast8_t flags){ // set newWrite bit to 1 and swap clean with dirty return 0x40 | ((flags &amp; 0xC) &lt;&lt; 2) | ((flags &amp; 0x30) &gt;&gt; 2) | (flags &amp; 0x3); } </code></pre> <p>As you can see, I have decided to use a <em>Release-Consume</em> pattern for memory ordering. The <em>Release</em> (memory_order_release) for the store assures no writes in the current thread can be reordered <strong>after</strong> the store. On the other side, the <em>Consume</em> assures no reads in the current thread dependent on the value currently loaded can be reordered <strong>before</strong> this load. This ensures that writes to dependent variables in other threads that release the same atomic variable are visible in the current thread.</p> <p>If my understanding is correct, since I only need the flags to be atomically set, operations on other variables that don't affect directly the flags can be reordered freely by the compiler, allowing for more optimizations. From reading some documents on the new memory model, I am also aware that these relaxed atomics will only have noticeable effect on platforms such as ARM and POWER (they were introduced mainly because of them). Since I am targeting ARM, I believe that I could benefit from these operations and be able to squeeze a little bit more performance out.</p> <p>Now for the question:</p> <p><strong>Am I using correctly the Release-Consume relaxed ordering for this specific problem?</strong></p> <p>Thanks,</p> <p>André</p> <p>PS: Sorry for the long post, but I believed that some decent context was needed for a better view of the problem.</p> <p><strong>EDIT :</strong> Implemented @Yakk's suggestions:</p> <ul> <li>Fixed <code>flags</code> read on <code>newSnap()</code> and <code>flipWriter()</code> which were using direct assignment, hence using default <code>load(std::memory_order_seq_cst)</code>.</li> <li>Moved bit fiddling operations to dedicated functions for clarity.</li> <li>Added <code>bool</code> return type to <code>newSnap()</code>, now returns false when there's nothing new and true otherwise.</li> <li>Defined class as non-copyable using <code>= delete</code> idiom since both copy and assignment constructors were unsafe if the <code>TripleBuffer</code> was being used.</li> </ul> <p><strong>EDIT 2:</strong> Fixed description, which was incorrect (Thanks @Useless). It is the <em>consumer</em> that requests a new Snap and reads from the Snap index (not the "writer"). Sorry for the distraction and thanks to Useless for pointing it out.</p> <p><strong>EDIT 3:</strong> Optimized the <code>newSnap()</code> and <code>flipriter()</code> functions according to @Display Name's suggestions, effectively removing 2 redundant <code>load()</code>'s per loop cycle.</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. COJust a quick code review, no answer here: `template <typename T> TripleBuffer<T>::TripleBuffer(const TripleBuffer& t){` doesn't look safe to use in most contexts if the `TripleBuffer` is in use: I'd be tempted to omit it. Same with `operator=`. `newSnap` should return `false` for "nothing to read" possibly. You should move all that bit fiddling to a single type that takes a `uint8_t&` and outputs/inputs sensible values from it. You read `flags` in `newSnap` without any explicit memory ordering guarantees.
      singulars
    2. COWow, thanks! You are right about the copy and assignment operators. They are indeed unsafe if the `TripleBuffer` is in use, since type `T` assignments are not guaranteed to be atomic and the `TripleBuffer` being copied from can change state between calls. You think it´s better to remove them and let the compiler generate them? Or do you have an idea of how to make it work safely? I am not seeing many _real_ uses for them when the `TripleBuffer` being copied from is being used. I added them for completeness but forgot to account for all possible situations.
      singulars
    3. COAbout the the `newSnap` function, it can indeed return true/false and will be a nice addition. It can signal that we are attempting to read too fast ;) Really great observation about the `flags` read in `newSnap`! As it is, it's a default `load(memory_order_seq_cst)`. It does no harm, but defeats my original purpose to use release / consume orderings :) Thanks a lot!
      singulars
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload