What's the purpose of a compiler barrier?
<p>The following is excerpted from <a href="http://rads.stackoverflow.com/amzn/click/032143482X" rel="nofollow">Concurrent Programming on Windows</a>, Chapter 10, pages 528-529: a C++ template implementation of double-checked locking.</p>
<pre><code>T getValue() {
    if (!m_pValue) {
        EnterCriticalSection(&amp;m_crst);
        if (!m_pValue) {
            T pValue = m_pFactory();
            _WriteBarrier();
            m_pValue = pValue;
        }
        LeaveCriticalSection(&amp;m_crst);
    }
    _ReadBarrier();
    return m_pValue;
}
</code></pre>
<p>As the author states:</p>
<blockquote> <p>A _WriteBarrier is found after instantiating the object, but before writing a pointer to it in the m_pValue field. That's required to ensure that writes in the initialization of the object never get delayed past the write to m_pValue itself.</p> </blockquote>
<p>Since _WriteBarrier is a compiler barrier, I don't think it is useful if the compiler knows the semantics of LeaveCriticalSection. The compiler might omit the write to pValue, but it will never optimize by moving the assignment before the function call, since that would violate the program's semantics. I believe LeaveCriticalSection contains an implicit hardware fence, and hence any write made before the assignment to m_pValue will be synchronized.</p>
<p>On the other hand, if the compiler doesn't know the semantics of LeaveCriticalSection, the _WriteBarrier is needed on <strong>all platforms</strong> to prevent the compiler from moving the assignment out of the critical section.</p>
<p>And for _ReadBarrier, the author said:</p>
<blockquote> <p>Similarly, we need a _ReadBarrier just before returning m_value so that loads after the call to getValue are not reordered to occur before the call.</p> </blockquote>
<p>First, <strong><em>if this function is part of a library and no source code is available, how does the compiler know whether there is a compiler barrier or not?</em></strong></p>
<p>Second, if it is needed, it is placed in the wrong location: I think we need to place it right after EnterCriticalSection to express an acquire fence. As with what I wrote above, this depends on whether the compiler understands the semantics of EnterCriticalSection.</p>
<p>And the author also said:</p>
<blockquote> <p>However, I will also point out that neither fence is required on X86, Intel64, and AMD64 processors. <strong>It's unfortunate that weak processors like IA64 have muddied the waters</strong></p> </blockquote>
<p>As I analyzed above, if we need those barriers on some platforms, then we need them on all platforms, because they are compiler barriers: they only make sure the compiler optimizes correctly in case it does not understand the semantics of some functions.</p>
<p>Please correct me if I am wrong.</p>
<p>Another question: is there any reference for MSVC and GCC that states which functions' synchronization semantics they understand?</p>
<p><strong>Update 1</strong>: Based on the answer (m_pValue is accessed outside the critical section) and on running the sample code from <a href="http://preshing.com/20120515/memory-reordering-caught-in-the-act" rel="nofollow">here</a>, I think:</p>
<ol> <li>What the author means here is a <strong>hardware fence</strong> rather than a <strong>compiler barrier</strong>; see the quote from <a href="http://msdn.microsoft.com/en-us/library/f20w0x5e%28v=vs.80%29.aspx" rel="nofollow">MSDN</a> below.</li> <li>A hardware fence also acts as an implicit compiler barrier (it disables compiler optimization across it), but not vice versa (see <a href="http://preshing.com/20120515/memory-reordering-caught-in-the-act" rel="nofollow">here</a>: with a CPU fence no reordering is observed, but a compiler barrier alone does not prevent it).</li> </ol>
<blockquote> <p>A Barrier is not a fence.. It should be noted that a Barrier effects everything in cache. A fence effects a single cache line.</p> <p>You should not be adding barriers unless absolutely necessary. To use a fence, you can select one of the _Interlocked intrinsic functions.</p> </blockquote>
<p>As the author wrote, "<em>neither fence is required on X86, Intel64, and AMD64 processors</em>"; this is because those platforms only allow store-load reordering.</p>
<p>One question remains: does the compiler understand the semantics of the calls to EnterCriticalSection and LeaveCriticalSection? If it doesn't, it may perform optimizations like the one in the following answer, which would cause bad behavior.</p>
<p>Thanks</p>
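<p>For comparison, the ordering requirements discussed above can be written portably with C++11 atomics. This is only a minimal sketch, assuming a C++11 compiler; the names LazyInit, Factory and get are hypothetical and do not come from the book. A release store stands in for the _WriteBarrier before publication and an acquire load stands in for the _ReadBarrier, with the difference that both constrain the compiler and the hardware.</p>

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical names throughout (LazyInit, Factory, get); not the book's class.
template <typename T>
class LazyInit {
public:
    using Factory = T* (*)();

    explicit LazyInit(Factory factory) : factory_(factory) {
        assert(factory_ != nullptr);  // precondition: a factory must be supplied
    }

    ~LazyInit() { delete value_.load(std::memory_order_relaxed); }

    LazyInit(const LazyInit&) = delete;
    LazyInit& operator=(const LazyInit&) = delete;

    T* get() {
        // Acquire load: if a non-null pointer is observed, the writes that
        // initialized the object are visible as well (the _ReadBarrier role).
        T* p = value_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Second check under the lock; the mutex already orders this load.
            p = value_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = factory_();
                // Release store: the initialization writes cannot be reordered
                // after the publication of the pointer (the _WriteBarrier role).
                value_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

private:
    std::atomic<T*> value_{nullptr};
    std::mutex mutex_;
    Factory factory_;
};
```

<p>With this form the question of whether the compiler understands EnterCriticalSection or LeaveCriticalSection does not arise for the fast path, because the ordering is stated on the accesses to the shared pointer itself.</p>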