Does the volatile qualifier cancel caching for this memory?
This article, <http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2>, says that the compiler may not perform any optimization on `volatile` accesses, not even this one (where `volatile int& v = *(address);`):

```cpp
v = 1;     // C: write to v
local = v; // D: read from v
```

may not be optimized to:

```cpp
v = 1;     // C: write to v
local = 1; // D: read from v
// but it can be done for std::atomic<>
```

It cannot be done because between the first and second line the value of `v` may be changed (sequentially or concurrently) by a **hardware device mapped to this memory location (not the CPU, so cache coherence cannot cover it: a network adapter, GPU, FPGA, etc.)**. But this matters only if `v` cannot be cached in the CPU's L1/L2/L3 caches: for an ordinary (non-`volatile`) variable the time between the two lines is so short that the second access would almost certainly hit the cache.

Does the `volatile` qualifier guarantee that this memory location is not cached?

**ANSWER:**

1. No, `volatile` **does not guarantee that this memory location is uncached**; neither the C/C++ standards nor the [compiler manual](http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html) say anything about caching.
2. When using a memory-mapped region, [memory mapped from device memory into the CPU address space is already marked WC](https://stackoverflow.com/a/1757198/1558037) (write-combining) instead of WB, which disables caching, **so no cache flushing is needed**.
3. In the opposite direction, when CPU memory is mapped into the device's address space, the PCIe controller located on the CPU die snoops the data coming in via DMA from that device and updates (invalidates) the CPU's L3 cache. In this case, **if the code running on the device uses `volatile`** for the same two lines, the device's own cache (e.g. the GPU's L2) is bypassed as well, so **[neither the GPU cache nor the CPU cache needs to be flushed](https://stackoverflow.com/questions/12027849/how-can-i-read-from-the-pinned-lock-page-ram-and-not-from-the-cpu-cache-use/12028433#12028433)**. On the CPU side you may still need `std::atomic_thread_fence(std::memory_order_seq_cst);` [if the L3 cache (LLC) is coherent with DMA over PCIe but L1/L2 are not](http://en.wikipedia.org/wiki/Direct_memory_access#Cache_coherency). For nVidia CUDA there is [`void __threadfence_system();`](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-fence-functions).
4. We **do need to flush the DMA controller's buffers** when sending unaligned data ([WDK: `KeFlushIoBuffers()`, `FlushAdapterBuffers()`](http://msdn.microsoft.com/en-us/library/windows/hardware/ff545924%28v=vs.110%29.aspx)).
5. We can also mark any memory region as uncached (WC) ourselves via the MTRR registers.
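To make points 1 and 2 concrete, here is a minimal user-space sketch: mapping a PCI BAR through Linux's sysfs `resource0` file (the device address `0000:01:00.0` is hypothetical) gives an uncached view of device memory, and `volatile` then only keeps the compiler from eliding or merging the two accesses. It is the mapping type, not `volatile`, that keeps the data out of L1/L2/L3.

```cpp
// Sketch only: assumes Linux and a PCI device at the hypothetical address
// 0000:01:00.0 whose BAR0 is exposed via sysfs. Mapping the resource file
// gives an uncached view of device memory; `volatile` merely forces the
// compiler to emit both the store (C) and the load (D).
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
                  O_RDWR | O_SYNC);
    if (fd < 0) return 1;

    void* base = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return 1;

    volatile uint32_t& v = *static_cast<volatile uint32_t*>(base);

    v = 1;              // C: write to v -- must actually reach the device
    uint32_t local = v; // D: read from v -- a real load, not the constant 1

    munmap(base, 4096);
    close(fd);
    return static_cast<int>(local & 0xffu);
}
```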
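And for point 3, a sketch of the CPU-side fence: a buffer in CPU memory is filled and the device is then told, through a `volatile` MMIO "doorbell" register, to read it by DMA. The fence orders the ordinary stores before the doorbell write. The names `dma_buf` and `doorbell` are illustrative, not from any particular driver API.

```cpp
// Sketch only: `doorbell` stands for a device register mapped uncached,
// `dma_buf` for a DMA-visible buffer in CPU memory. The fence keeps the
// buffer stores from being reordered after the doorbell write; it does not
// flush caches (per point 3, extra care is needed only if L1/L2 are not
// DMA-coherent while the LLC is).
#include <atomic>
#include <cstddef>
#include <cstdint>

void start_dma(uint32_t* dma_buf, std::size_t n, volatile uint32_t& doorbell) {
    for (std::size_t i = 0; i < n; ++i)
        dma_buf[i] = static_cast<uint32_t>(i);        // fill the buffer with payload

    std::atomic_thread_fence(std::memory_order_seq_cst); // order the stores above
                                                          // before the doorbell write
    doorbell = 1;   // volatile MMIO write: the device may now start its DMA read
}
```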