<p>If it were just about executing an instruction in each thread, then the standard <code>omp parallel</code> pragma should do just that:</p>
<pre><code>#pragma omp parallel for
for( int y=0; y&lt;img-&gt;height; y++ )   // rows 0 .. height-1
{
    /* [SSE Code] */
}
#pragma omp parallel
_mm_empty();
</code></pre>
<p>Or combined together:</p>
<pre><code>#pragma omp parallel
{
    #pragma omp for
    for( int y=0; y&lt;img-&gt;height; y++ )
    {
        /* [SSE Code] */
    }
    #pragma omp barrier   // optional: "omp for" already ends with an implicit barrier unless nowait is given
    _mm_empty();
}
</code></pre>
<p>What this does not give you is a guarantee that <code>_mm_empty</code> runs on each and every core that the previous loop used. You are only guaranteed that it is called in each thread after that thread's share of the loop (and, with the barrier, after all threads have finished the loop); the OpenMP runtime (or the OS) is free to place threads on any core and to move them whenever a reschedule occurs. That is less worrying than it sounds, though: the OS saves and restores the x87/MMX register state as part of each thread's context, so that state follows the thread rather than the core. What counts is that every thread that ran MMX code calls <code>_mm_empty</code>, which the combined version guarantees, since the same team of threads runs both the loop and the cleanup. If you still want threads pinned to specific cores, OpenMP offers the <code>OMP_PROC_BIND</code> environment variable and the <code>proc_bind</code> clause, and most OSes provide their own thread-affinity functions.</p>
<p>But you know what, you probably don't need this <code>_mm_empty</code> madness at all. Keep in mind that <code>_mm_empty</code> (the <code>EMMS</code> instruction) is only needed after using <em>MMX</em> instructions and before any x87 floating-point code runs. <em>MMX</em> was kind of a predecessor to <em>SSE</em> (with only 64 bits instead of 128), and its registers alias those of the <em>x87</em> FPU. <em>SSE</em>, on the other hand, brings its own set of registers along with its own status and control register (<code>MXCSR</code>). So <em>SSE</em> doesn't interfere with the "classical" FPU in any way, and no cleanup is necessary. So if it is really only <em>SSE</em> operations that you're using and not <em>MMX</em> (i.e. you only ever work with <code>__m128</code>, <code>__m128i</code> or <code>__m128d</code> types and never with <code>__m64</code> types), then just forget about <code>_mm_empty</code>.</p>
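<p>To make the distinction concrete, here is a minimal sketch, assuming GCC or Clang on x86/x86-64 with SSE2. The functions <code>brighten_sse</code> and <code>brighten_mmx</code> and the byte-wise "brighten" operation are hypothetical, made up for illustration: both add a constant to every byte with saturation, one using <code>__m128i</code> (SSE2) and one using <code>__m64</code> (MMX). Only the MMX version has to call <code>_mm_empty</code>; the SSE version can run inside an OpenMP loop without any cleanup.</p>

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_set1_epi8, _mm_adds_epu8, ... */
#include <mmintrin.h>   /* MMX:  __m64, _mm_set1_pi8, _mm_adds_pu8, _mm_empty */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: SSE2 only. It works on the XMM registers and never
   touches the x87/MMX register file, so no _mm_empty() is needed, not even
   when the loop is split across OpenMP threads. */
void brighten_sse(uint8_t *pixels, int width, int height, uint8_t amount)
{
    const __m128i add = _mm_set1_epi8((char)amount);

    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        uint8_t *row = pixels + (size_t)y * (size_t)width;
        int x = 0;
        /* 16 bytes per step, unsigned saturating add */
        for (; x + 16 <= width; x += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(row + x));
            _mm_storeu_si128((__m128i *)(row + x), _mm_adds_epu8(v, add));
        }
        /* scalar tail for the remaining bytes, same saturation rule */
        for (; x < width; x++) {
            unsigned s = row[x] + amount;
            row[x] = (uint8_t)(s > 255 ? 255 : s);
        }
    }
}

/* Hypothetical example: MMX. MMX instructions operate on MM0-MM7, which
   alias the x87 register stack, so _mm_empty() (EMMS) must run before any
   x87 floating-point code executes in this thread. */
void brighten_mmx(uint8_t *pixels, size_t n, uint8_t amount)
{
    const __m64 add = _mm_set1_pi8((char)amount);
    size_t i = 0;
    /* 8 bytes per step, unsigned saturating add */
    for (; i + 8 <= n; i += 8) {
        __m64 v;
        memcpy(&v, pixels + i, 8);
        v = _mm_adds_pu8(v, add);
        memcpy(pixels + i, &v, 8);
    }
    _mm_empty();  /* leave the MMX state: the x87 FPU is usable again */
    /* scalar tail for the remaining bytes */
    for (; i < n; i++) {
        unsigned s = pixels[i] + amount;
        pixels[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}
```

<p>Compile with e.g. <code>gcc -O2 -fopenmp</code> (on 32-bit x86 also <code>-msse2</code>); without <code>-fopenmp</code> the pragma is simply ignored and the loop runs serially. A call like <code>brighten_sse(img, width, height, 40)</code> needs nothing afterwards, whereas any code that uses <code>__m64</code> should end its MMX section with <code>_mm_empty()</code>, as <code>brighten_mmx</code> does.</p>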
 
