<h1>Short Answer:</h1> <p>Your <code>asd</code> array is declared like this:</p> <pre><code>int *asd=new int[16];
</code></pre> <p>Therefore, use <code>int</code> as the return type rather than <code>bool</code>.<br> Alternatively, change the array type to <code>bool</code>.</p> <p><strong>In any case, make the return type of the <code>test</code> function match the type of the array.</strong></p> <p>Skip to the bottom for more details.</p> <h1>Long Answer:</h1> <p>In the manually inlined version, the "core" of one iteration looks like this:</p> <pre><code>xor         eax,eax
mov         edx,ecx
and         edx,0Fh
mov         dword ptr [ebp+edx*4],eax
mov         eax,dword ptr [esp+1Ch]
movss       xmm0,dword ptr [eax]
movss       xmm1,dword ptr [edi]
cvtps2pd    xmm0,xmm0
cvtps2pd    xmm1,xmm1
comisd      xmm1,xmm0
</code></pre> <p>The compiler-inlined version is identical except for the first instruction.</p> <p>Where instead of:</p> <pre><code>xor         eax,eax
</code></pre> <p>it has:</p> <pre><code>xor         eax,eax
movzx       edx,al
</code></pre> <p>Okay, so it's <em>one</em> extra instruction. Both sequences do the same thing: they zero a register. This is the only difference that I see...</p> <p>The <code>movzx</code> instruction has a single-cycle latency and a reciprocal throughput of <code>0.33</code> cycles on all the newer architectures. So I can't imagine how this could make a 10% difference.</p> <p>In both cases, the result of the zeroing is used only 3 instructions later. So it's very possible that this could be on the critical path of execution.</p> <hr> <p><strong>While I'm not an Intel engineer, here's my guess:</strong></p> <p>Most modern processors deal with zeroing operations (such as <code>xor eax,eax</code>) via <a href="http://en.wikipedia.org/wiki/Register_renaming" rel="noreferrer">register renaming</a> to a bank of zero registers. This completely bypasses the execution units. 
However, it's possible that this special handling could cause a pipeline bubble when the (partial) register is accessed via <code>movzx edx,al</code>.</p> <p>Furthermore, there's also a <em>false</em> dependency on <code>eax</code> in the compiler-inlined version:</p> <pre><code>movzx       edx,al
mov         eax,ecx   // False dependency on "eax".
</code></pre> <p>Whether or not <a href="http://en.wikipedia.org/wiki/Out-of-order_execution" rel="noreferrer">out-of-order execution</a> is able to resolve this is beyond me.</p> <hr> <h1>Okay, this is basically turning into a question of reverse-engineering the MSVC compiler...</h1> <p>Here I'll try to explain <strong><em>why</em></strong> that extra <code>movzx</code> is generated as well as why it stays.</p> <p>The key here is the <code>bool</code> return value. Apparently, <code>bool</code> datatypes are stored as 8-bit values inside MSVC's internal representation (IR). Therefore, when you implicitly convert from <code>bool</code> to <code>int</code> here:</p> <pre><code>asd[j%16] = a.test(b);
^^^^^^^^^   ^^^^^^^^^
type int    type bool
</code></pre> <p>there is an 8-bit to 32-bit integer promotion. This is the reason why MSVC generates the <code>movzx</code> instruction.</p> <p>When the inlining is done manually, the compiler has enough information to optimize out this conversion and keeps everything as a 32-bit datatype in the IR.</p> <p>However, when the code is put into its own function with a <code>bool</code> return value, the compiler is not able to optimize out the 8-bit intermediate datatype. Therefore, the <code>movzx</code> stays.</p> <p>When you make both datatypes the same (either <code>int</code> or <code>bool</code>), no conversion is needed. Hence the problem is avoided altogether.</p>
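<p>To make the type mismatch concrete, here is a minimal sketch of the two variants. The <code>Sample</code> struct, its <code>value</code> member, the <code>test_bool</code>/<code>test_int</code> names and the <code>fill_and_sum</code> helper are all hypothetical stand-ins, not the code from the question; only the relationship between the return type and the array's element type is meant to carry over. Whether a particular compiler actually emits the extra <code>movzx</code> for the <code>bool</code> variant depends on the compiler and its version, so treat this as an illustration of the fix rather than a benchmark.</p>

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the class in the question; only the return
// type of the test function matters for this illustration.
struct Sample {
    float value;

    // Returns bool: storing the result in an int element requires an
    // implicit bool -> int conversion at the call site.
    bool test_bool(const Sample& other) const { return value < other.value; }

    // Returns int: the result already has the element type of an int array.
    int test_int(const Sample& other) const { return value < other.value; }
};

// Both functions compute the same flag, but their result types differ.
static_assert(std::is_same<decltype(Sample{}.test_bool(Sample{})), bool>::value,
              "test_bool yields bool");
static_assert(std::is_same<decltype(Sample{}.test_int(Sample{})), int>::value,
              "test_int yields int");

// Fills a 16-element int array the way the question's loop does
// (asd[j % 16] = ...), using the int-returning variant so the assignment
// involves no conversion. Returns the sum of the array for easy checking.
inline int fill_and_sum(const Sample& a, const Sample& b, int iterations) {
    int* asd = new int[16];
    for (int i = 0; i < 16; ++i) asd[i] = 0;
    for (int j = 0; j < iterations; ++j) {
        asd[j % 16] = a.test_int(b);  // int assigned to int
    }
    int sum = 0;
    for (int i = 0; i < 16; ++i) sum += asd[i];
    delete[] asd;
    return sum;
}
```

<p>The other fix from the short answer is symmetric: keep the <code>bool</code> return type and declare the array as <code>bool</code> instead, so that the assignment is again between identical types.</p>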