Why does compiler inlining produce slower code than manual inlining?
<h2>Background</h2> <p>The following critical loop of a piece of numerical software, written in C++, basically compares two objects by one of their members:</p> <pre><code>for(int j=n;--j&gt;0;) asd[j%16]=a.e&lt;b.e; </code></pre> <p><code>a</code> and <code>b</code> are of class <code>ASD</code>:</p> <pre><code>struct ASD { float e; ... }; </code></pre> <p>I was investigating the effect of putting this comparison in a lightweight member function:</p> <pre><code>bool test(const ASD&amp; y)const { return e&lt;y.e; } </code></pre> <p>and using it like this:</p> <pre><code>for(int j=n;--j&gt;0;) asd[j%16]=a.test(b); </code></pre> <p>The compiler inlines this function, but the resulting assembly code is different and causes more than 10% runtime overhead. I have two questions:</p> <h2>Questions</h2> <ol> <li><p>Why is the compiler producing different assembly code?</p></li> <li><p>Why is the produced assembly slower?</p></li> </ol> <p><strong>EDIT:</strong> The second question has been answered by implementing @KamyarSouri's suggestion (j%16). The assembly code now looks almost identical (see <a href="http://pastebin.com/diff.php?i=yqXedtPm" rel="noreferrer">http://pastebin.com/diff.php?i=yqXedtPm</a>). 
The only differences are at lines 18, 33, and 48:</p> <pre><code>000646F9 movzx edx,dl </code></pre> <h2>Material</h2> <ul> <li>The test code: <a href="http://pastebin.com/03s3Kvry" rel="noreferrer">http://pastebin.com/03s3Kvry</a></li> <li>The assembly output on MSVC10 with /Ox /Ob2 /Ot /arch:SSE2: <ul> <li>Compiler-inlined version: <a href="http://pastebin.com/yqXedtPm" rel="noreferrer">http://pastebin.com/yqXedtPm</a></li> <li>Manually inlined version: <a href="http://pastebin.com/pYSXL77f" rel="noreferrer">http://pastebin.com/pYSXL77f</a></li> <li>Difference: <a href="http://pastebin.com/diff.php?i=yqXedtPm" rel="noreferrer">http://pastebin.com/diff.php?i=yqXedtPm</a></li> </ul></li> </ul> <p>This chart shows the FLOP/s (up to a scaling factor) for 50 test runs of my code.</p> <p><img src="https://i.stack.imgur.com/BlGeJ.png" alt="enter image description here"></p> <p>The gnuplot script used to generate the plot: <a href="http://pastebin.com/8amNqya7" rel="noreferrer">http://pastebin.com/8amNqya7</a></p> <p>Compiler Options:</p> <p>/Zi /W3 /WX- /MP /Ox /Ob2 /Oi /Ot /Oy /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm- /EHsc /MT /GS- /Gy /arch:SSE2 /fp:precise /Zc:wchar_t /Zc:forScope /Gd /analyze-</p> <p>Linker Options: /INCREMENTAL:NO "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X86 /ERRORREPORT:QUEUE </p>
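A minimal, self-contained sketch of the two loop variants, for reproducing the comparison without the pastebin test code. It rests on assumptions the question does not state: `ASD` holds only `e`, `asd` is a 16-element `bool` array, and the helper names `loop_manual` and `loop_member` are hypothetical. Each variant sits in its own function so its assembly listing can be inspected in isolation, e.g. with MSVC's `/FAs` or GCC/Clang's `-S`.

```cpp
// Hypothetical reproduction of the two loops from the question.
// Assumption: ASD carries only the member e (the real struct has more
// members, elided as "..." in the question) and asd is a 16-element
// bool array.
struct ASD {
    float e;
    // The comparison wrapped in a lightweight const member function.
    bool test(const ASD& y) const { return e < y.e; }
};

// Variant 1: the comparison written out by hand ("manually inlined").
void loop_manual(const ASD& a, const ASD& b, bool (&asd)[16], int n) {
    for (int j = n; --j > 0;)
        asd[j % 16] = a.e < b.e;
}

// Variant 2: the same comparison through ASD::test, left for the
// compiler to inline.
void loop_member(const ASD& a, const ASD& b, bool (&asd)[16], int n) {
    for (int j = n; --j > 0;)
        asd[j % 16] = a.test(b);
}
```

Both functions compute identical values, so any difference between the two listings is purely a code-generation difference. Note that `a.e < b.e` is loop-invariant, so an optimizer may hoist it out of the loop; compare the listings rather than assuming the loop body survives unchanged.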