Why does the position of a function in a C++ file affect its performance?
<p>Why does the position of a function in a C++ file affect its performance? Specifically, in the example given below we have two identical functions that have consistently different performance profiles. How does one go about investigating this and determining why the performance is so different?</p>

<p>The example is pretty straightforward in that we have two functions: a and b. Each is compiled with optimisation (<code>-O3 -march=corei7-avx</code>), run many times in a tight loop, and timed. Here is the code:</p>

<pre><code>#include &lt;cstdint&gt;
#include &lt;iostream&gt;
#include &lt;numeric&gt;
#include &lt;boost/timer/timer.hpp&gt;

bool array[] = {true, false, true, false, false, true};

// The empty asm statement keeps calls to the noinline function from being optimised away.
uint32_t __attribute__((noinline)) a() {
    asm("");
    return std::accumulate(std::begin(array), std::end(array), 0);
}

uint32_t __attribute__((noinline)) b() {
    asm("");
    return std::accumulate(std::begin(array), std::end(array), 0);
}

const size_t WARM_ITERS = 1ull &lt;&lt; 10;
const size_t MAX_ITERS = 1ull &lt;&lt; 30;

void test(const char* name, uint32_t (*fn)()) {
    std::cout &lt;&lt; name &lt;&lt; ": ";

    // Warm-up loop, not timed.
    for (size_t i = 0; i &lt; WARM_ITERS; i++) {
        fn();
        asm("");
    }

    // The timer prints the elapsed time when it goes out of scope.
    boost::timer::auto_cpu_timer t;
    for (size_t i = 0; i &lt; MAX_ITERS; i++) {
        fn();
        asm("");
    }
}

int main(int argc, char **argv) {
    test("a", a);
    test("b", b);
    return 0;
}
</code></pre>

<p>Some notable features:</p>

<ul>
<li>Functions a and b are identical. They perform the same accumulate operation and compile down to the same assembly instructions.</li>
<li>Each test has a warm-up period before the timing starts, to try to eliminate any issues with warming up caches.</li>
</ul>

<p>When this is compiled and run we get the following output, showing that a is significantly slower than b:</p>

<pre><code>[me@host:~/code/mystery] make &amp;&amp; ./mystery
g++-4.8 -c -g -O3 -Wall -Wno-unused-local-typedefs -std=c++11 -march=corei7-avx -I/usr/local/include/boost-1_54/ mystery.cpp -o mystery.o
g++-4.8 mystery.o -lboost_system-gcc48-1_54 -lboost_timer-gcc48-1_54 -o mystery
a: 7.412747s wall, 7.400000s user + 0.000000s system = 7.400000s CPU (99.8%)
b: 5.729706s wall, 5.740000s user + 0.000000s system = 5.740000s CPU (100.2%)
</code></pre>

<p>If we invert the two tests (i.e. call <code>test(b)</code> and then <code>test(a)</code>), a is still slower than b:</p>

<pre><code>[me@host:~/code/mystery] make &amp;&amp; ./mystery
g++-4.8 -c -g -O3 -Wall -Wno-unused-local-typedefs -std=c++11 -march=corei7-avx -I/usr/local/include/boost-1_54/ mystery.cpp -o mystery.o
g++-4.8 mystery.o -lboost_system-gcc48-1_54 -lboost_timer-gcc48-1_54 -o mystery
b: 5.733968s wall, 5.730000s user + 0.000000s system = 5.730000s CPU (99.9%)
a: 7.414538s wall, 7.410000s user + 0.000000s system = 7.410000s CPU (99.9%)
</code></pre>

<p>If we now invert the location of the functions in the C++ file (move the definition of b above a), the results are inverted and a becomes faster than b!</p>

<pre><code>[me@host:~/code/mystery] make &amp;&amp; ./mystery
g++-4.8 -c -g -O3 -Wall -Wno-unused-local-typedefs -std=c++11 -march=corei7-avx -I/usr/local/include/boost-1_54/ mystery.cpp -o mystery.o
g++-4.8 mystery.o -lboost_system-gcc48-1_54 -lboost_timer-gcc48-1_54 -o mystery
a: 5.729604s wall, 5.720000s user + 0.000000s system = 5.720000s CPU (99.8%)
b: 7.411549s wall, 7.420000s user + 0.000000s system = 7.420000s CPU (100.1%)
</code></pre>

<p>So essentially, whichever function is at the top of the C++ file is slower.</p>

<p>Some answers to questions you may have:</p>

<ul>
<li>The compiled code is identical for both a and b. The disassembly has been checked. (For those interested: <a href="http://pastebin.com/2QziqRXR">http://pastebin.com/2QziqRXR</a>)</li>
<li>The code was compiled using gcc 4.8 and gcc 4.8.1 on Ubuntu 13.04, Ubuntu 13.10, and Ubuntu 12.04.03.</li>
<li>The effect was observed on Intel Sandy Bridge i7-2600 and Intel Xeon X5482 CPUs.</li>
</ul>

<p>Why would this be happening? What tools are available to investigate something like this?</p>
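Since the instructions of a and b are reported to be identical, one thing that can still differ is the address at which each function is placed. The following is a minimal sketch of one way to look at that, assuming GCC on x86-64 Linux; the helper names offset_within and report_alignment are hypothetical and not part of the original program, and whether placement actually explains the timings above is not established here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Offset of `addr` inside an aligned window of `alignment` bytes.
// `alignment` must be a power of two (for example 16, 32 or 64).
inline std::uintptr_t offset_within(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}

// Prints where a function starts relative to 16-, 32- and 64-byte boundaries.
// Converting a function pointer to an integer is an assumption that holds on
// common platforms such as GCC on x86-64 Linux.
template <typename Fn>
void report_alignment(const char* name, Fn* fn) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(fn);
    std::printf("%s at %#llx: mod 16 = %llu, mod 32 = %llu, mod 64 = %llu\n",
                name,
                static_cast<unsigned long long>(addr),
                static_cast<unsigned long long>(offset_within(addr, 16)),
                static_cast<unsigned long long>(offset_within(addr, 32)),
                static_cast<unsigned long long>(offset_within(addr, 64)));
}

// Hypothetical usage inside main() of the program above:
//   report_alignment("a", a);
//   report_alignment("b", b);
```

If the two functions turn out to start at different offsets, comparing runs before and after swapping their definitions would show whether the slower timing follows the offset or the function name.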