Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>In certain situations, I would expect foo3 to be faster as it can short circuit (some number of branches less than or equal to 4 will occur, whereas in foo2, 4 branches always occurs). In the situation where <code>s</code> is not equal to any of the 4 array elements (as is extremely likely in this case), foo2 and foo3 are basically the same code. In that case, 4 branches happen in both functions.</p> <p>Consider what foo3 really looks like (in terms of branches):</p> <pre><code>if (p[i + 0] == s) sum++; else if (p[i + 1] == s) sum++; else if (p[i + 2] == s) sum++; else if (p[i + 3] == s) sum++; </code></pre> <p>This should make it apparent that as long as the <code>if</code> keep coming up false, the sub branches are going to happen. This means that in the situation where none of the ifs are true, it will execute the same number of operations as foo2 (though not the same functionality).</p> <p>A crude way to think about it is as if each <code>if</code> has a cost (not the body of the if, the actual if). In other words, each time an <code>if</code> is reached in the execution flow, a certain cost is required. This is because a branch must be done. Thinking about it this way, it's clear to see that the cost of each function is the same when foo3's flow doesn't short circuit (when all 4 of <code>foo3</code>s <code>if</code> are encountered). (As KillianDS noted, if branch prediction is wrong, it will actually take longer for foo3 since the wrong branch will have to be rewound and the right one executed instead. It seems like for you though that the correct branch is always being chosen.) </p> <p>It's kind of like how the following snippets of code can have the same performance:</p> <pre><code>if (short_runtime()) {} </code></pre> <p>And:</p> <pre><code>if (short_runtime() &amp;&amp; long_runtime()) {} </code></pre> <p>If <code>short_runtime</code> returns true, the one with the second function call is obviously going to take longer. If the <code>short_runtime()</code> return is false though, the <code>long_runtime()</code> call will never happen, and thus the run times will be the same (or at least <em>extremely</em> similar).</p> <hr> <p>To test out this theory, you can make it so that <code>p[i + 0] == s</code> will be true. Just value initialize the array (<code>session* p = new session[SIZE]();</code>), and use <code>session s = {1, 2, 3, 4, 5};</code> locally.</p> <hr> <p>There seems to be a bit of confusion about the purpose/result of loop unrolling. It's done so that fewer jumps have to happen. If <code>n</code> things have to be done, instead of <code>n</code> iterations (jumps) happening with 1 action per iteration, you can have <code>n/k</code> iterations (jumps) happen instead. When everything can fit in the cache, this can provide a speed boost (if it can't fit in the cache, it can actually kill performance!).</p> <p>The instructions aren't happening simultaneously (if they were, <code>sum</code> would need a mutex around it which would be extremely expensive). They're simply happening in sets of 4 instead of sets of 1.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload