Note that there are some explanatory texts on larger screens.

plurals
  1. POMicro-optimizing a c++ comparison function
    primarykey
    data
    text
    <p>I have a <code>Compare()</code> function that looks like this:</p> <pre><code>inline bool Compare(bool greater, int p1, int p2) { if (greater) return p1&gt;=p2; else return p1&lt;=p2; } </code></pre> <p>I decided to optimize to avoid branching:</p> <pre><code>inline bool Compare2(bool greater, int p1, int p2) { bool ret[2] = {p1&lt;=p2,p1&gt;=p2}; return ret[greater]; } </code></pre> <p>I then tested by doing this:</p> <pre><code>bool x = true; int M = 100000; int N = 100; bool a[N]; int b[N]; int c[N]; for (int i=0;i&lt;N; ++i) { a[i] = rand()%2; b[i] = rand()%128; c[i] = rand()%128; } // Timed the below loop with both Compare() and Compare2() for (int j=0; j&lt;M; ++j) { for (int i=0; i&lt;N; ++i) { x ^= Compare(a[i],b[i],c[i]); } } </code></pre> <p>The results:</p> <pre><code>Compare(): 3.14ns avg Compare2(): 1.61ns avg </code></pre> <p>I would say case-closed, avoid branching FTW. But for completeness, I replaced</p> <pre><code>a[i] = rand()%2; </code></pre> <p>with:</p> <pre><code>a[i] = true; </code></pre> <p>and got the exact same measurement of ~3.14ns. Presumably, there is no branching going on then, and the compiler is actually rewriting <code>Compare()</code> to avoid the <code>if</code> statement. But then, why is <code>Compare2()</code> faster?</p> <p>Unfortunately, I am assembly-code-illiterate, otherwise I would have tried to answer this myself.</p> <p><strong>EDIT</strong>: Below is some assembly:</p> <pre><code>_Z7Comparebii: .LFB4: .cfi_startproc .cfi_personality 0x3,__gxx_personality_v0 pushq %rbp .cfi_def_cfa_offset 16 movq %rsp, %rbp .cfi_offset 6, -16 .cfi_def_cfa_register 6 movl %edi, %eax movl %esi, -8(%rbp) movl %edx, -12(%rbp) movb %al, -4(%rbp) cmpb $0, -4(%rbp) je .L2 movl -8(%rbp), %eax cmpl -12(%rbp), %eax setge %al jmp .L3 .L2: movl -8(%rbp), %eax cmpl -12(%rbp), %eax setle %al .L3: leave ret .cfi_endproc .LFE4: .size _Z7Comparebii, .-_Z7Comparebii .section .text._Z8Compare2bii,"axG",@progbits,_Z8Compare2bii,comdat .weak _Z8Compare2bii .type _Z8Compare2bii, @function _Z8Compare2bii: .LFB5: .cfi_startproc .cfi_personality 0x3,__gxx_personality_v0 pushq %rbp .cfi_def_cfa_offset 16 movq %rsp, %rbp .cfi_offset 6, -16 .cfi_def_cfa_register 6 movl %edi, %eax movl %esi, -24(%rbp) movl %edx, -28(%rbp) movb %al, -20(%rbp) movw $0, -16(%rbp) movl -24(%rbp), %eax cmpl -28(%rbp), %eax setle %al movb %al, -16(%rbp) movl -24(%rbp), %eax cmpl -28(%rbp), %eax setge %al movb %al, -15(%rbp) movzbl -20(%rbp), %eax cltq movzbl -16(%rbp,%rax), %eax leave ret .cfi_endproc .LFE5: .size _Z8Compare2bii, .-_Z8Compare2bii .text </code></pre> <p>Now, the actual code that performs the test might be using inlined versions of the above two functions, so there is a possibility this might be the wrong code to analyze. With that said, I see a <code>jmp</code> command in <code>Compare()</code>, so I think that means that it is branching. If so, I guess this question becomes: why does the branch predictor not improve the performance of <code>Compare()</code> when I change <code>a[i]</code> from <code>rand()%2</code> to <code>true</code> (or <code>false</code> for that matter)?</p> <p><strong>EDIT2</strong>: I replaced "branch prediction" with "branching" to make my post more sensible.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload