StackOverflow2013

plurals

POMicro-optimizing a c++ comparison function
primarykey
Id
15770049
data
AcceptedAnswerId
15774008
AnswerCount
3
ClosedDate
CommentCount
28
CommunityOwnedDate
CreationDate
2013-04-02T17:06:23.307
FavoriteCount
3
LastActivityDate
2013-04-03T16:02:30.790
LastEditDate
2013-04-03T02:48:42.327
LastEditorUserId
543913
OwnerUserId
543913
ParentId
0
PostTypeId
1
Score
11
ViewCount
1207
LastEditorDisplayName
text
Body
I have a <code>Compare()</code> function that looks like this:
<pre><code>inline bool Compare(bool greater, int p1, int p2) {
    if (greater) return p1>=p2;
    else return p1<=p2;
}
</code></pre>
I decided to optimize to avoid branching:
<pre><code>inline bool Compare2(bool greater, int p1, int p2) {
    bool ret[2] = {p1<=p2,p1>=p2};
    return ret[greater];
}
</code></pre>
I then tested by doing this:
<pre><code>bool x = true;
int M = 100000;
int N = 100;

bool a[N];
int b[N];
int c[N];

for (int i=0;i<N; ++i) {
    a[i] = rand()%2;
    b[i] = rand()%128;
    c[i] = rand()%128;
}

// Timed the below loop with both Compare() and Compare2()
for (int j=0; j<M; ++j) {
    for (int i=0; i<N; ++i) {
        x ^= Compare(a[i],b[i],c[i]);
    }
}
</code></pre>
The results:
<pre><code>Compare(): 3.14ns avg
Compare2(): 1.61ns avg
</code></pre>
I would say case-closed, avoid branching FTW. But for completeness, I replaced
<pre><code>a[i] = rand()%2;
</code></pre>
with:
<pre><code>a[i] = true;
</code></pre>
and got the exact same measurement of ~3.14ns. Presumably, there is no branching going on then, and the compiler is actually rewriting <code>Compare()</code> to avoid the <code>if</code> statement. But then, why is <code>Compare2()</code> faster?

Unfortunately, I am assembly-code-illiterate, otherwise I would have tried to answer this myself.
EDIT: Below is some assembly:
<pre><code>_Z7Comparebii:
.LFB4:
    .cfi_startproc
    .cfi_personality 0x3,__gxx_personality_v0
    pushq %rbp
    .cfi_def_cfa_offset 16
    movq %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl %edi, %eax
    movl %esi, -8(%rbp)
    movl %edx, -12(%rbp)
    movb %al, -4(%rbp)
    cmpb $0, -4(%rbp)
    je .L2
    movl -8(%rbp), %eax
    cmpl -12(%rbp), %eax
    setge %al
    jmp .L3
.L2:
    movl -8(%rbp), %eax
    cmpl -12(%rbp), %eax
    setle %al
.L3:
    leave
    ret
    .cfi_endproc
.LFE4:
    .size _Z7Comparebii, .-_Z7Comparebii
    .section .text._Z8Compare2bii,"axG",@progbits,_Z8Compare2bii,comdat
    .weak _Z8Compare2bii
    .type _Z8Compare2bii, @function
_Z8Compare2bii:
.LFB5:
    .cfi_startproc
    .cfi_personality 0x3,__gxx_personality_v0
    pushq %rbp
    .cfi_def_cfa_offset 16
    movq %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl %edi, %eax
    movl %esi, -24(%rbp)
    movl %edx, -28(%rbp)
    movb %al, -20(%rbp)
    movw $0, -16(%rbp)
    movl -24(%rbp), %eax
    cmpl -28(%rbp), %eax
    setle %al
    movb %al, -16(%rbp)
    movl -24(%rbp), %eax
    cmpl -28(%rbp), %eax
    setge %al
    movb %al, -15(%rbp)
    movzbl -20(%rbp), %eax
    cltq
    movzbl -16(%rbp,%rax), %eax
    leave
    ret
    .cfi_endproc
.LFE5:
    .size _Z8Compare2bii, .-_Z8Compare2bii
    .text
</code></pre>
Now, the actual code that performs the test might be using inlined versions of the above two functions, so there is a possibility this might be the wrong code to analyze. With that said, I see a <code>jmp</code> command in <code>Compare()</code>, so I think that means that it is branching. If so, I guess this question becomes: why does the branch predictor not improve the performance of <code>Compare()</code> when I change <code>a[i]</code> from <code>rand()%2</code> to <code>true</code> (or <code>false</code> for that matter)?

EDIT2: I replaced "branch prediction" with "branching" to make my post more sensible.
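For anyone who wants to reproduce the measurement, here is a minimal, self-contained sketch of the test loop described in the post. The post does not say how the loop was timed, so the use of <code>std::chrono::steady_clock</code> is an assumption, and <code>Inputs</code>, <code>MakeInputs()</code> and <code>AverageNs()</code> are hypothetical helper names that do not appear in the original code. Only <code>Compare()</code> and <code>Compare2()</code> are taken from the question.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// The two functions from the question, with unchanged behavior.
inline bool Compare(bool greater, int p1, int p2) {
    if (greater) return p1 >= p2;
    else return p1 <= p2;
}

inline bool Compare2(bool greater, int p1, int p2) {
    bool ret[2] = {p1 <= p2, p1 >= p2};
    return ret[greater];
}

// Hypothetical container for the three input arrays a, b, c of the post.
// The flags are stored as unsigned char (0 or 1) instead of bool.
struct Inputs {
    std::vector<unsigned char> a;
    std::vector<int> b;
    std::vector<int> c;
};

// Hypothetical helper: fills the inputs the way the post does.
// random_flag == true  mirrors a[i] = rand()%2;
// random_flag == false mirrors a[i] = true;
inline Inputs MakeInputs(std::size_t n, bool random_flag) {
    Inputs in;
    for (std::size_t i = 0; i < n; ++i) {
        in.a.push_back(static_cast<unsigned char>(random_flag ? std::rand() % 2 : 1));
        in.b.push_back(std::rand() % 128);
        in.c.push_back(std::rand() % 128);
    }
    return in;
}

// Hypothetical helper: runs `passes` sweeps over the inputs and returns the
// average time per call in nanoseconds. The XOR accumulator is written to
// `sink` so the compiler cannot discard the loop as dead code.
// Assumes passes > 0 and non-empty inputs.
template <typename F>
double AverageNs(F compare, const Inputs& in, int passes, bool& sink) {
    assert(in.a.size() == in.b.size() && in.b.size() == in.c.size());
    bool x = true;
    const auto start = std::chrono::steady_clock::now();
    for (int j = 0; j < passes; ++j) {
        for (std::size_t i = 0; i < in.a.size(); ++i) {
            x ^= compare(in.a[i] != 0, in.b[i], in.c[i]);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = x;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() /
           (static_cast<double>(passes) * static_cast<double>(in.a.size()));
}
```

To mirror the numbers in the post, call <code>AverageNs()</code> with <code>MakeInputs(100, true)</code> and <code>passes = 100000</code>, once per function. Passing a small lambda that forwards to <code>Compare()</code> or <code>Compare2()</code>, rather than the function name itself, gives the compiler a better chance to inline the call. The absolute timings will depend on the compiler and optimization flags, which the post does not state.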
Tags
<c++><optimization><branch-prediction>
Title
Micro-optimizing a c++ comparison function
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USdshin
UserOwnerUserId
1. USdshin
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POMicro-optimizing a c++ comparison function
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POMicro-optimizing a c++ comparison function
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POMicro-optimizing a c++ comparison function
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
