<p>For the kinds of optimizations you are suggesting, you should write your code for clarity and not optimize it until you have proof that it is a bottleneck.</p>
<p>One danger of attempting micro-optimizations like this is that you will likely make things <em>slower</em>, because the compiler is smarter than you are a lot of the time.</p>
<p>Take your "optimization":</p>
<pre><code>const int windowPosX = (screenWidth * 0.5) - (windowWidth * 0.5);
</code></pre>
<p>There is no serious compiler in the world that doesn't know that the fastest way to divide an unsigned integer by two is to shift right by one (for signed values it adds a small rounding fix-up to the shift). Multiplying by floating-point 0.5 is actually more expensive, because it requires converting to floating-point and back, and doing two multiplies (which are more expensive than shifts).</p>
<p>But don't take my word for it. Look at what the compiler actually does. gcc 4.3.3 on 32-bit Ubuntu (-O3, -msse3, -fomit-frame-pointer) compiles this:</p>
<pre><code>int posx(unsigned int screen_width, unsigned int window_width) {
  return (screen_width / 2) - (window_width / 2);
}
</code></pre>
<p>to this:</p>
<pre><code>00000040 &lt;posx&gt;:
  40:   8b 44 24 04             mov    eax,DWORD PTR [esp+0x4]
  44:   8b 54 24 08             mov    edx,DWORD PTR [esp+0x8]
  48:   d1 e8                   shr    eax,1
  4a:   d1 ea                   shr    edx,1
  4c:   29 d0                   sub    eax,edx
  4e:   c3                      ret
</code></pre>
<p>Two shifts (using an immediate operand) and a subtract. Very cheap. On the other hand, it compiles this:</p>
<pre><code>int posx(unsigned int screen_width, unsigned int window_width) {
  return (screen_width * 0.5) - (window_width * 0.5);
}
</code></pre>
<p>to this:</p>
<pre><code>00000000 &lt;posx&gt;:
   0:   83 ec 04                sub    esp,0x4
   3:   31 d2                   xor    edx,edx
   5:   8b 44 24 08             mov    eax,DWORD PTR [esp+0x8]
   9:   52                      push   edx
   a:   31 d2                   xor    edx,edx
   c:   50                      push   eax
   d:   df 2c 24                fild   QWORD PTR [esp]
  10:   83 c4 08                add    esp,0x8
  13:   d8 0d 00 00 00 00       fmul   DWORD PTR ds:0x0
                        15: R_386_32    .rodata.cst4
  19:   8b 44 24 0c             mov    eax,DWORD PTR [esp+0xc]
  1d:   52                      push   edx
  1e:   50                      push   eax
  1f:   df 2c 24                fild   QWORD PTR [esp]
  22:   d8 0d 04 00 00 00       fmul   DWORD PTR ds:0x4
                        24: R_386_32    .rodata.cst4
  28:   de c1                   faddp  st(1),st
  2a:   db 4c 24 08             fisttp DWORD PTR [esp+0x8]
  2e:   8b 44 24 08             mov    eax,DWORD PTR [esp+0x8]
  32:   83 c4 0c                add    esp,0xc
  35:   c3                      ret
</code></pre>
<p>What you're seeing is conversion to floating-point, multiplication by a value from the data segment (which may or may not be in cache), and conversion back to integer.</p>
<p>Please think of this example when you're tempted to perform micro-optimizations like this. Not only is it premature, but it might not help at all (in this case it significantly hurt!)</p>
<p>Seriously: don't do it. I think a golden rule is never to do optimizations like this unless you routinely inspect your compiler's output as I have done here.</p>
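<p>If you want to repeat the comparison yourself, here is a minimal sketch that puts the two versions side by side. The names <code>posx_int</code>, <code>posx_float</code> and <code>halving_matches_shift</code> are made up for illustration and are not from the question. It also shows something the disassembly doesn't: the two expressions are not even guaranteed to return the same value when a width is odd, so the "optimization" can change behavior as well as speed.</p>

```c
/* Integer version: each unsigned width is halved with truncating division,
 * which a compiler can implement as a single right shift. */
int posx_int(unsigned int screen_width, unsigned int window_width)
{
    return (screen_width / 2) - (window_width / 2);
}

/* Floating-point version: both widths are converted to double, scaled by 0.5,
 * subtracted, and the difference is truncated back to int. */
int posx_float(unsigned int screen_width, unsigned int window_width)
{
    return (screen_width * 0.5) - (window_width * 0.5);
}

/* Hypothetical helper: returns 1 if x / 2 equals x >> 1 for every
 * unsigned x below limit, 0 otherwise. */
int halving_matches_shift(unsigned int limit)
{
    for (unsigned int x = 0; x < limit; ++x) {
        if (x / 2 != x >> 1) {
            return 0;
        }
    }
    return 1;
}
```

<p>For even widths the two functions agree, but for a screen width of 4 and a window width of 3 the integer version truncates each half first (2 - 1) while the floating-point version truncates only the difference (2.0 - 1.5). To look at the generated code, something along the lines of <code>gcc -O3 -c posx.c</code> followed by <code>objdump -d -M intel posx.o</code> should work on a typical GNU toolchain; the file name is an assumption, and the exact instructions will vary with compiler version and target.</p>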
