StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>EDIT: TL/DR version: CLR JIT will inline one level of recursion, MSVC 8 SP1 will not without <code>#pragma inline_recursion(on)</code>. And you should run the C# version outside of a debugger to get the fully optimized JIT.</p> <p>I got similar results to acidzombie24 with C# vs. C++ using VS 2008 SP1 on a Core2 Duo laptop running Vista plugged in with "high performance" power settings (~1600 ms vs. ~3800 ms). It's kind of tricky to see the optimized JIT'd C# code, but for x86 it boils down to this:</p> <pre><code>00000000 55 push ebp 00000001 8B EC mov ebp,esp 00000003 57 push edi 00000004 56 push esi 00000005 53 push ebx 00000006 8B F1 mov esi,ecx 00000008 83 FE 02 cmp esi,2 0000000b 7D 07 jge 00000014 0000000d 8B C6 mov eax,esi 0000000f 5B pop ebx 00000010 5E pop esi 00000011 5F pop edi 00000012 5D pop ebp 00000013 C3 ret return fib(n - 1) + fib(n - 2); 00000014 8D 7E FF lea edi,[esi-1] 00000017 83 FF 02 cmp edi,2 0000001a 7D 04 jge 00000020 0000001c 8B DF mov ebx,edi 0000001e EB 19 jmp 00000039 00000020 8D 4F FF lea ecx,[edi-1] 00000023 FF 15 F8 2F 12 00 call dword ptr ds:[00122FF8h] 00000029 8B D8 mov ebx,eax 0000002b 4F dec edi 0000002c 4F dec edi 0000002d 8B CF mov ecx,edi 0000002f FF 15 F8 2F 12 00 call dword ptr ds:[00122FF8h] 00000035 03 C3 add eax,ebx 00000037 8B D8 mov ebx,eax 00000039 4E dec esi 0000003a 4E dec esi 0000003b 83 FE 02 cmp esi,2 0000003e 7D 04 jge 00000044 00000040 8B D6 mov edx,esi 00000042 EB 19 jmp 0000005D 00000044 8D 4E FF lea ecx,[esi-1] 00000047 FF 15 F8 2F 12 00 call dword ptr ds:[00122FF8h] 0000004d 8B F8 mov edi,eax 0000004f 4E dec esi 00000050 4E dec esi 00000051 8B CE mov ecx,esi 00000053 FF 15 F8 2F 12 00 call dword ptr ds:[00122FF8h] 00000059 03 C7 add eax,edi 0000005b 8B D0 mov edx,eax 0000005d 03 DA add ebx,edx 0000005f 8B C3 mov eax,ebx 00000061 5B pop ebx 00000062 5E pop esi 00000063 5F pop edi 00000064 5D pop ebp 00000065 C3 ret </code></pre> <p>In contrast to the C++ generated code (/Ox /Ob2 /Oi /Ot /Oy /GL /Gr):</p> <pre><code>int fib(int n) { 00B31000 56 push esi 00B31001 8B F1 mov esi,ecx if (n < 2) return n; 00B31003 83 FE 02 cmp esi,2 00B31006 7D 04 jge fib+0Ch (0B3100Ch) 00B31008 8B C6 mov eax,esi 00B3100A 5E pop esi 00B3100B C3 ret 00B3100C 57 push edi return fib(n - 1) + fib(n - 2); 00B3100D 8D 4E FE lea ecx,[esi-2] 00B31010 E8 EB FF FF FF call fib (0B31000h) 00B31015 8D 4E FF lea ecx,[esi-1] 00B31018 8B F8 mov edi,eax 00B3101A E8 E1 FF FF FF call fib (0B31000h) 00B3101F 03 C7 add eax,edi 00B31021 5F pop edi 00B31022 5E pop esi } 00B31023 C3 ret </code></pre> <p>The C# version basically inlines <code>fib(n-1)</code> and <code>fib(n-2)</code>. For a function that is so <code>call</code> heavy, reducing the number of function calls is the key to speed. Replacing <code>fib</code> with the following:</p> <pre><code>int fib(int n); int fib2(int n) { if (n < 2) return n; return fib(n - 1) + fib(n - 2); } int fib(int n) { if (n < 2) return n; return fib2(n - 1) + fib2(n - 2); } </code></pre> <p>Gets it down to ~1900 ms. Incidentally, if I use <code>#pragma inline_recursion(on)</code> I get similar results with the original <code>fib</code>. Unrolling it one more level:</p> <pre><code>int fib(int n); int fib3(int n) { if (n < 2) return n; return fib(n - 1) + fib(n - 2); } int fib2(int n) { if (n < 2) return n; return fib3(n - 1) + fib3(n - 2); } int fib(int n) { if (n < 2) return n; return fib2(n - 1) + fib2(n - 2); } </code></pre> <p>Gets it down to ~1380 ms. Beyond that it tapers off.</p> <p>So it appears that the CLR JIT for my machine will inline recursive calls one level, whereas the C++ compiler will not do that by default.</p> <p>If only all performance critical code were like <code>fib</code>!</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload