Why the performance difference between C# (quite a bit slower) and Win32/C?
<p>We are looking to migrate a performance-critical application to .NET and find that the C# version is 30% to 100% slower than the Win32/C version, depending on the processor (the difference is more marked on a mobile T7200 processor). I have a very simple code sample that demonstrates this. For brevity I shall show only the C version; the C# is a direct translation:</p>
<pre><code>#include "stdafx.h"
#include "Windows.h"

int array1[100000];
int array2[100000];

int Test();

int main(int argc, char* argv[])
{
    int res = Test();
    return 0;
}

int Test()
{
    int calc, i, k;
    calc = 0;
    for (i = 0; i &lt; 50000; i++) array1[i] = i + 2;
    for (i = 0; i &lt; 50000; i++) array2[i] = 2 * i - 2;
    for (i = 0; i &lt; 50000; i++)
    {
        for (k = 0; k &lt; 50000; k++)
        {
            if (array1[i] == array2[k])
                calc = calc - array2[i] + array1[k];
            else
                calc = calc + array1[i] - array2[k];
        }
    }
    return calc;
}
</code></pre>
<p>If we look at the Win32 disassembly for the <code>else</code> branch, we have:</p>
<pre><code>else calc = calc + array1[i] - array2[k];
004011A0   jmp         Test+0FCh (004011bc)
004011A2   mov         eax,dword ptr [ebp-8]
004011A5   mov         ecx,dword ptr [ebp-4]
004011A8   add         ecx,dword ptr [eax*4+48DA70h]
004011AF   mov         edx,dword ptr [ebp-0Ch]
004011B2   sub         ecx,dword ptr [edx*4+42BFF0h]
004011B9   mov         dword ptr [ebp-4],ecx
</code></pre>
<p>(This is a debug build, but bear with me.)</p>
<p>The disassembly of the optimised C# version, taken with the CLR debugger on the optimised exe:</p>
<pre><code>else calc = calc + pev_tmp[i] - gat_tmp[k];
000000a7   mov         eax,dword ptr [ebp-4]
000000aa   mov         edx,dword ptr [ebp-8]
000000ad   mov         ecx,dword ptr [ebp-10h]
000000b0   mov         ecx,dword ptr [ecx]
000000b2   cmp         edx,dword ptr [ecx+4]
000000b5   jb          000000BC
000000b7   call        792BC16C
000000bc   add         eax,dword ptr [ecx+edx*4+8]
000000c0   mov         edx,dword ptr [ebp-0Ch]
000000c3   mov         ecx,dword ptr [ebp-14h]
000000c6   mov         ecx,dword ptr [ecx]
000000c8   cmp         edx,dword ptr [ecx+4]
000000cb   jb          000000D2
000000cd   call        792BC16C
000000d2   sub         eax,dword ptr [ecx+edx*4+8]
000000d6   mov         dword ptr [ebp-4],eax
</code></pre>
<p>There are many more instructions, which is presumably the cause of the performance difference.</p>
<p>So, three questions really:</p>
<ol>
<li><p>Am I looking at the correct disassembly for the two programs, or are the tools misleading me?</p></li>
<li><p>If the difference in the number of generated instructions is not the cause of the difference, what is?</p></li>
<li><p>What can we do about it, other than keeping all our performance-critical code in a native DLL?</p></li>
</ol>
<p>Thanks in advance, Steve</p>
<p>PS: I did recently receive an invite to a joint MS/Intel seminar entitled something like 'Building performance critical native applications'. Hmm...</p>
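<p>The extra <code>cmp</code>/<code>jb</code>/<code>call</code> groups in the C# listing have the shape of per-access array bounds checks. The index in <code>edx</code> is compared (unsigned, hence <code>jb</code>) with a length read from offset 4 of the array object, the <code>call</code> is skipped when the index is in range, and the element is then loaded from offset 8. We assume the call target <code>792BC16C</code> is the runtime helper that raises <code>IndexOutOfRangeException</code>. The C sketch below writes that pattern out by hand next to the plain indexed load of the native build, so the extra work per element is visible at source level. The <code>checked_array</code> type and the <code>checked_get</code>, <code>unchecked_get</code>, <code>range_check_fail</code> and <code>else_branch_*</code> functions are hypothetical illustrations. They are not the CLR's real object layout or helpers.</p>

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a managed int[]: the length travels with the
   elements, loosely mirroring the length at [ecx+4] and the elements at
   [ecx+8] in the C# listing. This is not the CLR's real object layout. */
typedef struct {
    size_t length;
    int *data;
} checked_array;

/* Hypothetical failure path, playing the role of the out-of-line
   `call 792BC16C` in the listing. */
static void range_check_fail(size_t index, size_t length)
{
    fprintf(stderr, "index %zu out of range (length %zu)\n", index, length);
    exit(EXIT_FAILURE);
}

/* Checked read: compare the index with the stored length first (the cmp/jb
   pair) and take the failure path only when it is out of range. */
static int checked_get(const checked_array *a, size_t index)
{
    if (index >= a->length)   /* unsigned compare, as with `jb` */
        range_check_fail(index, a->length);
    return a->data[index];
}

/* Unchecked read, as in the native listing: a single indexed load. */
static int unchecked_get(const int *data, size_t index)
{
    return data[index];
}

/* The question's else branch, calc = calc + array1[i] - array2[k],
   written once with checked reads and once with plain reads. */
static int else_branch_checked(int calc, const checked_array *array1,
                               const checked_array *array2, size_t i, size_t k)
{
    return calc + checked_get(array1, i) - checked_get(array2, k);
}

static int else_branch_unchecked(int calc, const int *array1,
                                 const int *array2, size_t i, size_t k)
{
    return calc + unchecked_get(array1, i) - unchecked_get(array2, k);
}
```

<p>For any in-range index both versions return the same value. The checked one only adds a compare and a normally untaken branch before each load.</p>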
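<p>One way to reproduce the 30% to 100% figure is to time the C side in-process. The sketch below is a minimal harness that rests on a few assumptions. First, <code>test_loop</code> and <code>time_test</code> are hypothetical names. Second, the element count is a parameter, so the loop can be checked quickly at small sizes. Third, <code>calc</code> is accumulated as <code>unsigned</code>, because at the full size of 50000 the running total overflows a 32-bit <code>int</code>. Signed overflow is undefined in C, whereas C# <code>int</code> arithmetic wraps by default. The harness uses <code>clock()</code>, which measures processor time; that is adequate for a single-threaded loop like this one.</p>

```c
#include <stdio.h>
#include <time.h>

#define N 50000  /* problem size used in the question */

static int array1[N];
static int array2[N];

/* The question's Test() loop with the element count passed in (hypothetical
   refactor). calc is unsigned so that wrap-around is well defined in C. */
static unsigned test_loop(int n)
{
    unsigned calc = 0;
    int i, k;

    if (n > N) n = N;  /* never index past the static arrays */
    if (n < 0) n = 0;

    for (i = 0; i < n; i++) array1[i] = i + 2;
    for (i = 0; i < n; i++) array2[i] = 2 * i - 2;

    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++)
            if (array1[i] == array2[k])
                calc = calc - array2[i] + array1[k];
            else
                calc = calc + array1[i] - array2[k];
    return calc;
}

/* Times one call of test_loop(n) in seconds of processor time and hands the
   computed value back through *result so the caller can use it. */
static double time_test(int n, unsigned *result)
{
    clock_t start = clock();
    *result = test_loop(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

<p>A caller would run <code>time_test(N, &amp;result)</code> and print both <code>result</code> and the elapsed seconds. Using the result matters. The original <code>main</code> discards the value returned by <code>Test()</code>, so in a release build an optimising compiler may drop the nested loop altogether, which would make the native timing meaningless.</p>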