StackOverflow2013

<p>Possibility 1) This may not hold true in C#, but when I did optimization work in x86-64 assembler I quickly found while benchmarking that calling code from a DLL (marked external) was slower than implementing the exact same function within my executable. The most obvious reason is paging and memory: the DLL (external) method is loaded far away in memory from the rest of the running code, and if it wasn't accessed previously it will need to be paged in. Your benchmarking code should run some warm-up loops of the functions you are benchmarking to make sure they are paged into memory before you time them.</p>

<p>Possibility 2) Microsoft tends not to optimize string functions to the fullest, so out-optimizing a native string length, substring, IndexOf, etc. isn't unheard of. Anecdote: in x86-64 assembler I was able to create a version of WinXP64's RtlInitUnicodeString function that ran 2x faster in almost all practical use cases.</p>

<p>Possibility 3) Your benchmarking code shows that you're using the 2-parameter overload of IndexOf. This function likely calls the 3-parameter overload IndexOf(Char, Int32, Int32), which adds extra overhead to each iteration.</p>

<hr>

<p>This may be even faster because you're removing the i variable increment per iteration.</p>

<pre><code>char* cp = cs + startIndex;
char* cpEnd = cp + endIndex;
while (cp &lt;= cpEnd) {
    if (*cp == c) return cp - cs;
    cp++;
}
</code></pre>

<p><em>edit</em> In reply regarding (2), for your curiosity: I coded this back in 2005 and used it to patch the ntdll.dll of my WinXP64 machine.
<a href="http://board.flatassembler.net/topic.php?t=4467" rel="nofollow">http://board.flatassembler.net/topic.php?t=4467</a></p>

<pre><code>RtlInitUnicodeString_Opt: ;;rcx=buff rdx=ucharstr 77bytes
        xor     r9d,r9d
        test    rdx,rdx
        mov     dword[rcx],r9d
        mov     [rcx+8],rdx
        jz      .end
        mov     r8,rdx
.scan:
        mov     eax,dword[rdx]
        test    ax,ax
        jz      .one
        add     rdx,4
        shr     eax,16
        test    ax,ax
        jz      .two
        jmp     .scan
.two:
        add     rdx,2
.one:
        mov     eax,0fffch
        sub     rdx,r8
        cmp     rdx,0fffeh
        cmovnb  rdx,rax
        mov     [ecx],dx
        add     dx,2
        mov     [ecx+2],dx
        ret
.end:
        retn
</code></pre>

<p><em>edit 2</em> Running your example code (updated with your fastest version), string.IndexOf runs faster on my Intel i7, 4GB RAM, Win7 64-bit.</p>

<pre><code>short haystack 500k runs
2590 ms for IndexOf16
2287 ms for string.IndexOf

longer haystack:
3549 ms for IndexOf16
2757 ms for string.IndexOf
</code></pre>

<p>Optimizations are sometimes very architecture-dependent.</p>
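<p>For reference, the warm-up advice from Possibility 1 could look roughly like the sketch below. Everything named here is an assumption for illustration: <code>ManualIndexOf</code> is a plain managed loop standing in for your custom routine (it is not your IndexOf16), and <code>TimeMs</code>, the warm-up count, the run count and the haystack are made up. Both candidates go through the same delegate, so the call overhead is the same for each, but treat the numbers as relative only.</p>

<pre><code>using System;
using System.Diagnostics;

public static class IndexOfBenchmark
{
    // Hypothetical stand-in for a custom search routine: a plain managed scan.
    public static int ManualIndexOf(string s, char c)
    {
        for (int i = 0; i &lt; s.Length; i++)
        {
            if (s[i] == c) return i;
        }
        return -1;
    }

    // Runs `find` untimed for `warmupRuns` calls (so the code is JIT-compiled
    // and paged in), then times `runs` further calls.
    public static long TimeMs(Func&lt;string, char, int&gt; find, string haystack,
                              char needle, int warmupRuns, int runs)
    {
        int sink = 0; // accumulate results so the calls are not dead code

        for (int i = 0; i &lt; warmupRuns; i++) sink += find(haystack, needle);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i &lt; runs; i++) sink += find(haystack, needle);
        sw.Stop();

        GC.KeepAlive(sink);
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Assumed test data: 64 filler characters followed by the needle.
        string haystack = new string('a', 64) + "x";
        const int warmup = 10000;
        const int runs = 500000;

        Console.WriteLine("{0} ms for ManualIndexOf",
            TimeMs(ManualIndexOf, haystack, 'x', warmup, runs));
        Console.WriteLine("{0} ms for string.IndexOf",
            TimeMs((s, c) =&gt; s.IndexOf(c), haystack, 'x', warmup, runs));
    }
}
</code></pre>

<p>If the ordering of the two results changes when you swap the order of the calls in <code>Main</code>, the warm-up is probably still too short for your machine.</p>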
