StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>Horspool, KMP et al are optimal at minimizing the number of byte-comparisons. </p> <p>However, that's not the bottleneck on a modern processor. On an x86/64 processor, your string is being loaded into <em>L1 cache</em> in cache-line-width chunks (typically 64 bytes). No matter how clever your algorithm is, unless it gives you strides that are larger than that, you gain nothing; and the more complicated Horspool code is (at least one table lookup) can't compete.</p> <p>Furthermore, you're stuck with the "C" string constraint of null-termination: SOMEWHERE the code has to examine every byte.</p> <p><code>strstr()</code> is expected to be optimal for a wide range of cases; e.g. searching for tiny strings like <code>"\r\n"</code> in a short string, as well as much longer ones where some smarter algorithm might have a hope. The basic strchr/memcmp loop is pretty hard to beat across the whole range of likely inputs.</p> <p>Pretty much all x86-compatible processors since 2003 have supported SSE2. If you disassembled <code>strlen()</code>/x86 for <em>glibc</em>, you may have noticed that it uses some SSE2 PCMPEQ and MOVMASK operations to search for the null terminator 16 bytes at a time. The solution is so efficient that it beats the obvious super-simple loop, for anything longer than the empty string. </p> <p>I took that idea and came up with a <code>strstr()</code> that beats glibc's <code>strstr()</code> for all cases greater than 1 byte --- where the relative difference is pretty much moot. If you're interested, check out:</p> <ul> <li><p><a href="http://mischasan.wordpress.com/2011/07/16/convergence-sse2-and-strstr/" rel="nofollow noreferrer">Convergence SSE2 and <code>strstr()</code></a></p></li> <li><p><a href="http://mischasan.wordpress.com/2011/07/10/a-better-strstr-with-no-asm-code/" rel="nofollow noreferrer">A better <code>strstr()</code> with no ASM code</a></p> <p>If you care to see a non-SSE2 solution that dominates <code>strstr()</code> for target strings of more than 15 bytes, check out:</p> <p>which makes use of multibyte comparisons rather than <code>strchr()</code>, to find a point at which to do a memcmp.</p></li> </ul> <p>BTW you've probably figured out by now that the x86 REP SCASB/REP CMPSB ops fall on their ass for anything longer than 32 bytes, and are not much improvement for shorter strings. Wish Intel had devoted a little more attention to that, than to adding SSE4.2 "string" ops.</p> <p>For strings large enough to matter, my perf tests show that BNDM is better than Horspool, across the board. BNDM is more tolerant of "pathological" cases, such as targets that heavily repeat the last byte of a pattern. BNDM can also make use of SSE2 (128bit registers) in a way that competes with 32bit registers in efficiency and start-up cost. Source code <a href="http://mischasan.wordpress.com/2012/02/04/sse2-and-bndm-string-search/" rel="nofollow noreferrer">here</a>.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload