Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>The insight behind Boyer-Moore is that if you start searching for a pattern in a string starting with the <em>last</em> character in the pattern, you can jump your search forward multiple characters when you hit a mismatch.</p> <p>Let's say our pattern <code>p</code> is the sequence of characters <code>p1</code>, <code>p2</code>, ..., <code>pn</code> and we are searching a string <code>s</code>, currently with <code>p</code> aligned so that <code>pn</code> is at index <code>i</code> in <code>s</code>.</p> <p>E.g.:</p> <pre><code>s = WHICH FINALLY HALTS. AT THAT POINT... p = AT THAT i = ^ </code></pre> <p>The B-M paper makes the following observations:</p> <p>(1) if we try matching a character that is not in <code>p</code> then we can jump forward <code>n</code> characters:</p> <p>'F' is not in <code>p</code>, hence we advance <code>n</code> characters:</p> <pre><code>s = WHICH FINALLY HALTS. AT THAT POINT... p = AT THAT i = ^ </code></pre> <p>(2) if we try matching a character whose last position is <code>k</code> from the end of <code>p</code> then we can jump forward <code>k</code> characters:</p> <p>' 's last position in <code>p</code> is 4 from the end, hence we advance 4 characters:</p> <pre><code>s = WHICH FINALLY HALTS. AT THAT POINT... p = AT THAT i = ^ </code></pre> <p>Now we scan backwards from <code>i</code> until we either succeed or we hit a mismatch. (3a) if the mismatch occurs <code>k</code> characters from the start of <code>p</code> and the mismatched character is not in <code>p</code>, then we can advance (at least) <code>k</code> characters.</p> <p>'L' is not in <code>p</code> and the mismatch occurred against <code>p6</code>, hence we can advance (at least) 6 characters:</p> <pre><code>s = WHICH FINALLY HALTS. AT THAT POINT... p = AT THAT i = ^ </code></pre> <p>However, we can actually do better than this. (3b) since we know that at the old <code>i</code> we'd already matched some characters (1 in this case). If the matched characters don't match the start of <code>p</code>, then we can actually jump forward a little more (this extra distance is called 'delta2' in the paper):</p> <pre><code>s = WHICH FINALLY HALTS. AT THAT POINT... p = AT THAT i = ^ </code></pre> <p>At this point, observation (2) applies again, giving</p> <pre><code>s = WHICH FINALLY HALTS. AT THAT POINT... p = AT THAT i = ^ </code></pre> <p>and bingo! We're done.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload