Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Well this is an old topic but hopefully someone who searches for this in the future will see it. Answer given above is good but I worked through an example myself to see what's going on exactly.</p> <p>First part of the exposition is taken from wiki, the part I really wanted to elaborate on is how this backtracking array is constructed.</p> <p>Here goes:</p> <p>we work through a (relatively artificial) run of the algorithm, where </p> <pre><code>W = "ABCDABD" and S = "ABC ABCDAB ABCDABCDABDE". </code></pre> <p>At any given time, the algorithm is in a state determined by two integers:</p> <p><code>m</code> which denotes the position within S which is the beginning of a prospective match for W</p> <p><code>i</code> the index in W denoting the character currently under consideration.</p> <p>In each step we compare <code>S[m+i]</code> with <code>W[i]</code> and advance if they are equal. This is depicted, at the start of the run, like</p> <pre><code> 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456 </code></pre> <p>We proceed by comparing successive characters of W to "parallel" characters of S, moving from one to the next if they match. However, in the fourth step, we get S[3] is a space and W[3] = 'D', a mismatch. Rather than beginning to search again at S[1], we note that no 'A' occurs between positions 0 and 3 in S except at 0; hence, having checked all those characters previously, we know there is no chance of finding the beginning of a match if we check them again. Therefore we move on to the next character, setting m = 4 and i = 0.</p> <pre><code> 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456 </code></pre> <p>We quickly obtain a nearly complete match "ABCDAB" when, at W[6] (S[10]), we again have a discrepancy. However, just prior to the end of the current partial match, we passed an "AB" which could be the beginning of a new match, so we must take this into consideration. As we already know that these characters match the two characters prior to the current position, we need not check them again; we simply reset m = 8, i = 2 and continue matching the current character. Thus, not only do we omit previously matched characters of S, but also previously matched characters of W.</p> <pre><code> 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456 </code></pre> <p>This search fails immediately, however, as the pattern still does not contain a space, so as in the first trial, we return to the beginning of W and begin searching at the next character of S: m = 11, reset i = 0.</p> <pre><code> 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456 </code></pre> <p>Once again we immediately hit upon a match "ABCDAB" but the next character, 'C', does not match the final character 'D' of the word W. Reasoning as before, we set m = 15, to start at the two-character string "AB" leading up to the current position, set i = 2, and continue matching from the current position.</p> <pre><code> 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456 </code></pre> <p>This time we are able to complete the match, whose first character is S[15].</p> <p>The above example contains all the elements of the algorithm. For the moment, we assume the existence of a "partial match" table T, described below, which indicates where we need to look for the start of a new match in the event that the current one ends in a mismatch. The entries of T are constructed so that if we have a match starting at S[m] that fails when comparing S[m + i] to W[i], then the next possible match will start at index m + i - T[i] in S (that is, T[i] is the amount of "backtracking" we need to do after a mismatch). This has two implications: first, T[0] = -1, which indicates that if W[0] is a mismatch, we cannot backtrack and must simply check the next character; and second, although the next possible match will begin at index m + i - T[i], as in the example above, we need not actually check any of the T[i] characters after that, so that we continue searching from W[T[i]]. </p> <p>BACKTRACKING ARRAY CONSTRUCTION:</p> <p>so this backtracking array T[] we will call lps[], let's see how we calculate this guy</p> <pre><code>lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i]. </code></pre> <p>Examples: For the pattern “AABAACAABAA”, </p> <pre><code>lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5] </code></pre> <p>//so just going through this real quick</p> <pre><code> lps[0] is just 0 by default lps[1] is 1 because it's looking at AA and A is both a prefix and suffix lps[2] is 0 because it's looking at AAB and suffix is B but there is no prefix equal to B unless you count B itself which I guess is against the rules lps[3] is 1 because it's looking at AABA and first A matches last A lps[4] is 2 becuase it's looking at AABAA and first 2 A matches last 2 A lps[5] is 0 becuase it's looking at AABAAC and nothing matches C ... For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0] For the pattern “AAAAA”, lps[] is [0, 1, 2, 3, 4] For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3] For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4] </code></pre> <p>And this totally makes sense if you think about it...if you mismatch, you want to go back as far as you can obviously, how far back you go (the suffix portion) is essentially the prefix since you must start matching from the first character again by definition. so if your string looks like</p> <p><code>aaaaaaaaaaaaaaa..b..aaaaaaaaaaaaaaac</code> and you mismatche on the last char c, then you want to reuse <code>aaaaaaaaaaaaaaa</code> as your new head, just think it through</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload