Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>TSP isn't a great way to think about this problem. Let n be the length of the text and m be the length of the query; assume n > m. The naive solution</p> <pre><code>best = infinity for i = 1 to n for j = i to n all_found = true for k = 1 to m found = false for l = i to j if text[l] == query[k] found = true all_found = all_found || found if all_found &amp;&amp; j - i &lt; best best = j - i best_i = i best_j = j </code></pre> <p>is already polynomial-time at O(n<sup>3</sup> m) for bounded-length words. Now let's optimize.</p> <p>First, hoist the inner loop via a hash set.</p> <pre><code>best = infinity for i = 1 to n for j = i to n subtext_set = {} for l = i to j subtext_set = subtext_set union {text[l]} all_found = true for k = 1 to m all_found = all_found &amp;&amp; query[k] in subtext_set if all_found &amp;&amp; j - i &lt; best best = j - i best_i = i best_j = j </code></pre> <p>The running time is now O(n<sup>3</sup>), or O(n<sup>3</sup> log n) if we use a binary tree instead.</p> <p>Observe now that it's wasteful to recompute <code>subtext_set</code> when the upper bound increases by one.</p> <pre><code>best = infinity for i = 1 to n subtext_set = {} for j = i to n subtext_set = subtext_set union {text[l]} all_found = true for k = 1 to m all_found = all_found &amp;&amp; query[k] in subtext_set if all_found &amp;&amp; j - i &lt; best best = j - i best_i = i best_j = j </code></pre> <p>We're at O(n<sup>2</sup> m). Now it seems wasteful to recheck the entire query when <code>subtext_set</code> is augmented by just one element: why don't we just check that one, and remember how many we have to go?</p> <pre><code>query_set = {} for k = 1 to m query_set = query_set union {query[k]} best = infinity for i = 1 to n subtext_set = {} num_found = 0 for j = i to n if text[l] in query_set &amp;&amp; text[l] not in subtext_set subtext_set = subtext_set union {text[l]} num_found += 1 if num_found == m &amp;&amp; j - i &lt; best best = j - i best_i = i best_j = j </code></pre> <p>We're at O(n<sup>2</sup>). Getting to O(n) requires a couple of insights. First, let's look at how many query words each substring contains for the example</p> <pre><code>text = Bar has a computer at home. Bar 1 2 3 4 5 6 7 query = Bar computer a # j 1 2 3 4 5 6 7 i +-------------- 1 | 1 1 2 3 3 3 3 2 | 0 0 1 2 2 2 3 3 | 0 0 1 2 2 2 3 4 | 0 0 0 1 1 1 2 5 | 0 0 0 0 0 0 1 6 | 0 0 0 0 0 0 1 7 | 0 0 0 0 0 0 1 </code></pre> <p>This matrix has non-increasing columns and non-decreasing rows, and that's true in general. We want to traverse the underside of the entries with value m, because further in corresponds to a longer solution. The algorithm is the following. If the current i, j have all of the query words, then increase i; otherwise, increase j.</p> <p>With our current data structures, increasing j is fine but increasing i is not, because our data structures don't support deletion. Instead of a set, we need to keep a multi-set and decrement <code>num_found</code> when the last copy of a query word disappears.</p> <pre><code>best = infinity count = hash table whose entries are zero by default for k = 1 to m count[query[k]] = -1 num_found = 0 i = 1 j = 0 while true if num_found == m if j - i &lt; best best = j - i best_i = i best_j = j count[text[i]] -= 1 if count[text[i]] == -1 num_found -= 1 i += 1 else j += 1 if j &gt; n break if count[text[j]] == -1 num_found += 1 count[text[j]] += 1 </code></pre> <p>We've arrived at O(n). The last asymptotically relevant optimization is to reduce the extra space usage from O(n) to O(m) by storing counts only for elements in the query. I'll leave that one as an exercise. (Also, some more care must be taken to handle empty queries.)</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload