Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>Let's assume your words are bounded at a reasonable size (let's say 10 letters). Do the following to achieve linear(!) time complexity, that is, <code>O(A+B)</code>:</p> <ul> <li>Initialize a hashtable or trie</li> <li>For each string b in B: <ul> <li>For every substring of that string <ul> <li>Add the substring to the hashtable/trie (this is no worse than <code>55*O(B)</code>=<code>O(B)</code>), with metadata of which string it belonged to</li> </ul></li> </ul></li> <li>For each string a in A: <ul> <li>Do an <code>O(1)</code> query to your hashtable/trie to find all B-strings it is in, yield those</li> </ul></li> </ul> <p>(As of writing this answer, no response yet if OP's "words" are bounded. If they are unbounded, this solution still applies, but there is a dependency of <code>O(maxwordsize^2)</code>, though actually it's nicer in practice since not all words are the same size, so it might be as nice as <code>O(averagewordsize^2)</code> with the right distribution. For example if all the words were of size 20, the problem size would grow by a factor of 4 more than if they were of size 10. But if sufficiently few words were increased from size 10->20, then the complexity wouldn't change much.)</p> <p><strong>edit:</strong> <a href="https://stackoverflow.com/q/8289199/711085">https://stackoverflow.com/q/8289199/711085</a> is actually a theoretically better answer. I was looking at the linked Wikipedia page before that answer was posted, and was thinking "linear in the string size is not what you want", and only later realized it's exactly what you want. Your intuition to build a regexp <code>(Aword1|Aword2|Aword3|...)</code> is correct since the finite-automaton which is generated behind the scenes will perform matching quickly IF it supports simultaneous overlapping matches, which not all regexp engines might. Ultimately what you should use depends on if you plan to reuse the As or Bs, or if this is just a one-time thing. The above technique is much easier to implement but only works if your words are bounded (and introduces a DoS vulnerability if you don't reject words above a certain size limit), but may be what you are looking for if you don't want the <a href="http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm" rel="nofollow noreferrer">Aho-Corasick string matching finite automaton</a> or similar, or it is unavailable as a library.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload