<p>In your current approach, you're looking up every permutation of each substring. So for <code>"abc"</code>, you need to look up <code>"abc"</code>, <code>"acb"</code>, <code>"bac"</code>, <code>"bca"</code>, <code>"cab"</code> and <code>"cba"</code>. If you wanted to find all permutations of <code>"permutations"</code>, you would need nearly <strong>500,000,000</strong> lookups (12! = 479,001,600), and that's before you've even looked at its substrings. But we can reduce this to <strong><em>one</em></strong> lookup, regardless of length, by preprocessing the dictionary.</p>
<p>The idea is to put each word in the dictionary into some data structure where each element contains a set of characters, and a list of all words made up of exactly those characters. So for example, you could build a binary tree, which would have a node containing the (sorted) character set <code>"abd"</code> and the word list <code>["bad", "dab"]</code>. Now, if we want to find all permutations of <code>"dba"</code>, we sort its characters to give <code>"abd"</code> and look that up in the tree to retrieve the list.</p>
<p>As Baumann pointed out, <a href="http://en.wikipedia.org/wiki/Trie" rel="nofollow noreferrer">tries</a> are well suited to storing this kind of data. The beauty of the trie is that the lookup time <strong>depends only on the length of your search string</strong> - it is <strong>independent of the size of your dictionary</strong>. Since you'll be storing quite a lot of words, and most of your search strings will be tiny (the majority will be the 3-character substrings from the lowest level of your recursion), this structure is ideal.</p>
<p>In this case, the paths down your trie would reflect the character sets rather than the words themselves. So if your entire dictionary was <code>["bad", "dab", "cab", "cable"]</code>, your lookup structure would end up looking like this:</p>
<p><img src="https://i.stack.imgur.com/o9M85.png" alt="Example trie"></p>
<p>There's a bit of a time/space trade-off in the way you implement this. In the simplest (and fastest) approach, each <code>Node</code> contains just the list of words, and an array <code>Node[26]</code> of children. This allows you to locate the child you're after in constant time, just by looking at <code>children[s.charAt(i)-'a']</code> (where <code>s</code> is your sorted search string and <code>i</code> is your current depth in the trie).</p>
<p>The downside is that most of your <code>children</code> arrays will be mostly empty. If space is an issue, you can use a more compact representation like a linked list, dynamic array, hash table, etc. However, these come at the cost of potentially requiring several memory accesses and comparisons at each node, instead of the simple array access above. But I'd be surprised if the wasted space was more than a few megabytes over your whole dictionary, so the array-based approach is likely your best bet.</p>
<p>With the trie in place, your whole permutation function is replaced with one lookup, bringing the complexity down from <strong>O(N! log D)</strong> (where <strong>D</strong> is the size of your dictionary and <strong>N</strong> the length of your string) to <strong>O(N log N)</strong> (since you need to sort the characters; the lookup itself is <strong>O(N)</strong>).</p>
<p><strong>EDIT:</strong> I've thrown together an (untested) implementation of this structure: <a href="http://pastebin.com/Qfu93E80" rel="nofollow noreferrer">http://pastebin.com/Qfu93E80</a></p>
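<p>For illustration, here is a minimal sketch of the array-based trie described above. It is not the code from the linked paste: the class name <code>AnagramTrie</code> and the methods <code>insert</code> and <code>lookup</code> are hypothetical, and it assumes every word and search string contains only the lowercase letters <code>a</code> to <code>z</code>.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Trie whose paths spell the sorted letters of each word, so all anagrams
 * of a word end at the same node. Assumes lowercase a-z input only; any
 * other character would index outside the children array.
 */
class AnagramTrie {
    private static final class Node {
        final Node[] children = new Node[26];          // one slot per letter
        final List<String> words = new ArrayList<>();  // words ending here
    }

    private final Node root = new Node();

    /** Returns the characters of s in sorted order, e.g. "dba" becomes a,b,d. */
    private static char[] sortedLetters(String s) {
        char[] key = s.toCharArray();
        Arrays.sort(key);
        return key;
    }

    /** Adds a dictionary word under the path given by its sorted letters. */
    void insert(String word) {
        Node node = root;
        for (char c : sortedLetters(word)) {
            int i = c - 'a';
            if (node.children[i] == null) {
                node.children[i] = new Node();
            }
            node = node.children[i];
        }
        node.words.add(word);
    }

    /** Returns every stored word that is a permutation of s (possibly none). */
    List<String> lookup(String s) {
        Node node = root;
        for (char c : sortedLetters(s)) {
            node = node.children[c - 'a'];
            if (node == null) {
                return Collections.emptyList();
            }
        }
        return Collections.unmodifiableList(node.words);
    }
}
```

<p>After inserting <code>"bad"</code>, <code>"dab"</code>, <code>"cab"</code> and <code>"cable"</code>, calling <code>lookup("dba")</code> returns <code>["bad", "dab"]</code>, while <code>lookup("ab")</code> returns an empty list because no word ends at that node.</p>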
 
