Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>The way I've done this in the past is to store the "database" (actually a dictionary of words for a spelling correcter) as a trie.</p> <p>Then I used a branch-and-bound routine to look up nearest matching entries. For small distances, the time it takes is exponential in the distance. For large distances, it is linear in the size of the dictionary, just as you are seeing now.</p> <p>Branch-and-bound is basically a depth-first tree walk of the trie, but with an error budget. At each node, you keep track of the current levenshtein distance, and if it exceeds the budget, you prune that branch of the tree.</p> <p>First you do the walk with a budget of zero. That will only find exact matches. If you don't find a match, then you walk it with a budget of one. That will find matches at a distance of 1. If you don't find any, then you do it with a budget of 2, and so on. This sounds inefficient, but since each walk takes so much more time than the previous one, the time is dominated by the last walk that you make.</p> <p>Added: outline of code (pardon my C):</p> <pre><code>// dumb version of trie node, indexed by letter. You can improve. typedef struct tnodeTag { tnodeTag* p[128]; } tnode; tnode* top; // the top of the trie void walk(tnode* p, char* s, int budget){ int i; if (*s == 0){ if (p == NULL){ // print the current trie path } } else if (budget &gt;= 0){ // try deleting this letter walk(p, s+1, budget-1); // try swapping two adjacent letters if (s[1]){ swap(s[0], s[1]); walk(p, s, budget-1); swap(s[0], s[1]); } if (p){ for (i = 0; i &lt; 128; i++){ // try exact match if (i == *s) walk(p-&gt;p[i], s+1, budget); // try replacing this character if (i != *s) walk(p-&gt;p[i], s+1, budget-1); // try inserting this letter walk(p-&gt;p[i], s, budget-1); } } } } </code></pre> <p>Basically, you simulate deleting a letter by skipping it and searching at the same node. You simulate inserting a letter by descending the trie without advancing s. You simulate replacing a letter by acting as if the letter matched, even though it doesn't. When you get the hang of it, you can add other possible mismatches, like replacing 0 with O and 1 with L or I - dumb stuff like that.</p> <p>You probably want to add a character array argument to represent the current word you are finding in the trie.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload