Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>I just addressed this exact same issue a few weeks ago. Since someone is asking now, I'll share the code. In my exhaustive tests my code is about 10x faster than the C# example on Wikipedia even when no maximum distance is supplied. When a maximum distance is supplied, this performance gain increases to 30x - 100x +. Note a couple key points for performance:</p> <ul> <li>If you need to compare the same words over and over, first convert the words to arrays of integers. The Damerau-Levenshtein algorithm includes many >, &lt;, == comparisons, and <code>ints</code> compare much faster than <code>chars</code>. </li> <li>It includes a short-circuiting mechanism to quit if the distance exceeds a provided maximum</li> <li>Use a rotating set of three arrays rather than a massive matrix as in all the implementations I've see elsewhere</li> <li>Make sure your arrays slice accross the shorter word width.</li> </ul> <p>Code (it works the exact same if you replace <code>int[]</code> with <code>String</code> in the parameter declarations:</p> <pre><code>/// &lt;summary&gt; /// Computes the Damerau-Levenshtein Distance between two strings, represented as arrays of /// integers, where each integer represents the code point of a character in the source string. /// Includes an optional threshhold which can be used to indicate the maximum allowable distance. /// &lt;/summary&gt; /// &lt;param name="source"&gt;An array of the code points of the first string&lt;/param&gt; /// &lt;param name="target"&gt;An array of the code points of the second string&lt;/param&gt; /// &lt;param name="threshold"&gt;Maximum allowable distance&lt;/param&gt; /// &lt;returns&gt;Int.MaxValue if threshhold exceeded; otherwise the Damerau-Leveshteim distance between the strings&lt;/returns&gt; public static int DamerauLevenshteinDistance(int[] source, int[] target, int threshold) { int length1 = source.Length; int length2 = target.Length; // Return trivial case - difference in string lengths exceeds threshhold if (Math.Abs(length1 - length2) &gt; threshold) { return int.MaxValue; } // Ensure arrays [i] / length1 use shorter length if (length1 &gt; length2) { Swap(ref target, ref source); Swap(ref length1, ref length2); } int maxi = length1; int maxj = length2; int[] dCurrent = new int[maxi + 1]; int[] dMinus1 = new int[maxi + 1]; int[] dMinus2 = new int[maxi + 1]; int[] dSwap; for (int i = 0; i &lt;= maxi; i++) { dCurrent[i] = i; } int jm1 = 0, im1 = 0, im2 = -1; for (int j = 1; j &lt;= maxj; j++) { // Rotate dSwap = dMinus2; dMinus2 = dMinus1; dMinus1 = dCurrent; dCurrent = dSwap; // Initialize int minDistance = int.MaxValue; dCurrent[0] = j; im1 = 0; im2 = -1; for (int i = 1; i &lt;= maxi; i++) { int cost = source[im1] == target[jm1] ? 0 : 1; int del = dCurrent[im1] + 1; int ins = dMinus1[i] + 1; int sub = dMinus1[im1] + cost; //Fastest execution for min value of 3 integers int min = (del &gt; ins) ? (ins &gt; sub ? sub : ins) : (del &gt; sub ? sub : del); if (i &gt; 1 &amp;&amp; j &gt; 1 &amp;&amp; source[im2] == target[jm1] &amp;&amp; source[im1] == target[j - 2]) min = Math.Min(min, dMinus2[im2] + cost); dCurrent[i] = min; if (min &lt; minDistance) { minDistance = min; } im1++; im2++; } jm1++; if (minDistance &gt; threshold) { return int.MaxValue; } } int result = dCurrent[maxi]; return (result &gt; threshold) ? int.MaxValue : result; } </code></pre> <p>Where <code>Swap</code> is:</p> <pre><code>static void Swap&lt;T&gt;(ref T arg1,ref T arg2) { T temp = arg1; arg1 = arg2; arg2 = temp; } </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload