Note that there are some explanatory texts on larger screens.

plurals
  1. POHaskell tail-recursion performance question for Levenshtein distances
    primarykey
    data
    text
    <p>I'm playing around with calculating <a href="http://en.wikipedia.org/wiki/Levenshtein_distance" rel="noreferrer">Levenshtein distances</a> in Haskell, and am a little frustrated with the following performance problem. If you implement it most 'normal' way for Haskell, like below (dist), everything works just fine:</p> <pre><code>dist :: (Ord a) =&gt; [a] -&gt; [a] -&gt; Int dist s1 s2 = ldist s1 s2 (L.length s1, L.length s2) ldist :: (Ord a) =&gt; [a] -&gt; [a] -&gt; (Int, Int) -&gt; Int ldist _ _ (0, 0) = 0 ldist _ _ (i, 0) = i ldist _ _ (0, j) = j ldist s1 s2 (i+1, j+1) = output where output | (s1!!(i)) == (s2!!(j)) = ldist s1 s2 (i, j) | otherwise = 1 + L.minimum [ldist s1 s2 (i, j) , ldist s1 s2 (i+1, j) , ldist s1 s2 (i, j+1)] </code></pre> <p>But, if you bend your brain a little and implement it as dist', it executes <strong>MUCH</strong> faster (about 10x).</p> <pre><code>dist' :: (Ord a) =&gt; [a] -&gt; [a] -&gt; Int dist' o1 o2 = (levenDist o1 o2 [[]])!!0!!0 levenDist :: (Ord a) =&gt; [a] -&gt; [a] -&gt; [[Int]] -&gt; [[Int]] levenDist s1 s2 arr@([[]]) = levenDist s1 s2 [[0]] levenDist s1 s2 arr@([]:xs) = levenDist s1 s2 ([(L.length arr) -1]:xs) levenDist s1 s2 arr@(x:xs) = let n1 = L.length s1 n2 = L.length s2 n_i = L.length arr n_j = L.length x match | (s2!!(n_j-1) == s1!!(n_i-2)) = True | otherwise = False minCost = if match then (xs!!0)!!(n2 - n_j + 1) else L.minimum [(1 + (xs!!0)!!(n2 - n_j + 1)) , (1 + (xs!!0)!!(n2 - n_j + 0)) , (1 + (x!!0)) ] dist | (n_i &gt; n1) &amp;&amp; (n_j &gt; n2) = arr | n_j &gt; n2 = []:arr `seq` levenDist s1 s2 $ []:arr | n_i == 1 = (n_j:x):xs `seq` levenDist s1 s2 $ (n_j:x):xs | otherwise = (minCost:x):xs `seq` levenDist s1 s2 $ (minCost:x):xs in dist </code></pre> <p>I've tried all the usual <code>seq</code> tricks in the first version, but nothing seems to speed it up. This is a little unsatisfying for me, because I expected the first version to be <em>faster</em> because it doesn't need to evaluate the entire matrix, only the parts it needs. </p> <p>Does anyone know if it is possible to get these two implementations to perform similarly, or am I just reaping the benefits of tail-recursion optimizations in the latter, and therefore need to live with its unreadability if I want performance?</p> <p>Thanks, Orion </p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload