StackOverflow2013


plurals

POHaskell tail-recursion performance question for Levenshtein distances
primarykey
Id
3831625
data
AcceptedAnswerId
3831960
AnswerCount
5
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2010-09-30T14:35:05.717
FavoriteCount
1
LastActivityDate
2015-10-27T01:54:58.727
LastEditDate
LastEditorUserId
0
OwnerUserId
462914
ParentId
0
PostTypeId
1
Score
6
ViewCount
605
LastEditorDisplayName
text
Body
I'm playing around with calculating <a href="http://en.wikipedia.org/wiki/Levenshtein_distance" rel="noreferrer">Levenshtein distances</a> in Haskell, and am a little frustrated with the following performance problem. If you implement it in the most 'normal' way for Haskell, like below (dist), everything works just fine: <pre><code>dist :: (Ord a) => [a] -> [a] -> Int
dist s1 s2 = ldist s1 s2 (L.length s1, L.length s2)

ldist :: (Ord a) => [a] -> [a] -> (Int, Int) -> Int
ldist _ _ (0, 0) = 0
ldist _ _ (i, 0) = i
ldist _ _ (0, j) = j
ldist s1 s2 (i+1, j+1) = output
  where output | (s1!!(i)) == (s2!!(j)) = ldist s1 s2 (i, j)
               | otherwise = 1 + L.minimum [ldist s1 s2 (i, j)
                                          , ldist s1 s2 (i+1, j)
                                          , ldist s1 s2 (i, j+1)]
</code></pre> But if you bend your brain a little and implement it as dist', it executes MUCH faster (about 10x). <pre><code>dist' :: (Ord a) => [a] -> [a] -> Int
dist' o1 o2 = (levenDist o1 o2 [[]])!!0!!0

levenDist :: (Ord a) => [a] -> [a] -> [[Int]] -> [[Int]]
levenDist s1 s2 arr@([[]]) = levenDist s1 s2 [[0]]
levenDist s1 s2 arr@([]:xs) = levenDist s1 s2 ([(L.length arr) -1]:xs)
levenDist s1 s2 arr@(x:xs) = let
    n1 = L.length s1
    n2 = L.length s2
    n_i = L.length arr
    n_j = L.length x
    match | (s2!!(n_j-1) == s1!!(n_i-2)) = True
          | otherwise = False
    minCost = if match then (xs!!0)!!(n2 - n_j + 1)
              else L.minimum [(1 + (xs!!0)!!(n2 - n_j + 1))
                            , (1 + (xs!!0)!!(n2 - n_j + 0))
                            , (1 + (x!!0))
                            ]
    dist | (n_i > n1) && (n_j > n2) = arr
         | n_j > n2 = []:arr `seq` levenDist s1 s2 $ []:arr
         | n_i == 1 = (n_j:x):xs `seq` levenDist s1 s2 $ (n_j:x):xs
         | otherwise = (minCost:x):xs `seq` levenDist s1 s2 $ (minCost:x):xs
  in dist
</code></pre> I've tried all the usual <code>seq</code> tricks in the first version, but nothing seems to speed it up. This is a little unsatisfying for me: I expected the first version to be faster, since it doesn't need to evaluate the entire matrix, only the parts it needs. 
Does anyone know if it is possible to get these two implementations to perform similarly, or am I just reaping the benefits of tail-recursion optimizations in the latter, and therefore need to live with its unreadability if I want performance? Thanks, Orion 
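For comparison, here is a minimal sketch of a third variant that keeps the recursive shape of <code>dist</code> but stores every subproblem in a lazily evaluated array. It rests on the assumption that much of the cost in the first version comes from recomputing the same <code>(i, j)</code> subproblems and from repeated <code>!!</code> lookups; that has not been measured here. The name <code>distMemo</code> is hypothetical, and <code>Data.Array</code> comes from the <code>array</code> package that normally ships with GHC.

```haskell
import Data.Array (Array, listArray, range, (!))

-- Hypothetical name, not from the original post.
-- Same recurrence as `dist`, but each (i, j) cell is computed at most once.
distMemo :: Eq a => [a] -> [a] -> Int
distMemo s1 s2 = table ! (n1, n2)
  where
    n1 = length s1
    n2 = length s2
    -- Copy the inputs into arrays so element lookup is O(1) instead of (!!).
    a1 = listArray (1, n1) s1
    a2 = listArray (1, n2) s2
    bnds = ((0, 0), (n1, n2))
    -- Lazy array: every cell starts as a thunk and is forced on demand,
    -- so cells that are never referenced are never evaluated.
    table :: Array (Int, Int) Int
    table = listArray bnds [cell i j | (i, j) <- range bnds]
    cell i 0 = i
    cell 0 j = j
    cell i j
      | a1 ! i == a2 ! j = table ! (i - 1, j - 1)
      | otherwise = 1 + minimum [ table ! (i - 1, j - 1)
                                , table ! (i - 1, j)
                                , table ! (i, j - 1) ]
```

In GHCi, <code>distMemo "kitten" "sitting"</code> evaluates to 3. Because the array is lazy, only the cells reachable from <code>(n1, n2)</code> are forced, which is the "only the parts it needs" behaviour hoped for in the first version.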
Tags
<haskell><recursion><tail><levenshtein-distance><sequencing>
Title
Haskell tail-recursion performance question for Levenshtein distances
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USjdo
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POHaskell tail-recursion performance question for Levenshtein distances
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POHaskell tail-recursion performance question for Levenshtein distances
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POHaskell tail-recursion performance question for Levenshtein distances
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COMinor style point: don't use `!!` where you can avoid it. In particular, every `someList !! 0` can be replaced with `head someList`.
 singulars
 PostPostId
 POHaskell tail-recursion performance question for Levenshtein distances
 UserUserId
 USAntal Spector-Zabusky
2. COThanks. Quick followup: is `!!` O(n) where n is the position you're accessing, not the length of the entire list? So `someList !! 0` should be the same as `head someList`, but `someList !! bigNumber` is O(bigNumber)?
 singulars
 PostPostId
 POHaskell tail-recursion performance question for Levenshtein distances
 UserUserId
 USjdo
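
The cost question in the second comment can be illustrated with a hand-written indexing function. This is a sketch only: <code>index</code> is a hypothetical re-implementation for illustration, not the actual definition of <code>!!</code> in GHC's base library, though it follows the same idea of walking the list one cell at a time.

```haskell
-- Hypothetical stand-in for (!!), written out to make the cost visible.
-- Each recursive step drops one cell, so reaching position k takes k steps
-- no matter how long the rest of the list is.
index :: [a] -> Int -> a
index _ k | k < 0 = error "index: negative index"
index (x:_) 0 = x
index (_:xs) k = index xs (k - 1)
index [] _ = error "index: index too large"
```

Under this definition, <code>index xs 0</code> does the same work as <code>head xs</code>, and the tail beyond position k is never inspected, so the cost depends on k rather than on the length of the list.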
