Performance difference between two implementations of the same algorithm
I'm working on an application that will require the Levenshtein algorithm to calculate the similarity of two strings.

A long time ago I adapted a C# version (which can easily be found floating around on the internet) to VB.NET, and it looks like this:

```vbnet
Public Function Levenshtein1(s1 As String, s2 As String) As Double
    Dim n As Integer = s1.Length
    Dim m As Integer = s2.Length
    Dim d(n, m) As Integer
    Dim cost As Integer
    Dim s1c As Char

    For i = 1 To n
        d(i, 0) = i
    Next
    For j = 1 To m
        d(0, j) = j
    Next

    For i = 1 To n
        s1c = s1(i - 1)
        For j = 1 To m
            If s1c = s2(j - 1) Then
                cost = 0
            Else
                cost = 1
            End If
            d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
        Next
    Next

    Return (1.0 - (d(n, m) / Math.Max(n, m))) * 100
End Function
```

Then, trying to tweak it and improve its performance, I ended up with this version:

```vbnet
Public Function Levenshtein2(s1 As String, s2 As String) As Double
    Dim n As Integer = s1.Length
    Dim m As Integer = s2.Length
    Dim d(n, m) As Integer
    Dim s1c As Char
    Dim cost As Integer

    For i = 1 To n
        d(i, 0) = i
        s1c = s1(i - 1)
        For j = 1 To m
            d(0, j) = j
            If s1c = s2(j - 1) Then
                cost = 0
            Else
                cost = 1
            End If
            d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
        Next
    Next

    Return (1.0 - (d(n, m) / Math.Max(n, m))) * 100
End Function
```

Basically, I thought the distance array d(,) could be initialized inside the main loops instead of requiring two separate initialization loops beforehand. I really thought this would be a big improvement... unfortunately, not only does it not improve on the original, it actually runs slower!

I have already tried to analyze both versions by looking at the generated IL, but I just can't make sense of it.

So I was hoping someone could shed some light on this issue and explain why the second version (even though it has fewer loops) runs slower than the original.

NOTE: The time difference is about 0.15 nanoseconds. That doesn't sound like much, but when you have to check billions of strings, the difference becomes quite noticeable.
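For anyone who wants to reproduce the comparison, a minimal timing harness along the following lines should suffice. This is a sketch, not part of the original post: it assumes both functions above are in scope (e.g. in the same module), uses System.Diagnostics.Stopwatch, and the test strings and iteration count are arbitrary placeholders.

```vbnet
Imports System.Diagnostics

Module LevenshteinBenchmark
    Sub Main()
        ' Arbitrary placeholder inputs; any pair of similar strings works.
        Dim s1 As String = "levenshtein"
        Dim s2 As String = "levenstein"
        Const iterations As Integer = 10000000

        ' Warm-up calls so JIT compilation doesn't skew the first timing.
        Levenshtein1(s1, s2)
        Levenshtein2(s1, s2)

        ' Time each version over many iterations and report ns per call.
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i = 1 To iterations
            Levenshtein1(s1, s2)
        Next
        sw.Stop()
        Dim ns1 As Double = sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations

        sw.Restart()
        For i = 1 To iterations
            Levenshtein2(s1, s2)
        Next
        sw.Stop()
        Dim ns2 As Double = sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations

        Console.WriteLine("Levenshtein1: " & ns1 & " ns/call")
        Console.WriteLine("Levenshtein2: " & ns2 & " ns/call")
    End Sub
End Module
```

A per-call difference of a fraction of a nanosecond sits near the noise floor of a single run, so averaging over millions of iterations, as above, is what makes it visible at all.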