Performance difference between two implementations of the same algorithm
I'm working on an application that will require the Levenshtein algorithm to calculate the similarity of two strings.

A long time ago I adapted a C# version (which can easily be found floating around on the internet) to VB.NET, and it looks like this:

```vbnet
Public Function Levenshtein1(s1 As String, s2 As String) As Double
    Dim n As Integer = s1.Length
    Dim m As Integer = s2.Length
    Dim d(n, m) As Integer
    Dim cost As Integer
    Dim s1c As Char

    For i = 1 To n
        d(i, 0) = i
    Next
    For j = 1 To m
        d(0, j) = j
    Next

    For i = 1 To n
        s1c = s1(i - 1)
        For j = 1 To m
            If s1c = s2(j - 1) Then
                cost = 0
            Else
                cost = 1
            End If
            d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
        Next
    Next

    Return (1.0 - (d(n, m) / Math.Max(n, m))) * 100
End Function
```

Then, trying to tweak it and improve its performance, I ended up with this version:

```vbnet
Public Function Levenshtein2(s1 As String, s2 As String) As Double
    Dim n As Integer = s1.Length
    Dim m As Integer = s2.Length
    Dim d(n, m) As Integer
    Dim s1c As Char
    Dim cost As Integer

    For i = 1 To n
        d(i, 0) = i
        s1c = s1(i - 1)
        For j = 1 To m
            d(0, j) = j
            If s1c = s2(j - 1) Then
                cost = 0
            Else
                cost = 1
            End If
            d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
        Next
    Next

    Return (1.0 - (d(n, m) / Math.Max(n, m))) * 100
End Function
```

Basically, I thought the distance array d(,) could be initialized inside the main loops instead of requiring two separate initialization loops beforehand. I really thought this would be a big improvement... unfortunately, not only does it not improve on the original, it actually runs slower!

I have already tried to analyze both versions by looking at the generated IL, but I just can't make sense of it.

So I was hoping someone could shed some light on this issue and explain why the second version (even though it has fewer loops) runs slower than the original.

NOTE: The time difference is about 0.15 nanoseconds. That doesn't sound like much, but when you have to check billions of strings, the difference becomes quite noticeable.
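For anyone who wants to reproduce the comparison, a minimal timing harness along the following lines should suffice. This is a sketch, not part of the original post: it assumes both functions above are in scope (e.g. in the same module), uses System.Diagnostics.Stopwatch, and the test strings and iteration count are arbitrary placeholders.

```vbnet
Imports System.Diagnostics

Module LevenshteinBenchmark
    Sub Main()
        ' Arbitrary placeholder inputs; any pair of similar strings works.
        Dim s1 As String = "levenshtein"
        Dim s2 As String = "levenstein"
        Const iterations As Integer = 10000000

        ' Warm-up calls so JIT compilation doesn't skew the first timing.
        Levenshtein1(s1, s2)
        Levenshtein2(s1, s2)

        ' Time each version over many iterations and report ns per call.
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i = 1 To iterations
            Levenshtein1(s1, s2)
        Next
        sw.Stop()
        Dim ns1 As Double = sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations

        sw.Restart()
        For i = 1 To iterations
            Levenshtein2(s1, s2)
        Next
        sw.Stop()
        Dim ns2 As Double = sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations

        Console.WriteLine("Levenshtein1: " & ns1 & " ns/call")
        Console.WriteLine("Levenshtein2: " & ns2 & " ns/call")
    End Sub
End Module
```

A per-call difference of a fraction of a nanosecond sits near the noise floor of a single run, so averaging over millions of iterations, as above, is what makes it visible at all.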