<p>This is an "off the top of the head" answer to the question.</p>
<p>Basically, this calculates how far sentence 2 is from sentence 1 as a Euclidean distance, with sentence 1 assumed to be at the origin. Each distinct word of sentence 1 is one axis, and sentence 2's coordinate on that axis is the minimum Levenshtein distance between that word and any word of sentence 2. The result is the square root of the sum of the squared coordinates, so 2 equal sentences give a distance of 0.</p>
<p>If this approach has been published elsewhere, I'm unaware of it.</p>
<pre><code>using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string str1 = "The cat sat on the mat";
            string str2 = "The quick brown fox jumped over the lazy cow";

            ReportDifference(str1, str1);
            ReportDifference(str2, str2);
            ReportDifference(str1, str2);
            ReportDifference(str2, str1);
        }

        /// &lt;summary&gt;
        /// Quick test and display routine
        /// &lt;/summary&gt;
        /// &lt;param name="str1"&gt;First sentence to test with&lt;/param&gt;
        /// &lt;param name="str2"&gt;Second sentence to test with&lt;/param&gt;
        static void ReportDifference(string str1, string str2)
        {
            Debug.WriteLine(
                String.Format("difference between \"{0}\" and \"{1}\" is {2}",
                    str1, str2, Difference(str1, str2)));
        }

        /// &lt;summary&gt;
        /// This does the hard work.
        /// Basically, what it does is:
        /// 1) Split the strings into tokens/words
        /// 2) Form a cartesian product of the 2 lists of words.
        /// 3) Calculate the Levenshtein Distance between each pair of words.
        /// 4) Group on the words from the first sentence
        /// 5) Get the min distance between the word in first sentence and all of the words from the second
        /// 6) Square the distances for each word.
        ///    (the distance between 2 points is the sqrt of the sum of the squared x, y, ... axis distances;
        ///    this assumes the first sentence is the origin)
        /// 7) Take the square root of the sum
        /// &lt;/summary&gt;
        /// &lt;param name="str1"&gt;sentence 1 to compare&lt;/param&gt;
        /// &lt;param name="str2"&gt;sentence 2 to compare&lt;/param&gt;
        /// &lt;returns&gt;distance calculated&lt;/returns&gt;
        static double Difference(string str1, string str2)
        {
            string[] splitters = { " " };

            var a = Math.Sqrt(
                (from x in str1.Split(splitters, StringSplitOptions.RemoveEmptyEntries)
                 from y in str2.Split(splitters, StringSplitOptions.RemoveEmptyEntries)
                 select new { x, y, ld = Distance.LD(x, y) })
                .GroupBy(x =&gt; x.x)
                .Select(q =&gt; new { q.Key, min_match = q.Min(p =&gt; p.ld) })
                .Sum(s =&gt; (double)(s.min_match * s.min_match)));

            return a;
        }
    }

    /// &lt;summary&gt;
    /// Lifted from http://www.merriampark.com/ldcsharp.htm
    /// &lt;/summary&gt;
    public class Distance
    {
        /// &lt;summary&gt;
        /// Compute Levenshtein distance
        /// &lt;/summary&gt;
        /// &lt;param name="s"&gt;String 1&lt;/param&gt;
        /// &lt;param name="t"&gt;String 2&lt;/param&gt;
        /// &lt;returns&gt;Distance between the two strings.
        /// The larger the number, the bigger the difference.
        /// &lt;/returns&gt;
        public static int LD(string s, string t)
        {
            int n = s.Length; // length of s
            int m = t.Length; // length of t
            int[,] d = new int[n + 1, m + 1]; // matrix
            int cost; // cost

            // Step 1: an empty string is as far away as the other string is long
            if (n == 0) return m;
            if (m == 0) return n;

            // Step 2: initialise the first column and the first row
            for (int i = 0; i &lt;= n; i++) d[i, 0] = i;
            for (int j = 0; j &lt;= m; j++) d[0, j] = j;

            // Step 3
            for (int i = 1; i &lt;= n; i++)
            {
                // Step 4
                for (int j = 1; j &lt;= m; j++)
                {
                    // Step 5: matching characters cost nothing
                    cost = (t[j - 1] == s[i - 1]) ? 0 : 1;

                    // Step 6: cheapest of deletion, insertion and substitution
                    d[i, j] = System.Math.Min(
                        System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                        d[i - 1, j - 1] + cost);
                }
            }

            // Step 7
            return d[n, m];
        }
    }
}
</code></pre>
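<p>As a rough sketch, the quantity that <code>Difference</code> returns can be written as a single formula. The symbols are assumed notation, not part of the code: W_1 stands for the set of distinct words in sentence 1, W_2 for the words in sentence 2, and LD for the Levenshtein distance computed by <code>Distance.LD</code>.</p>

```latex
d(s_1, s_2) = \sqrt{\sum_{w \in W_1} \Bigl( \min_{v \in W_2} \mathrm{LD}(w, v) \Bigr)^{2}}

% Worked example with s_1 = "The cat" and s_2 = "The hat":
%   "The" matches "The" exactly, so its minimum is 0
%   "cat" is closest to "hat" (one substitution), so its minimum is 1
d(\text{"The cat"}, \text{"The hat"}) = \sqrt{0^{2} + 1^{2}} = 1
```

<p>Because the sum runs only over the words of sentence 1, the measure is generally not symmetric: swapping the two sentences can give a different value. The comparison is also case-sensitive, so "The" and "the" count as different words.</p>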
 
