
<p>You should accept Mark Tolonen's answer - he read the docs ;-)</p>

<p>For a bit more insight, note that <code>difflib</code>'s notion of similarity has nothing to do with Levenshtein edit distance - but maybe that's what you <em>really</em> want. When you say:</p>

<blockquote>
<p>Note that for the string Erfreulicher, Erfreulich isn't considered a close match although the distance is only -1.</p>
</blockquote>

<p>I have no idea what notion of "distance" you have in mind either. The strings differ by 2 characters, right? "-1" is mysterious.</p>

<p><code>difflib</code> computes a "similarity score", which is a float in the range 0.0 through 1.0. Here's how to see what it's doing internally, using your list <code>x</code>:</p>

<pre><code>import difflib

s = difflib.SequenceMatcher()
s.set_seq2("Erfreulicher")  # the word being looked up
full = []
for i in x:
    s.set_seq1(i)  # each candidate in turn
    full.append((s.ratio(), i))
full.sort(reverse=True)
for score, i in full:
    print("{:20} {:.3f}".format(i, score))
</code></pre>

<p>Here's the result, sorted from highest similarity score to lowest:</p>

<pre><code>Erfreulicher         1.000
Erfreuliche          0.957
Erfreulicheres       0.923
Erfreulicherem       0.923
Erfreuliches         0.917
Erfreulich           0.909
Erfreulichste        0.880
Erfreulicherweis     0.857
Erfreulicherweise    0.828
</code></pre>

<p>As the docs say, by default <code>get_close_matches()</code> only returns the top 3. The specific word you're asking about happens to be sixth on the list, and <em>would</em> be returned if you told the function to return the top 6 (or 7, etc.) matches (see Mark's answer).</p>

<p>How the score is computed is also documented. Since "Erfreulich" is a prefix of "Erfreulicher", it reduces to:</p>

<pre><code>&gt;&gt;&gt; 2.0 * len("Erfreulich") / (len("Erfreulich") + len("Erfreulicher"))
0.9090909090909091
</code></pre>

<p>All the strings above "Erfreulich" on the list have at least one more character in common with "Erfreulicher", which makes the numerator larger. The denominator is also larger for them, but increasing the numerator by (say) 1 has a bigger effect on the result than increasing the denominator by 1. That may or may not match your intuition, but it is how it works ;-)</p>
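<p>To tie the two points together, here is a rough sketch rather than anything definitive. It assumes your list <code>x</code> holds exactly the nine words from the score table (the original list isn't shown here, so that is a guess), and <code>ratio_by_hand</code> is a made-up helper name that recomputes the documented <code>2.0 * M / T</code> formula from the matching blocks. It shows that "Erfreulich" is missing with the default <code>n=3</code> but comes back with <code>n=6</code>:</p>

```python
import difflib

# Assumption: the question's list `x` is not shown in this answer; these are
# the nine words taken from the score table.
x = ["Erfreulich", "Erfreuliche", "Erfreulicher", "Erfreulicherem",
     "Erfreulicheres", "Erfreulicherweis", "Erfreulicherweise",
     "Erfreuliches", "Erfreulichste"]
word = "Erfreulicher"


def ratio_by_hand(a, b):
    """Hypothetical helper: recompute SequenceMatcher.ratio() as 2.0 * M / T."""
    s = difflib.SequenceMatcher(None, a, b)
    # M: total number of characters covered by the matching blocks
    matched = sum(block.size for block in s.get_matching_blocks())
    # T: total number of characters in both strings
    total = len(a) + len(b)
    return 2.0 * matched / total


top3 = difflib.get_close_matches(word, x)       # default n=3
top6 = difflib.get_close_matches(word, x, n=6)  # ask for six matches instead

print(top3)
print(top6)
print(ratio_by_hand("Erfreulich", word))
```

<p>Passing a larger <code>n</code> is the only change needed here; the default <code>cutoff</code> of 0.6 is already below every score in the table.</p>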
 
