<p>Use <a href="http://en.wikipedia.org/wiki/Jaccard_index" rel="nofollow">Jaccard similarity</a>. In the Python demo below, keep in mind that the functions <code>cosine</code> and <code>jaccard</code> return a distance, not a similarity (distance = 1 - similarity, so a smaller value means "more similar"), and read the comments:</p>
<pre><code># Input all the data
In [19]: from scipy.spatial.distance import cosine, jaccard

In [24]: a
Out[24]: array([ 1,  1, 15,  2,  0])

In [25]: b
Out[25]: array([ 0,  0, 15,  0,  0])

In [26]: c
Out[26]: array([ 1,  1, 11,  0,  1])

# Calculate the cosine distance. I've scaled it by a factor of 100 for legibility
In [20]: 100*cosine(a,b)
Out[20]: 1.3072457560346473

In [21]: 100*cosine(c,a)
Out[21]: 1.3267032349480568

# Note that c is slightly "further away" from a than b is.
# Now let's see what Mr Jaccard has to say
In [28]: jaccard(a,b)
Out[28]: 0.75

In [29]: jaccard(a,c)
Out[29]: 0.59999999999999998

# Behold the desired effect: c is now considerably closer to a than b is
# Sanity check: the distance between a and a is 0
In [30]: jaccard(a,a)
Out[30]: 0.0
</code></pre>
<p>P.S. Many more similarity measures exist, and each is appropriate under different circumstances. Do you have a good reason to believe <code>c</code> should be more similar to <code>a</code> than <code>b</code> is? What is your task? If you want to read more about the subject, I highly recommend <a href="http://www.google.co.uk/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=1&amp;cad=rja&amp;ved=0CDcQFjAA&amp;url=http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.538&amp;rep=rep1&amp;type=pdf&amp;ei=3PtEUc_9N4bYPcaWgKgJ&amp;usg=AFQjCNEJNs9-3SJvjVi2aTwiRBVPn7VWWQ" rel="nofollow">this PhD thesis</a>. Warning: it is 200 pages long.</p>
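<p>To see where the numbers in the session above come from, here is a minimal pure-Python sketch of both distances. It assumes the Jaccard distance for numeric vectors is the fraction of positions that differ among the positions where at least one vector is non-zero, which reproduces the outputs shown; the helper names <code>cosine_distance</code> and <code>jaccard_distance</code> are illustrative and are not SciPy's API, and SciPy's handling of non-boolean input may differ between versions.</p>

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def jaccard_distance(u, v):
    """Fraction of differing positions among those where u or v is non-zero.

    Assumption: this is the numeric-vector definition that matches the
    outputs in the session above; it is not a call into SciPy.
    """
    active = [(x, y) for x, y in zip(u, v) if x != 0 or y != 0]
    if not active:
        return 0.0  # two all-zero vectors are treated as identical
    differing = sum(1 for x, y in active if x != y)
    return differing / len(active)


a = [1, 1, 15, 2, 0]
b = [0, 0, 15, 0, 0]
c = [1, 1, 11, 0, 1]

print(100 * cosine_distance(a, b))  # c is slightly further from a than b is
print(100 * cosine_distance(c, a))
print(jaccard_distance(a, b))  # 3 of 4 active positions differ -> 0.75
print(jaccard_distance(a, c))  # 3 of 5 active positions differ -> 0.6
print(jaccard_distance(a, a))  # 0.0
```

<p>Under this definition only whether entries match matters, not by how much they differ, which is why the large shared third component no longer dominates the result the way it does for the cosine distance.</p>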
 
