Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>I know it's an old post, but for anyone who comes by it seeking similar algorithms, one that works particularly well is Cosine Similarity. Find a way to vectorize your records, then look for vectors with minimum angle between them. If vectorizing a record is not trivial, then you can vectorize similarity between them via some known algorithm, and then look at cosine similarity of the similarity vectors to the perfect match vector (assuming perfect matches aren't the goal since they're easy to find anyway). I get tremendous results with this matching even comparing things like lists of people in various countries working on a particular project with various contributions to the project. Vectorization implies looking at number of country matches, country mismatches, ratio of people in a matching country between two datasets, etc etc etc. I use string edit distance functions like Levenshtein distance for getting numeric value from string dissimilarities, but one could use phonetic matching, etc. As long as the target number is not 0 (vector [0 0 ... 0] is the subspace of ANY vector and thus its angle would be undefined. Sometimes to get away from the problem, such as the case of edit distance, I give a perfect match (e.d. 0) a negative weight, so that perfect matches are really emphasized. -1 and 1 are farther away than 1 and 2, which makes a lot of sense - perfect match is better than anything with even 1 misspelling.</p> <p>Cos(theta) = (A dot B) / (Norm(A)*Norm(B)) where dot is the dot-product, and Norm is the Euclidian magnitude of the vector.</p> <p>Good luck!</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload