Euclidean Distance or cosine similarity?
    <p>I was reading <a href="http://semanticvoid.com/blog/2007/02/23/similarity-measure-cosine-similarity-or-euclidean-distance-or-both/" rel="nofollow">Similarity Measure</a> and suddenly my whole world was falling apart. I have implemented a search engine using a clustering technique. For clustering, I used K-Means with Euclidean distance as the distance measure. I also used cosine similarity to display results. I was getting amazingly accurate results. But now that I have read this, I realize that what I did was normalize the document vectors and then calculate the Euclidean distance between two vectors, and hence I have not considered magnitude anywhere.</p>
    <p>Am I doing something wrong?</p>
    <p>Although I think a higher term frequency would lead to a higher tf-idf value, and hence a higher normalized tf-idf value, so the document would still be ranked appropriately high. Thanks.</p>
    <p>Results 1 (using non-normalized vectors; the figures are Euclidean distances)</p>
    <pre><code>61.79689257425985 222Proposed Research Details.doc
144.15451315901478 and_Integrated_Assessment_of__Natural_resources_and_evolution_of_alternate_sustainable_land_management_options_for_tribal_dominated_watersheds_RRPS_24.doc
72.61392308146608 done_Developing live fencing systems for soil &amp; water conservation_NATIP-RNPS-3 SKN Math).doc
72.96125277156261 done_Management strategies for impriing rabi (SKN Math).doc
65.51734241367222 done_RPFIII_dr.dogra.doc
66.72042766100921 Evaluation of crops and their varieties (SKN Math).doc
418.8868087170988 P. VIJAYA KUMAR (DSS).doc
140.3914521621597 RPF - I PIMS-ICAR project proposal for IASRI.doc
72.95414421468679 RPF-III__Indo-US_project.doc
82.25126123574397 220Introduction and objectives.doc
</code></pre>
    <p>Results 2 (using normalized vectors; the figures are Euclidean distances)</p>
    <pre><code>1.3435369899385359 222Proposed Research Details.doc
1.1277471087250086 and_Integrated_Assessment_of__Natural_resources_and_evolution_of_alternate_sustainable_land_management_options_for_tribal_dominated_watersheds_RRPS_24.doc
1.2741267093494966 done_Developing live fencing systems for soil &amp; water conservation_NATIP-RNPS-3 SKN Math).doc
1.264154265747389 done_Management strategies for impriing rabi (SKN Math).doc
1.2902191708899362 done_RPFIII_dr.dogra.doc
1.3128744973475515 Evaluation of crops and their varieties (SKN Math).doc
0.4924243033927417 P. VIJAYA KUMAR (DSS).doc
1.1747048933792805 RPF - I PIMS-ICAR project proposal for IASRI.doc
1.29150899172647 RPF-III__Indo-US_project.doc
1.318016051789028 220Introduction and objectives.doc
</code></pre>
    <p>Results 3 (the figures are cosine similarities)</p>
    <pre><code>0.09745417833344654 222Proposed Research Details.doc
0.36409322938119104 and_Integrated_Assessment_of__Natural_resources_and_evolution_of_alternate_sustainable_land_management_options_for_tribal_dominated_watersheds_RRPS_24.doc
0.1883005642611103 done_Developing live fencing systems for soil &amp; water conservation_NATIP-RNPS-3 SKN Math).doc
0.2009569961963377 done_Management strategies for impriing rabi (SKN Math).doc
0.16766724553404047 done_RPFIII_dr.dogra.doc
0.13818027710720598 Evaluation of crops and their varieties (SKN Math).doc
0.8787591527140649 P. VIJAYA KUMAR (DSS).doc
0.3100342067353838 RPF - I PIMS-ICAR project proposal for IASRI.doc
0.16600226214483405 RPF-III__Indo-US_project.doc
0.13141684361322944 220Introduction and objectives.doc
</code></pre>
    <p>Results 1 and 2 do not agree with each other, while results 2 and 3 agree strongly: higher similarity goes with smaller distance. The distances are measured between the cluster centroid vector and the document vector of each document.</p>
    <p>In fact, the weirdest result is the document with a Euclidean distance of 418 that also has the highest similarity, 0.87, while its normalized distance becomes 0.49 and agrees with the similarity.</p>
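The agreement between the normalized Euclidean distances and the cosine similarities can be checked with a small sketch. For unit-length vectors, the squared Euclidean distance equals 2 * (1 - cosine similarity), so both measures should rank documents identically, whereas the raw distance also depends on vector magnitude. This is only an illustration, not the code used in the question: the toy vectors `centroid` and `doc` and all function names are hypothetical, and the `reported` pairs are copied from the figures listed in the question.

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def distance_from_cosine(cos_sim):
    """Euclidean distance between two unit vectors with the given cosine."""
    return math.sqrt(2.0 * (1.0 - cos_sim))

# Hypothetical toy vectors: similar direction, very different magnitude.
centroid = [1.0, 2.0, 0.5]
doc = [40.0, 75.0, 30.0]

raw_distance = euclidean(centroid, doc)                         # dominated by magnitude
unit_distance = euclidean(normalize(centroid), normalize(doc))  # direction only
cos_sim = cosine(centroid, doc)

print("raw distance:        ", raw_distance)
print("normalized distance: ", unit_distance)
print("sqrt(2 * (1 - cos)): ", distance_from_cosine(cos_sim))

# (cosine similarity, normalized Euclidean distance) pairs from the question.
reported = [
    (0.8787591527140649, 0.4924243033927417),   # P. VIJAYA KUMAR (DSS).doc
    (0.09745417833344654, 1.3435369899385359),  # 222Proposed Research Details.doc
    (0.36409322938119104, 1.1277471087250086),  # and_Integrated_Assessment_...doc
]
for cos_value, norm_dist in reported:
    print(cos_value, norm_dist, distance_from_cosine(cos_value))
```

If the identity holds for the reported figures, the normalized distance column carries the same information as the cosine column, which would explain why only the non-normalized results disagree with the other two.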