<p>I know it's an old post, but I tried the <a href="http://scikit-learn.sourceforge.net/stable/">scikit-learn</a> package. The question was how to calculate the cosine similarity with this package, and here is my code for that:</p>
<pre><code>from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def read_doc(path):
    # Read a document as UTF-8 text, skipping bytes that cannot be decoded
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read()


doc1 = read_doc("/root/Myfolder/scoringDocuments/doc1")
doc2 = read_doc("/root/Myfolder/scoringDocuments/doc2")
doc3 = read_doc("/root/Myfolder/scoringDocuments/doc3")

# The query comes first, followed by the documents to score against it
train_set = ["president of India", doc1, doc2, doc3]

tfidf_vectorizer = TfidfVectorizer()
# Computes the tf-idf scores (rows are L2-normalized by default)
tfidf_matrix_train = tfidf_vectorizer.fit_transform(train_set)

# The first row (the query) is compared with every row, including itself
print("cosine scores ==&gt; ", cosine_similarity(tfidf_matrix_train[0:1], tfidf_matrix_train))
</code></pre>
<p>Here the query is the first element of train_set, and doc1, doc2 and doc3 are the documents I want to rank by cosine similarity against it.</p>
<p>Also, the tutorials provided in the question were very useful. Here are all the parts: <a href="http://pyevolve.sourceforge.net/wordpress/?p=1589">part-I</a>, <a href="http://pyevolve.sourceforge.net/wordpress/?p=1747">part-II</a>, <a href="http://pyevolve.sourceforge.net/wordpress/?p=2497">part-III</a>.</p>
<p>The output will be as follows:</p>
<pre><code>[[ 1. 0.07102631 0.02731343 0.06348799]]
</code></pre>
<p>Here 1 is the score of the query matched with itself, and the other three values are the scores of the query against doc1, doc2 and doc3, respectively.</p>
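<p>To turn that row of scores into an actual ranking, something like the following might work. It is only a minimal sketch: the helper name rank_documents and the labels "doc1", "doc2", "doc3" are my own illustration and not part of scikit-learn, and the scores are simply copied from the output above.</p>

```python
def rank_documents(score_row, doc_names):
    """Return (name, score) pairs sorted from most to least similar.

    score_row[0] is the query matched with itself, so it is skipped.
    rank_documents is a hypothetical helper, not a scikit-learn function.
    """
    doc_scores = list(score_row)[1:]
    if len(doc_scores) != len(doc_names):
        raise ValueError("expected one score per document")
    # Sort by score, highest first
    return sorted(zip(doc_names, doc_scores), key=lambda pair: pair[1], reverse=True)


# Scores copied from the output shown above
scores = [1.0, 0.07102631, 0.02731343, 0.06348799]
ranking = rank_documents(scores, ["doc1", "doc2", "doc3"])

for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

<p>With the real result you would pass the first row of the array returned by cosine_similarity, instead of the hard-coded list. For the scores above this puts doc1 first, then doc3, then doc2.</p>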
 
