TF*IDF for Search Queries
<p>Okay, so I have been following these two posts on TF*IDF but am a little confused: <a href="http://css.dzone.com/articles/machine-learning-text-feature">http://css.dzone.com/articles/machine-learning-text-feature</a></p>
<p>Basically, I want to run a search query against multiple documents. I would like to use the scikit-learn toolkit as well as the NLTK library for Python.</p>
<p>The problem is that I don't see where the two TF*IDF vectors come from. I need one search query and multiple documents to search. I figured I would calculate the TF*IDF scores of each document against each query, find the cosine similarity between them, and then rank the documents by sorting the scores in descending order. However, the code doesn't seem to come up with the right vectors.</p>
<p>Whenever I reduce the query to a single search string, it returns a long list of 0's, which is really strange.</p>
<p><strong>Here is the code:</strong></p>
<pre><code>from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from nltk.corpus import stopwords

train_set = ("The sky is blue.", "The sun is bright.")  # Documents
test_set = ("The sun in the sky is bright.")  # Query
stopWords = stopwords.words('english')

vectorizer = CountVectorizer(stop_words = stopWords)
transformer = TfidfTransformer()

trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
testVectorizerArray = vectorizer.transform(test_set).toarray()
print 'Fit Vectorizer to train set', trainVectorizerArray
print 'Transform Vectorizer to test set', testVectorizerArray

transformer.fit(trainVectorizerArray)
print transformer.transform(trainVectorizerArray).toarray()

transformer.fit(testVectorizerArray)
tfidf = transformer.transform(testVectorizerArray)
print tfidf.todense()
</code></pre>
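The plan described above (TF*IDF vectors for the documents and the query, cosine similarity between them, then a descending sort) might be sketched as follows. This is a minimal illustration, not the code from the linked article: it assumes a recent scikit-learn under Python 3, swaps the CountVectorizer/TfidfTransformer pair for `TfidfVectorizer`, and uses scikit-learn's built-in English stop-word list instead of NLTK's. The variable names are illustrative. Note that the query is passed as a one-element list: `("The sun in the sky is bright.")` without a trailing comma is a plain string, not a tuple, which is one plausible source of the unexpected zeros.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = ["The sky is blue.", "The sun is bright."]
# The query must be an iterable of strings, so wrap the single query in a list.
query = ["The sun in the sky is bright."]

# Assumption: scikit-learn's built-in English stop-word list stands in for NLTK's.
vectorizer = TfidfVectorizer(stop_words="english")

# Learn the vocabulary and IDF weights from the documents only.
doc_vectors = vectorizer.fit_transform(documents)
# Project the query into the same vector space without refitting.
query_vector = vectorizer.transform(query)

# One similarity score per document, flattened to a 1-D array.
scores = cosine_similarity(query_vector, doc_vectors).ravel()

# Document indices ordered from most to least similar.
ranking = scores.argsort()[::-1]
for idx in ranking:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```

Fitting only on the documents and then calling `transform` on the query keeps both sets of vectors in the same vocabulary, which is what makes the cosine similarity meaningful. Refitting on the query, as the second `transformer.fit` call in the question does, would recompute the IDF weights from the query alone.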