StackOverflow2013

Use scikit-learn to classify into multiple categories
<p>I'm trying to use one of scikit-learn's supervised learning methods to classify pieces of text into one or more categories. The <code>predict</code> method of every algorithm I tried returns only one match.</p>
<p>For example, I have a piece of text:</p>
<pre><code>"Theaters in New York compared to those in London"
</code></pre>
<p>I have trained the algorithm to pick a place for every text snippet I feed it.</p>
<p>In the above example I would want it to return <code>New York</code> and <code>London</code>, but it only returns <code>New York</code>.</p>
<p>Is it possible to use scikit-learn to return multiple results? Or at least to return the label with the next-highest probability?</p>
<p>Thanks for your help.</p>
<p>Update</p>
<p>I tried using <code>OneVsRestClassifier</code>, but I still get only one label back per piece of text. Below is the sample code I am using:</p>
<pre><code>from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

# One label per training text
y_train = ('New York', 'London')
train_set = ("new york nyc big apple", "london uk great britain")
vocab = {'new york': 0, 'nyc': 1, 'big apple': 2, 'london': 3, 'uk': 4, 'great britain': 5}
# ngram_range=(1, 2) counts unigrams and bigrams so two-word entries like 'new york' match;
# it replaces the WordNGramAnalyzer(min_n=1, max_n=2) argument from older releases
count = CountVectorizer(ngram_range=(1, 2), vocabulary=vocab)
test_set = ('nice day in nyc', 'london town', 'hello welcome to the big apple. enjoy it here and london too')

# Sparse count matrices; MultinomialNB accepts them directly, no .todense() needed
X_vectorized = count.fit_transform(train_set)
smatrix2 = count.transform(test_set)

base_clf = MultinomialNB(alpha=1)
clf = OneVsRestClassifier(base_clf).fit(X_vectorized, y_train)
Y_pred = clf.predict(smatrix2)
print(Y_pred)
</code></pre>
<p>Result: <code>['New York' 'London' 'London']</code></p>
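One possible way to get both behaviours, sketched under the assumption of a recent scikit-learn release and reusing the vocabulary and test sentences from the question. The first part keeps one label per training text but reads `predict_proba` instead of `predict`, returning every label whose probability reaches a cutoff (`labels_above` and the 0.4 cutoff are hypothetical choices, not library API). The second part treats the task as multi-label: the targets are lists of labels, encoded with `MultiLabelBinarizer`, so `OneVsRestClassifier` trains one independent yes/no classifier per place and `predict` can switch on several at once. The third training text and its two labels are made-up toy data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

vocab = {'new york': 0, 'nyc': 1, 'big apple': 2,
         'london': 3, 'uk': 4, 'great britain': 5}
# Unigrams and bigrams, so two-word vocabulary entries such as 'big apple' match.
count = CountVectorizer(ngram_range=(1, 2), vocabulary=vocab)

train_set = ['new york nyc big apple', 'london uk great britain']
test_set = ['nice day in nyc', 'london town',
            'hello welcome to the big apple. enjoy it here and london too']

X_train = count.fit_transform(train_set)
X_test = count.transform(test_set)

# Part 1: single-label model, but read class probabilities instead of predict().
nb = MultinomialNB(alpha=1.0).fit(X_train, ['New York', 'London'])
proba = nb.predict_proba(X_test)  # columns are ordered like nb.classes_


def labels_above(proba, classes, min_prob):
    """Hypothetical helper: for each row, return every label whose probability
    is >= min_prob, most probable first."""
    result = []
    for row in proba:
        order = np.argsort(row)[::-1]  # column indices, highest probability first
        result.append([str(classes[i]) for i in order if row[i] >= min_prob])
    return result


by_threshold = labels_above(proba, nb.classes_, min_prob=0.4)
print(by_threshold)

# Part 2: genuine multi-label training; each text maps to a list of places.
# The third text and its two labels are made-up toy data.
ml_texts = train_set + ['big apple and london']
ml_labels = [['New York'], ['London'], ['New York', 'London']]

mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(ml_labels)  # 0/1 matrix, one column per label in mlb.classes_

# fit_prior=False: with this few texts the learned priors could outweigh the
# word evidence, so each yes/no classifier assumes equal priors instead.
ovr = OneVsRestClassifier(MultinomialNB(alpha=1.0, fit_prior=False))
ovr.fit(count.transform(ml_texts), Y_train)

multi = mlb.inverse_transform(ovr.predict(X_test))  # one tuple of labels per text
print(multi)
```

With only two places, ranking by probability and applying a cutoff carry the same information; with many places, slicing `order[:k]` inside `labels_above` instead of filtering would give the top k labels. The key difference from the code in the question is the shape of `y`: a flat tuple of strings tells `OneVsRestClassifier` the problem is single-label (here even binary), so exactly one label comes back per text, whereas a list of label lists makes it multi-label.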
