StackOverflow2013

<p>The NLTK collocations how-to seems pretty good to me: <a href="http://www.nltk.org/howto/collocations.html" rel="noreferrer">http://www.nltk.org/howto/collocations.html</a></p>
<p>You need to give the scorer a reasonably large corpus to work with. Here is a working example using the Brown corpus that ships with NLTK's data. It takes about 30 seconds to run.</p>
<pre><code>import collections

import nltk.collocations
import nltk.corpus

# Requires the Brown corpus data: nltk.download('brown')
bgm = nltk.collocations.BigramAssocMeasures()
finder = nltk.collocations.BigramCollocationFinder.from_words(
    nltk.corpus.brown.words())
scored = finder.score_ngrams(bgm.likelihood_ratio)

# Group bigrams by first word in bigram.
prefix_keys = collections.defaultdict(list)
for key, scores in scored:
    prefix_keys[key[0]].append((key[1], scores))

# Sort keyed bigrams by strongest association.
for key in prefix_keys:
    prefix_keys[key].sort(key=lambda x: -x[1])

print('doctor', prefix_keys['doctor'][:5])
print('baseball', prefix_keys['baseball'][:5])
print('happy', prefix_keys['happy'][:5])
</code></pre>
<p>The output seems reasonable: it works well for baseball, less so for doctor and happy.</p>
<pre><code>doctor [('bills', 35.061321987405748), (',', 22.963930079491501), ('annoys', 19.009636692022365), ('had', 16.730384189212423), ('retorted', 15.190847940499127)]
baseball [('game', 32.110754519752291), ('cap', 27.81891372457088), ('park', 23.509042621473505), ('games', 23.105033513054011), ("player's", 16.227872863424668)]
happy [("''", 20.296341424483998), ('Spahn', 13.915820697905589), ('family', 13.734352182441569), (',', 13.55077617193821), ('bodybuilder', 13.513265447290536)]
</code></pre>
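<p>Part of the noise for doctor and happy comes from punctuation tokens such as <code>,</code> and <code>''</code> ranking among the partners. The following is a minimal sketch, not part of the original answer, of the same group-and-sort step with those tokens skipped. The names <code>has_letter</code> and <code>group_by_prefix</code> are hypothetical helpers, and the toy scores are made up only to show the shape of the data; in practice you would pass in the <code>scored</code> list from above.</p>

```python
import collections


def has_letter(word):
    """Hypothetical filter: True if the token contains at least one letter."""
    return any(ch.isalpha() for ch in word)


def group_by_prefix(scored, keep=has_letter):
    """Group ((first, second), score) pairs by first word, best score first.

    `scored` is assumed to have the shape of the list returned by
    finder.score_ngrams(...). Second words rejected by `keep` are skipped.
    """
    grouped = collections.defaultdict(list)
    for (first, second), score in scored:
        if keep(second):
            grouped[first].append((second, score))
    # Sort each word's partners by strongest association.
    for partners in grouped.values():
        partners.sort(key=lambda pair: pair[1], reverse=True)
    return grouped


# Toy input with made-up scores (an assumption, not real corpus output).
toy_scored = [
    (("doctor", ","), 22.9),
    (("doctor", "bills"), 35.0),
    (("doctor", "had"), 16.7),
    (("happy", "''"), 20.3),
    (("happy", "family"), 13.7),
]

top = group_by_prefix(toy_scored)
print("doctor", top["doctor"][:5])
print("happy", top["happy"][:5])
```

<p>This only filters at display time, so the scores themselves are unchanged. If you would rather drop such bigrams before scoring, the finder's <code>apply_word_filter</code> method may be a better fit, though I have not compared the results of the two approaches.</p>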
