I think what you are asking for is a source of semantic relationships between concepts. For that, I can think of a number of ways to go:

1. **[Semantic similarity algorithms](http://en.wikipedia.org/wiki/Semantic_similarity)**. These algorithms usually perform a tree walk over the relationships in WordNet to come up with a real-valued score of how related two terms are. They will be limited by how well WordNet models the concepts you are interested in. [WordNet::Similarity](http://www.d.umn.edu/~tpederse/similarity.html) (written in Perl) is pretty good (a rough sketch of this idea follows below).
2. **Try using [OpenCyc](http://www.cyc.com/cyc/opencyc/overview) as a knowledge base**. OpenCyc is an open-source version of Cyc, a very large knowledge base of 'real-world' facts. It should have a much richer set of semantic relationships than WordNet does. However, I have never used OpenCyc, so I can't speak to how complete it is or how easy it is to use.
3. **N-gram frequency analysis**. As mentioned by Jeff Moser. A data-driven approach that can 'discover' relationships from large amounts of data, but can often produce noisy results.
4. **[Latent Semantic Analysis](http://en.wikipedia.org/wiki/Latent_semantic_analysis)**. A data-driven approach similar to n-gram frequency analysis that finds sets of semantically related words (see the LSA sketch below).

[...]

Judging by what you say you want to do, I think the last two options are more likely to be successful. If the relationships are not in WordNet then semantic similarity won't work, and OpenCyc doesn't seem to know much about [snooker](http://sw.opencyc.org/concept/Mx4rwEu0nZwpEbGdrcN5Y29ycA) other than the fact that it exists.

I think a combination of both n-grams and LSA (or something like it) would be a good idea. N-gram frequencies will find concepts tightly bound to your target concept (e.g. tennis ball), and LSA would find related concepts mentioned in the same sentence or document (e.g. net, serve). Also, if you are only interested in nouns, filtering your output to contain only nouns or noun phrases (using a [part-of-speech tagger](http://en.wikipedia.org/wiki/Part-of-speech_tagging)) might improve results.
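As a rough illustration of option 1: the answer names the Perl WordNet::Similarity module, but if you happen to be in Python, NLTK's WordNet interface exposes comparable similarity measures. This is only a minimal sketch, not the same library, and the function name and example words are my own:

```python
# Minimal sketch of a WordNet-based similarity check using NLTK
# (not WordNet::Similarity itself; an assumed Python analogue).
# Requires: pip install nltk, then nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def max_noun_similarity(word_a, word_b):
    """Highest Wu-Palmer similarity across all noun senses of the two words."""
    best = 0.0
    for syn_a in wn.synsets(word_a, pos=wn.NOUN):
        for syn_b in wn.synsets(word_b, pos=wn.NOUN):
            score = syn_a.wup_similarity(syn_b)
            if score is not None and score > best:
                best = score
    return best

print(max_noun_similarity('tennis', 'snooker'))      # both games, so relatively high
print(max_noun_similarity('tennis', 'carburetor'))   # unrelated, so much lower
```

As with WordNet::Similarity, the scores are only as good as WordNet's coverage of the concepts involved.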
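For option 4 (LSA), here is a hedged sketch of the usual pipeline with scikit-learn: build a TF-IDF document-term matrix, reduce it with truncated SVD, and compare terms in the latent space. The toy corpus, the number of components, and the target word are placeholders; a real run needs a large corpus to give meaningful neighbours.

```python
# Sketch of LSA for finding semantically related terms (toy corpus only).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "the player hit the tennis ball over the net",
    "a strong serve wins many tennis points",
    "the snooker player potted the final black ball",
    # ... many more documents in practice
]

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)           # documents x terms

svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(X)
term_vectors = svd.components_.T                  # terms x latent dimensions

terms = vectorizer.get_feature_names_out()
target = list(terms).index('tennis')
sims = cosine_similarity(term_vectors[target:target + 1], term_vectors)[0]

# Terms whose latent vectors lie closest to the target concept.
for idx in np.argsort(sims)[::-1][:5]:
    print(terms[idx], round(float(sims[idx]), 3))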
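```

And a small sketch of the noun-filtering idea from the last paragraph, using NLTK's off-the-shelf tagger (the helper name is mine; requires the 'punkt' and 'averaged_perceptron_tagger' NLTK data packages):

```python
# Keep only noun tokens before feeding terms into the n-gram/LSA step.
import nltk

def nouns_only(text):
    """Return tokens tagged as nouns (NN, NNS, NNP, NNPS)."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    return [word for word, tag in tagged if tag.startswith('NN')]

print(nouns_only("The player hit the tennis ball over the net"))
# e.g. ['player', 'tennis', 'ball', 'net']
```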
 
