
Assuming you have some hefty computational resources to throw at this, I would suggest using something simple like the Hyperspace Analogue to Language (HAL) to build a Term × Term matrix from your Wikipedia dump. Your algorithm could then be something like the following (a rough code sketch appears after this answer):

- Given a query word/term, find its HAL vector.
- In that vector, find the adjective components with the highest weights.
  - To do this efficiently, you would probably want to use a dictionary (like WordNet) to preprocess your list of terms (i.e., those extracted by HAL) so that you know, prior to processing queries, which ones can be used as adjectives.
- For each adjective, find the N most similar vectors in your HAL space.
  - Optional: you could narrow this list down by looking for words that co-occur across your search terms.

This approach basically trades memory and computational efficiency for simplicity of code and data structures, yet it should do pretty well for what I think you want. The first step gives you the adjectives most commonly associated with the query term, while the vector similarity in HAL space (step 3) gives words that are paradigmatically related (roughly, words that can be substituted for one another; so if you start with an adjective of a certain sort, you should get more adjectives "like it" in terms of their relationship with the query term). That should be a fairly good proxy for the "cloud" you are looking for.
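For concreteness, here is a minimal Python sketch of the above, under assumptions the original answer does not pin down: a small in-memory token list rather than a full Wikipedia dump, a symmetric co-occurrence window with linearly decaying weights, NLTK's WordNet interface for the adjective check, and cosine similarity as the notion of "most similar" in step 3. Names like `build_hal` and `top_adjective_components` are illustrative only.

```python
# Minimal HAL-style sketch (assumptions: toy in-memory corpus, symmetric
# distance-weighted window, WordNet via NLTK for the adjective check,
# cosine similarity for "most similar"). Not the original answerer's code.

from collections import defaultdict
import math

from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

WINDOW = 5  # co-occurrence window size (assumed; HAL papers often use 10)


def build_hal(tokens):
    """Build a Term x Term co-occurrence matrix with distance-decayed weights."""
    hal = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, WINDOW + 1):
            if i + d < len(tokens):
                # closer neighbours get higher weights
                weight = WINDOW - d + 1
                hal[word][tokens[i + d]] += weight
                hal[tokens[i + d]][word] += weight
    return hal


def is_adjective(word):
    """True if WordNet lists an adjective (or satellite adjective) sense."""
    return any(s.pos() in ("a", "s") for s in wn.synsets(word))


def top_adjective_components(hal, query, n=10):
    """Steps 1-2: strongest adjective dimensions of the query's HAL vector."""
    vec = hal.get(query, {})
    adjs = [(w, wt) for w, wt in vec.items() if is_adjective(w)]
    return sorted(adjs, key=lambda x: -x[1])[:n]


def cosine(u, v):
    dot = sum(wt * v.get(k, 0.0) for k, wt in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(hal, word, n=5):
    """Step 3: paradigmatically related terms = nearest HAL vectors."""
    target = hal.get(word, {})
    sims = [(w, cosine(target, vec)) for w, vec in hal.items() if w != word]
    return sorted(sims, key=lambda x: -x[1])[:n]


if __name__ == "__main__":
    tokens = "the quick brown fox jumps over the lazy old dog".split()
    hal = build_hal(tokens)
    print(top_adjective_components(hal, "fox"))
    print(most_similar(hal, "quick"))
```

At Wikipedia scale you would want sparse matrices (e.g., scipy.sparse) and a streaming pass over the dump rather than plain Python dictionaries, but the structure of the computation stays the same.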
 
