
What is the easiest way to implement terms association mining in Solr?
<p><strong>Association mining</strong> seems to give good results for retrieving <strong>related terms</strong> in text corpora. There are several works on this topic, including the well-known <a href="http://en.wikipedia.org/wiki/Latent_semantic_analysis">LSA</a> method. The most straightforward way to mine associations is to build a co-occurrence matrix of <code>docs X terms</code> and find the terms that occur in the same documents most often. In my previous projects I implemented this directly in Lucene by iterating over TermDocs (which I got by calling <a href="http://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/index/IndexReader.html#termDocs%28org.apache.lucene.index.Term%29">IndexReader.termDocs(Term)</a>). But I can't see anything similar in Solr.</p> <p>So, my <em>needs</em> are:</p> <ol> <li>To retrieve the <strong>most associated terms</strong> within a particular field.</li> <li>To retrieve the <strong>term that is closest to the specified one</strong> within a particular field.</li> </ol> <p>I will <em>rate answers</em> in the following way:</p> <ol> <li>Ideally, I would like to find a Solr component that directly covers the specified needs, that is, something that returns associated terms directly.</li> <li>If this is not possible, I'm looking for a way to get co-occurrence matrix information for a specified field.</li> <li>If this is not an option either, I would like to know the most straightforward way to 1) get all terms and 2) get the ids (numbers) of the documents these terms occur in.</li> </ol>
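To make the third fallback concrete: once each term is mapped to the ids of the documents it occurs in, the co-occurrence counting itself is small. The following is only a minimal sketch in plain Java, not Solr or Lucene API. The class `CoOccurrence`, its methods, and the sample postings are hypothetical names and data made up for illustration; the in-memory map stands in for whatever Solr would return for a field.

```java
import java.util.*;

public class CoOccurrence {
    /** For every pair of terms, counts how many documents contain both. */
    public static Map<String, Map<String, Integer>> coOccurrences(Map<String, Set<Integer>> postings) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        List<String> terms = new ArrayList<>(postings.keySet());
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                String a = terms.get(i), b = terms.get(j);
                // Documents shared by both terms = intersection of their doc-id sets.
                Set<Integer> shared = new HashSet<>(postings.get(a));
                shared.retainAll(postings.get(b));
                if (!shared.isEmpty()) {
                    counts.computeIfAbsent(a, k -> new TreeMap<>()).put(b, shared.size());
                    counts.computeIfAbsent(b, k -> new TreeMap<>()).put(a, shared.size());
                }
            }
        }
        return counts;
    }

    /** Returns the term sharing the most documents with the given term, or null if there is none. */
    public static String closestTerm(Map<String, Map<String, Integer>> counts, String term) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.getOrDefault(term, Collections.emptyMap()).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Hypothetical term -> document ids mapping (made-up sample data). */
    public static Map<String, Set<Integer>> samplePostings() {
        Map<String, Set<Integer>> p = new TreeMap<>();
        p.put("solr", new HashSet<>(Arrays.asList(1, 2, 3)));
        p.put("lucene", new HashSet<>(Arrays.asList(1, 2, 4)));
        p.put("index", new HashSet<>(Arrays.asList(3)));
        return p;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> counts = coOccurrences(samplePostings());
        System.out.println(closestTerm(counts, "solr"));
    }
}
```

The pairwise loop is quadratic in the number of terms, so for a real index it would presumably need pruning (for example, skipping very rare or very frequent terms) before it is practical.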
 
