<p>The first thing is to get a listing of documents. An alternative might be iterating through indexed terms, but the method <code>IndexReader.terms()</code> appears to have been removed in 4.0 (though it exists in <code>AtomicReader</code>, which could be worth looking at). The best method I'm aware of to get all documents is to simply loop through them by document id:</p>
<pre><code>// where reader is your IndexReader, however you go about opening/managing it
// IndexReader.isDeleted(int) is gone in 4.0; check the live-docs bits instead
Bits liveDocs = MultiFields.getLiveDocs(reader); // null when the index has no deletions
for (int i = 0; i &lt; reader.maxDoc(); i++) {
    if (liveDocs != null &amp;&amp; !liveDocs.get(i))
        continue; // skip deleted documents
    // operate on the document with id = i ...
}
</code></pre>
<p>Then you need a listing of all indexed terms. I'm assuming we have no interest in stored fields, since the data you want doesn't make sense for them. For retrieving the terms you can use <code>IndexReader.getTermVectors(int)</code>, which only returns data for fields that were indexed with term vectors enabled. Note that I'm not actually retrieving the document, since we don't need to access it directly. Continuing from where we left off:</p>
<pre><code>// Needs org.apache.lucene.index.*, org.apache.lucene.util.Bits and
// org.apache.lucene.search.similarities.DefaultSimilarity
String field;
FieldsEnum fieldsiterator;
TermsEnum termsiterator;
// To simplify, you can rely on DefaultSimilarity to calculate tf and idf for you.
DefaultSimilarity freqcalculator = new DefaultSimilarity();
// numDocs and maxDoc are not the same thing: maxDoc also counts deleted documents.
int numDocs = reader.numDocs();
int maxDoc = reader.maxDoc();
Bits liveDocs = MultiFields.getLiveDocs(reader); // null when the index has no deletions

for (int i = 0; i &lt; maxDoc; i++) {
    if (liveDocs != null &amp;&amp; !liveDocs.get(i))
        continue; // skip deleted documents
    Fields vectors = reader.getTermVectors(i);
    if (vectors == null)
        continue; // no term vectors stored for this document
    fieldsiterator = vectors.iterator();
    while ((field = fieldsiterator.next()) != null) {
        termsiterator = fieldsiterator.terms().iterator(null);
        while (termsiterator.next() != null) {
            // i = document id, field = field name
            // String representation of the current term
            String termtext = termsiterator.term().utf8ToString();
            // Get idf, using docFreq from the terms enum.
            // I haven't tested this, and I'm not quite 100% sure of the context of this method.
            // If it doesn't work, idfalternate below should.
            float idf = freqcalculator.idf(termsiterator.docFreq(), numDocs);
            float idfalternate = freqcalculator.idf(reader.docFreq(field, termsiterator.term()), numDocs);
        }
    }
}
</code></pre>
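To make the numbers concrete, here is a minimal standalone sketch of the arithmetic that DefaultSimilarity applies: tf is the square root of the raw term frequency, and idf is log(numDocs / (docFreq + 1)) + 1. The class name TfIdfSketch and its method names are made up for illustration and are not part of Lucene; the sketch only mirrors those two formulas so you can sanity-check values without an index.

```java
public class TfIdfSketch {

    // tf as DefaultSimilarity defines it: sqrt of the raw in-document frequency
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf as DefaultSimilarity defines it: log(numDocs / (docFreq + 1)) + 1
    public static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Plain tf * idf product for one term in one document (hypothetical helper)
    public static float tfIdf(float freq, long docFreq, long numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // A term occurring 4 times in a document and in 9 of 10 documents overall
        System.out.println("tf     = " + tf(4f));
        System.out.println("idf    = " + idf(9, 10));
        System.out.println("tf-idf = " + tfIdf(4f, 9, 10));
    }
}
```

In the loop above, the docFreq argument would come from reader.docFreq(field, term) and numDocs from reader.numDocs(). The raw term frequency still has to be read from the term vector, which this sketch does not cover.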