How to get Document ids for Document Term Vector in Lucene
<p>I am new to the Lucene world and don't have much working knowledge of the subject. I need to extract a document term vector, and I found the following code online in <a href="https://stackoverflow.com/questions/8776794/how-to-extract-document-term-vector-in-lucene-3-5-0/8927749#8927749">How to extract Document Term Vector in Lucene 3.5.0</a>.</p>
<pre><code>/**
 * Sums the term frequency vector of each document into a single term frequency map
 * @param indexReader the index reader, the document numbers are specific to this reader
 * @param docNumbers document numbers to retrieve frequency vectors from
 * @param fieldNames field names to retrieve frequency vectors from
 * @param stopWords terms to ignore
 * @return a map of each term to its frequency
 * @throws IOException
 */
private Map&lt;String,Integer&gt; getTermFrequencyMap(IndexReader indexReader, List&lt;Integer&gt; docNumbers, String[] fieldNames, Set&lt;String&gt; stopWords)
throws IOException {
    Map&lt;String,Integer&gt; totalTfv = new HashMap&lt;String,Integer&gt;(1024);

    for (Integer docNum : docNumbers) {
        for (String fieldName : fieldNames) {
            TermFreqVector tfv = indexReader.getTermFreqVector(docNum, fieldName);
            if (tfv == null) {
                // ignore empty fields
                continue;
            }

            String terms[] = tfv.getTerms();
            int termCount = terms.length;
            int freqs[] = tfv.getTermFrequencies();

            for (int t=0; t &lt; termCount; t++) {
                String term = terms[t];
                int freq = freqs[t];

                // filter out single-letter words and stop words
                if (StringUtils.length(term) &lt; 2 ||
                    stopWords.contains(term)) {
                    continue; // stop
                }

                Integer totalFreq = totalTfv.get(term);
                totalFreq = (totalFreq == null) ? freq : freq + totalFreq;
                totalTfv.put(term, totalFreq);
            }
        }
    }

    return totalTfv;
}
</code></pre>
<p>I have created the index, which resides in the following directory.</p>
<pre><code>String indexDir = "C:\\Lucene\\Output\\";
Directory dir = FSDirectory.open(new File(indexDir));
IndexReader reader = IndexReader.open(dir);
</code></pre>
<p>My problem is that I do not know how to get the doc ids (<code>List&lt;Integer&gt; docNumbers</code>) that the function above requires. I have tried a couple of approaches, such as</p>
<pre><code>TermDocs docs = reader.termDocs();
</code></pre>
<p>but it did not work.</p>
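One possible way to build the `docNumbers` list, sketched under assumptions: in Lucene 3.x, document numbers run from 0 to `reader.maxDoc() - 1`, and `reader.isDeleted(int)` reports whether a given slot holds a deleted document. The class `LiveDocNumbers` and the helper `liveDocNumbers` below are hypothetical names, and the `IntPredicate` parameter (Java 8+) is only a stand-in for `reader.isDeleted` so that the sketch runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class LiveDocNumbers {

    /**
     * Collects every document number in [0, maxDoc) that is not marked deleted.
     * Assumption: with a Lucene 3.x IndexReader, maxDoc would come from
     * reader.maxDoc() and isDeleted from reader.isDeleted(int).
     */
    static List<Integer> liveDocNumbers(int maxDoc, IntPredicate isDeleted) {
        List<Integer> docNumbers = new ArrayList<Integer>(maxDoc);
        for (int docNum = 0; docNum < maxDoc; docNum++) {
            if (!isDeleted.test(docNum)) {
                docNumbers.add(docNum);
            }
        }
        return docNumbers;
    }

    public static void main(String[] args) {
        // Stand-in for an index with 5 slots in which document 2 was deleted.
        List<Integer> docNumbers = liveDocNumbers(5, d -> d == 2);
        System.out.println(docNumbers); // prints [0, 1, 3, 4]
    }
}
```

Against a real reader, the call would presumably look like `liveDocNumbers(reader.maxDoc(), reader::isDeleted)`, and the resulting list could then be passed to `getTermFrequencyMap`. Fields indexed without term vectors would still yield `null` from `getTermFreqVector`, which the function already skips.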