How to extract Document Term Vector in Lucene 3.5.0
<p>I am using Lucene 3.5.0 and I want to output the term vector of each document. For example, I want to know the frequency of a term across all documents and in each specific document. My indexing code is:</p>
<pre><code>import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java " + Indexer.class.getName()
                    + " &lt;index dir&gt; &lt;data dir&gt;");
        }
        String indexDir = args[0];
        String dataDir = args[1];

        long start = System.currentTimeMillis();
        Indexer indexer = new Indexer(indexDir);
        int numIndexed;
        try {
            numIndexed = indexer.index(dataDir, new TextFilesFilter());
        } finally {
            indexer.close();
        }
        long end = System.currentTimeMillis();

        System.out.println("Indexing " + numIndexed + " files took "
                + (end - start) + " milliseconds");
    }

    private IndexWriter writer;

    public Indexer(String indexDir) throws IOException {
        Directory dir = FSDirectory.open(new File(indexDir));
        // create == true: any existing index in indexDir is overwritten
        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_35), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
        writer.close();
    }

    public int index(String dataDir, FileFilter filter) throws Exception {
        File[] files = new File(dataDir).listFiles();
        for (File f : files) {
            if (!f.isDirectory() &amp;&amp; !f.isHidden() &amp;&amp; f.exists() &amp;&amp; f.canRead()
                    &amp;&amp; (filter == null || filter.accept(f))) {
                // The first line of each file holds its URL.
                // Open the File itself: f.getName() would resolve against the
                // working directory instead of dataDir.
                BufferedReader inputStream = new BufferedReader(new FileReader(f));
                String url = inputStream.readLine();
                inputStream.close();
                indexFile(f, url);
            }
        }
        return writer.numDocs();
    }

    // Accepts only files whose name ends in ".txt"
    private static class TextFilesFilter implements FileFilter {
        public boolean accept(File path) {
            return path.getName().toLowerCase().endsWith(".txt");
        }
    }

    protected Document getDocument(File f, String url) throws Exception {
        Document doc = new Document();
        doc.add(new Field("contents", new FileReader(f)));
        doc.add(new Field("urls", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("fullpath", f.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    private void indexFile(File f, String url) throws Exception {
        System.out.println("Indexing " + f.getCanonicalPath());
        Document doc = getDocument(f, url);
        writer.addDocument(doc);
    }
}
</code></pre>
<p>Can anybody help me write a program to do that? Thanks.</p>
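<p>To make the quantities being asked for concrete, here is a minimal, library-free sketch of a per-document term vector (term to frequency), the total frequency of a term over all documents, and the number of documents containing it. It is not Lucene code: the class <code>TermVectorSketch</code>, its methods, and the crude tokenizer are assumptions made for illustration. In Lucene 3.x itself, as far as I know, the equivalent data is only available when the field is indexed with term vectors enabled (for example <code>Field.TermVector.YES</code>), after which <code>IndexReader.getTermFreqVector(docId, "contents")</code> exposes the terms and their frequencies and <code>IndexReader.docFreq(new Term("contents", term))</code> gives the document count.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain-Java version of the statistics in the question.
// None of these names come from Lucene.
class TermVectorSketch {

    // Term vector of one document: each term mapped to its frequency.
    // Tokenization is a crude stand-in for an analyzer (assumption):
    // lowercase, then split on anything that is not a letter or digit.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> freqs = new TreeMap<String, Integer>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token on leading punctuation
            }
            Integer old = freqs.get(token);
            freqs.put(token, old == null ? 1 : old + 1);
        }
        return freqs;
    }

    // Total number of occurrences of the term across all documents.
    static int totalFreq(List<Map<String, Integer>> vectors, String term) {
        int sum = 0;
        for (Map<String, Integer> v : vectors) {
            Integer f = v.get(term);
            if (f != null) {
                sum += f;
            }
        }
        return sum;
    }

    // Number of documents that contain the term at least once.
    static int docFreq(List<Map<String, Integer>> vectors, String term) {
        int n = 0;
        for (Map<String, Integer> v : vectors) {
            if (v.containsKey(term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the cat saw the dog", "The dog barked", "a bird sang");

        List<Map<String, Integer>> vectors = new ArrayList<Map<String, Integer>>();
        for (int i = 0; i < docs.size(); i++) {
            Map<String, Integer> v = termVector(docs.get(i));
            vectors.add(v);
            System.out.println("doc " + i + ": " + v);
        }
        System.out.println("total frequency of 'the': " + totalFreq(vectors, "the"));
        System.out.println("documents containing 'the': " + docFreq(vectors, "the"));
    }
}
```

<p>The sketch recomputes everything from the raw text on each run. An index with stored term vectors avoids that cost, which is why the indexing step has to opt in to storing them.</p>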