Lucene Tika indexing failure
<p>I wrote (mostly copied from the Lucene in Action ebook) an indexing example using Tika, but it doesn't index the documents at all. There are no errors at compile time or at run time. I tried indexing .pdf, .ppt, .doc, and even .txt documents, with no success: every search returns 0 hits, even though I checked carefully that the search terms appear in my documents. Please take a look at the code:</p>
<pre><code>public class TikaIndexer extends Indexer {

    private boolean DEBUG = false;

    // Metadata fields whose values are also added to the searchable "contents" field
    static Set textualMetadataFields = new HashSet();

    static {
        textualMetadataFields.add(Metadata.TITLE);
        textualMetadataFields.add(Metadata.AUTHOR);
        textualMetadataFields.add(Metadata.COMMENTS);
        textualMetadataFields.add(Metadata.KEYWORDS);
        textualMetadataFields.add(Metadata.DESCRIPTION);
        textualMetadataFields.add(Metadata.SUBJECT);
    }

    public TikaIndexer(String indexDir) throws IOException {
        super(indexDir);
    }

    @Override
    protected boolean acceptFile(File f) {
        return true;
    }

    @Override
    protected Document getDocument(File f) throws Exception {
        Metadata metadata = new Metadata();
        metadata.set(Metadata.RESOURCE_NAME_KEY, f.getCanonicalPath());

        InputStream is = new FileInputStream(f);
        Parser parser = new AutoDetectParser();
        ContentHandler handler = new BodyContentHandler(10*1024*1024);
        try {
            parser.parse(is, handler, metadata, new ParseContext());
        } finally {
            is.close();
        }

        Document doc = new Document();
        doc.add(new Field("contents", handler.toString(), Field.Store.NO, Field.Index.ANALYZED));
        if (DEBUG) {
            // "intregul textt" = "the whole text"
            System.out.println(" intregul textt: " + handler.toString());
        }

        for (String name : metadata.names()) {
            String value = metadata.get(name);
            if (textualMetadataFields.contains(name)) {
                doc.add(new Field("contents", value, Field.Store.NO, Field.Index.ANALYZED));
            }
            doc.add(new Field(name, value, Field.Store.YES, Field.Index.NO));
            if (DEBUG) {
                System.out.println(" " + name + ": " + value);
            }
        }
        if (DEBUG) {
            System.out.println();
        }
        return doc;
    }
}
</code></pre>
<p>And the main class:</p>
<pre><code>public static void main(String args[]) {
    String indexDir = "src/indexDirectory";
    String dataDir = "src/filesDirectory";
    try {
        TikaConfig config = TikaConfig.getDefaultConfig();
        List&lt;MediaType&gt; parsers = new ArrayList(config.getParser().getSupportedTypes(new ParseContext())); //3
        Collections.sort(parsers);
        Iterator&lt;MediaType&gt; it = parsers.iterator();
        System.out.println(parsers.size());
        // "Tipuri de parsere" = "Parser types"
        System.out.println("Tipuri de parsere:");
        while (it.hasNext()) {
            System.out.println(" " + it.next());
        }
        System.out.println();

        long start = new Date().getTime();
        TikaIndexer indexer = new TikaIndexer(indexDir);
        int numIndexed = indexer.index(dataDir);
        long end = new Date().getTime();

        // "Indexing N files took M milliseconds."
        System.out.println("Indexarea a " + numIndexed + " fisiere a durat " + (end - start) + " milisecunde.");
        System.out.println();
        System.out.println("--------------------------------------------------------------");
        System.out.println();
    } catch (Exception ex) {
        // "Nu s-a putut realiza indexarea" = "Indexing could not be completed"
        System.out.println("Nu s-a putut realiza indexarea: ");
        ex.printStackTrace();
        Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
    }
}
</code></pre>