Lucene Indexing and searching
<p>I am trying to index a table in a database using Lucene. I use Lucene only for indexing; the Fields are not stored. The table has five columns (userid (PK), description, reportnumber, reporttype, report).</p>
<p>If Lucene finds a hit, I intend to use the combination of userid, reportnumber and reporttype to get the data back from the database.</p>
<p>One record in the table can span multiple rows, for example:</p>
<p>JQ123, SOMEDESCRIPTION, 1, FIN, content of fin report<br> JQ123, AnotherDescription, 2, MATH, content of math report<br> JQ123, YetAnotherDesc, 3, MATH, content of another math report<br> JD456, MoreDesc, 1, STAT, content of stat report<br> ... and so on</p>
<p>Some of the report types (e.g. MATH) have highly structured contents (XML, stored as a string in the last column), and in the future I may want to pull some of that content out into its own Field of the document.</p>
<p>My strategy so far has been to create one Lucene Document for every row and index it. My thinking is that <strong>1.</strong> it is easy and seems logical (to me), and <strong>2.</strong> if I end up extracting contents out of certain report types and making them into Fields, all that would be needed is an if statement that checks the report type and creates these new Fields. Here is the relevant code:</p>
<pre><code>import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Builds one Lucene Document for the current table row; no field is stored.
public void createDocument() throws IOException {
    Document luceneDocument = new Document();
    // Identifier columns are indexed as single, unanalyzed terms.
    luceneDocument.add(new Field("userid", userID, Field.Store.NO, Field.Index.NOT_ANALYZED));
    luceneDocument.add(new Field("reportnumber", reportNum, Field.Store.NO, Field.Index.NOT_ANALYZED));
    luceneDocument.add(new Field("reporttype", reportType, Field.Store.NO, Field.Index.NOT_ANALYZED));
    // Free-text columns are analyzed for full-text search.
    luceneDocument.add(new Field("description", description, Field.Store.NO, Field.Index.ANALYZED));
    luceneDocument.add(new Field("report", report, Field.Store.NO, Field.Index.ANALYZED));
    // Structured report types may get extra fields (fieldContent is a placeholder).
    if (reportType.equalsIgnoreCase("MATH")) {
        luceneDocument.add(new Field("more fields", fieldContent, Field.Store.NO, Field.Index.ANALYZED));
    }
    indexwriter.addDocument(luceneDocument);
    indexwriter.close();
}
</code></pre>
<p><strong>1.</strong> Does having several Documents for the same record affect Lucene's search efficiency in any way?<br> <strong>2.</strong> Would this approach have any significant disk space overhead compared to having one Document per record in Lucene (I <strong>do not store</strong> any <strong>Fields</strong>)?</p>
<p>Thanks in advance for your response,</p>
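<p>To make the lookup idea concrete, here is a minimal sketch of how the userid, reportnumber and reporttype of a row could be combined into a single key and split apart again for the database query. It uses plain Java only; the class <code>ReportKey</code>, its methods and the <code>|</code> separator are hypothetical and not part of my actual code, and the sketch assumes none of the three values contains the separator. How the key would be read back from a Lucene hit is outside this sketch.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: one key per table row.
class ReportKey {
    // Assumed separator; the values themselves must not contain it.
    private static final String SEPARATOR = "|";

    // Joins the three identifying columns of a row into one key.
    static String of(String userId, int reportNumber, String reportType) {
        return userId + SEPARATOR + reportNumber + SEPARATOR + reportType;
    }

    // Splits a key back into {userid, reportnumber, reporttype}.
    static String[] parse(String key) {
        return key.split("\\|");
    }

    public static void main(String[] args) {
        // The sample rows from the question: one user can span several rows.
        List<String> keys = new ArrayList<>();
        keys.add(of("JQ123", 1, "FIN"));
        keys.add(of("JQ123", 2, "MATH"));
        keys.add(of("JQ123", 3, "MATH"));
        keys.add(of("JD456", 1, "STAT"));
        for (String key : keys) {
            String[] parts = parse(key);
            System.out.println(key + " -> userid=" + parts[0]
                    + ", reportnumber=" + parts[1]
                    + ", reporttype=" + parts[2]);
        }
    }
}
```

<p>With the sample rows, each row gets a distinct key even though JQ123 appears three times, which is the property the one-Document-per-row approach relies on.</p>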
 
