StackOverflow2013

<p>If you want to parallelize indexing, there are two things you can do:</p>
<ul> <li>parallelize calls to addDocument,</li> <li>increase the maximum thread count of your merge scheduler.</li> </ul>
<p>You are on the right path in parallelizing calls to addDocument, but spawning one thread per document will not scale as the number of documents to index grows. You should instead use a fixed-size <a href="http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html" rel="noreferrer">ThreadPoolExecutor</a>. Since this task is mainly CPU-intensive (depending on your analyzer and the way you retrieve your data), setting the maximum number of threads to the number of CPUs of your computer might be a good start.</p>
<p>Regarding the merge scheduler, you can increase the maximum number of threads it may use with the <a href="http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/index/ConcurrentMergeScheduler.html#setMaxThreadCount%28int%29" rel="noreferrer">setMaxThreadCount method of ConcurrentMergeScheduler</a>. Beware that disks are much better at sequential reads and writes than at random ones. As a consequence, setting too high a maximum number of threads on your merge scheduler is more likely to slow indexing down than to speed it up.</p>
<p>But before trying to parallelize your indexing process, you should probably find out where the bottleneck is. If your disk is too slow, the bottleneck is likely to be the flush and merge steps. In that case, parallelizing calls to addDocument (which essentially consists of analyzing a document and buffering the result of the analysis in memory) will not improve indexing speed at all.</p>
<p>Some side notes:</p>
<ul> <li><p>There is some ongoing work in the development version of Lucene to improve indexing parallelism (the flushing part especially; this <a href="http://www.searchworkings.org/blog/-/blogs/lucene-indexing-gains-concurrency" rel="noreferrer">blog entry</a> explains how it works).</p></li> <li><p>Lucene has a nice wiki page on <a href="http://wiki.apache.org/lucene-java/ImproveIndexingSpeed" rel="noreferrer">how to improve indexing speed</a>, where you will find other ways to speed things up.</p></li> </ul>
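<p>As a rough illustration of the fixed-size pool suggested above, here is a minimal sketch that uses only java.util.concurrent. The class name ParallelIndexingSketch, the method indexAll, and the addDocument callback are hypothetical: the callback stands in for a real call such as writer.addDocument(doc) on a shared IndexWriter and is assumed to be thread-safe. The one-hour timeout is an arbitrary placeholder as well.</p>

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ParallelIndexingSketch {

    // Runs addDocument for every document on a fixed-size pool and returns
    // how many calls completed normally. addDocument is a hypothetical
    // stand-in for the real indexing call and must be thread-safe.
    static int indexAll(List<String> docs, Consumer<String> addDocument) {
        // One worker per CPU, as a starting point for CPU-bound analysis.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger indexed = new AtomicInteger();

        for (String doc : docs) {
            // Tasks queue up and are picked by idle workers, so the number
            // of threads stays fixed however many documents there are.
            pool.execute(() -> {
                addDocument.accept(doc);
                indexed.incrementAndGet();
            });
        }

        // Accept no new tasks, then wait for the queued ones to finish.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return indexed.get();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("first document", "second document", "third document");
        int n = indexAll(docs, doc -> { /* stand-in for the real indexing call */ });
        System.out.println("indexed " + n + " documents");
    }
}
```

<p>Executors.newFixedThreadPool returns a ThreadPoolExecutor with equal core and maximum sizes, so this is the same idea with less configuration. Whether one thread per CPU is the right size depends on where the bottleneck is, so it is worth measuring before and after.</p>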
