Solr approaches to re-indexing large document corpus
<p>We are looking for recommendations on systematically re-indexing an ever-growing corpus of documents in Solr (tens of millions now, hundreds of millions in less than a year) without taking the currently running index down. Re-indexing is needed on a periodic basis because:</p>
<ul>
<li>New search features are introduced for the existing corpus that require additional schema fields, which we can't always anticipate in advance.</li>
<li>The corpus is indexed across multiple shards. When it grows past a certain threshold, we need to create more shards and re-balance documents evenly across all of them (which SolrCloud does not yet seem to support).</li>
</ul>
<p>The current index receives very frequent updates and additions, which need to be available for search within minutes. Approaches where the corpus is re-indexed offline in a batch therefore don't really work: by the time the batch is finished, new documents will have been made available.</p>
<p>The approaches we are looking into at the moment are:</p>
<ul>
<li>Create a new cluster of shards and batch re-index there while the old cluster is still available for searching. New documents that are not part of the re-indexed batch are sent to both the old cluster and the new cluster. When ready to switch, point the load balancer to the new cluster.</li>
<li>Use CoreAdmin: spawn a new core per shard and send the re-indexed batch to the new cores. New documents that are not part of the re-indexed batch are sent to both the old cores and the new cores. When ready to switch, use CoreAdmin to dynamically swap cores.</li>
</ul>
<p>We'd appreciate it if folks could either confirm or poke holes in either or both of these approaches. Is one more appropriate than the other? Or are we completely off? Thank you in advance.</p>
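<p>To make the two approaches concrete, here is a minimal sketch of the part they share (sending live updates to both indexes while the batch runs) plus the CoreAdmin swap request used by the second approach. The names <code>DualWriteRouter</code>, <code>send_old</code>, <code>send_new</code> and <code>swap_url</code> are hypothetical, as are the core names and base URL in the usage note. The sketch only builds the swap URL and does not contact a Solr server.</p>

```python
from urllib.parse import urlencode


class DualWriteRouter:
    """Routes live updates to the old index, and also to the new one
    while a batch re-index is in progress (hypothetical helper)."""

    def __init__(self, send_old, send_new):
        # send_old / send_new are callables that deliver one document
        # to the old and the new cluster (or core) respectively.
        self.send_old = send_old
        self.send_new = send_new
        self.reindexing = False

    def start_reindex(self):
        self.reindexing = True

    def finish_reindex(self):
        self.reindexing = False

    def index(self, doc):
        # The old index keeps serving searches, so it always gets the update.
        self.send_old(doc)
        # During the re-index the new index must receive it too, otherwise
        # it would be missing recent documents at switch time.
        if self.reindexing:
            self.send_new(doc)


def swap_url(base_url, live_core, rebuilt_core):
    """Build a CoreAdmin SWAP request URL for two cores on one Solr node."""
    query = urlencode({"action": "SWAP", "core": live_core, "other": rebuilt_core})
    return f"{base_url}/admin/cores?{query}"
```

<p>In use, the router would wrap whatever client code already posts documents to Solr, and <code>swap_url("http://localhost:8983/solr", "live", "rebuild")</code> would produce the request to issue once the rebuilt core has caught up. For the first approach, the swap step is replaced by repointing the load balancer.</p>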