<p>If all you want is a fast, persistent key-value store for a non-enormous dataset, Lucene probably isn't the best solution; Berkeley DB would be the obvious choice. That said, Grant Ingersoll gave a talk at this year's Lucene Revolution conference about exactly this. He intentionally came at the question with a pro-Lucene bias, and got into a back-and-forth with several audience members about what contemporary document databases (like CouchDB) provide that Lucene doesn't. For any non-huge dataset that might eventually need secondary indexes, I think Lucene is a great solution. Its performance for key/value lookups won't be quite as fast as Berkeley DB, CouchDB, Tokyo Tyrant or the like, but it's still quite speedy, and more than adequate for many apps. I think he measured roughly 50ms for a key/value lookup on a recent laptop. And if later on you need to add secondary indexes (as it seems like you might on the results of a web crawl), you'll have a much easier time with Lucene than with those products.</p>
<p>Other tools, like BDB, will be simpler to code for than Lucene. But if that's a concern, just use Solr, which makes it easy to add docs and search via simple HTTP calls (you'll want to modify the fields in the schema.xml config file, but otherwise, Solr should be ready to use out of the box).</p>
<p>Now, if your dataset is too big to reasonably fit on one machine, distributed key-value stores, like Project Voldemort or Riak, might be easier to set up and administer. But Lucene will get you pretty far on one machine, especially if you aren't indexing many fields: at least a TB, I'd guess.</p>
<p>If you do use Lucene, I'd think hard about whether there truly aren't any properties other than the key that you'd like to search by. You might as well store them the first time, since Lucene makes it easy.</p>
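<p>To make the "simple HTTP calls" point concrete, here is a minimal sketch of treating Solr as a key-value store from Python. It assumes a reasonably recent Solr that accepts JSON on its update handler, running at localhost:8983. The core name <code>crawl</code>, the <code>body</code> field and the helper function names are all made up for illustration; adjust them to whatever you define in your schema.</p>

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not from the answer above): a local Solr instance and a core
# named "crawl" whose schema has a unique "id" field and a stored "body" field.
SOLR_BASE = "http://localhost:8983/solr/crawl"


def build_put_request(key, value, base=SOLR_BASE):
    """Build the HTTP request that stores `value` under `key` as one Solr doc."""
    payload = json.dumps([{"id": key, "body": value}]).encode("utf-8")
    return urllib.request.Request(
        f"{base}/update?commit=true",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get_url(key, base=SOLR_BASE):
    """Build the select URL that looks a document up by its key."""
    # Quote the key as a phrase so ':' and '/' (common in crawled URLs)
    # are not interpreted as query syntax.
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    query = urllib.parse.urlencode({"q": f'id:"{escaped}"', "wt": "json", "rows": 1})
    return f"{base}/select?{query}"


def put(key, value):
    """Send the update request (needs a running Solr)."""
    with urllib.request.urlopen(build_put_request(key, value)) as resp:
        return json.load(resp)


def get(key):
    """Return the stored value for `key`, or None if no document matches."""
    with urllib.request.urlopen(build_get_url(key)) as resp:
        docs = json.load(resp)["response"]["docs"]
    return docs[0].get("body") if docs else None
```

<p>With Solr running, <code>put("http://example.com/a", "page text")</code> followed by <code>get("http://example.com/a")</code> is the whole round trip. Committing on every write, as this sketch does, is slow; for a crawl you would probably batch documents and commit occasionally. Adding a secondary index later is then a matter of adding a field to the schema and to the JSON document.</p>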
 
