    <p>First off, there is an important distinction to make here: MongoDB is a general-purpose database, while Elasticsearch is a distributed text search engine backed by Lucene. People have been talking about using Elasticsearch as a general-purpose database, but know that this was not its original design. I think that general-purpose NoSQL databases and search engines are headed for consolidation, but as it stands, the two come from very different camps.</p>
    <p>We are using both MongoDB and Elasticsearch at my company. We store our data in MongoDB and use Elasticsearch exclusively for its full-text search capabilities. We only send Elastic the subset of the Mongo data fields that we need to query. Our use case differs from yours in that our Mongo data changes all the time: a record, or a subset of the fields of a record, can be updated several times a day, and each update can call for re-indexing that record in Elastic. For that reason alone, using Elastic as the sole data store is not a good option for us, as we can't update select fields; we would need to re-index a document in its entirety. This is not an Elastic limitation; it is how Lucene, the underlying search engine behind Elastic, works. In your case, the fact that records won't be changed once stored saves you from having to make that choice. Having said that, if data safety is a concern, I would think twice about using Elasticsearch as the only storage mechanism for your data. It may get there at some point, but I'm not sure it's there yet.</p>
    <p>In terms of speed, not only is Elastic/Lucene on par with the querying speed of Mongo; in your case, where there is "very little constant in terms of which fields are used for the filtering at any moment", it could be orders of magnitude faster, especially as the datasets become larger.
The difference lies in the underlying query implementations:</p>
    <ul>
    <li>Elastic/Lucene use the <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space Model</a> and <a href="http://en.wikipedia.org/wiki/Inverted_index">inverted indexes</a> for <a href="http://en.wikipedia.org/wiki/Information_retrieval">Information Retrieval</a>, which are highly efficient ways of comparing record similarity against a query. When you query Elastic/Lucene, it already knows the answer; most of its work lies in ranking the results for you by how likely they are to match your query terms. This is an important point: search engines, as opposed to databases, can't guarantee you exact results; they rank results by how close they get to your query. It just so happens that, most of the time, the results are close to exact.</li>
    <li>Mongo's approach is that of a more general-purpose data store; it compares JSON documents against one another. You can get great performance out of it by all means, but you need to carefully craft your indexes to match the queries you will be running. Specifically, if you have multiple fields by which you will query, you need to carefully craft your <a href="http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeys">compound keys</a> so that they reduce the dataset to be queried as fast as possible. For example, your first key should filter out the majority of your dataset, your second should further filter down what is left, and so on. If your queries don't match the keys, and the order of those keys, in the defined indexes, your performance will drop quite a bit. On the other hand, Mongo is a true database, so if accuracy is what you need, the answers it gives will be spot on.</li>
    </ul>
    <p>For expiring old records, Elastic has a built-in TTL feature.
Mongo just introduced it as of version 2.2, I think.</p>
    <p>Since I don't know your other requirements, such as expected data size, transactions, accuracy, or what your filters will look like, it's hard to make any specific recommendations. Hopefully, there is enough here to get you started.</p>
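    <p>To make the inverted-index point above concrete, here is a toy sketch in Python. It is not how Lucene is implemented; the function names, field names, and sample records are all made up for illustration. It only shows why filtering on an arbitrary combination of fields stays cheap: each field/value pair already has its list of matching record ids, so a query is just an intersection of those lists.</p>

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each (field, value) pair to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field, value in record.items():
            index[(field, value)].add(record_id)
    return index


def filter_records(index, **criteria):
    """Return the ids of records matching every field=value criterion."""
    # Look up the posting list (set of ids) for each criterion.
    posting_lists = [index.get((f, v), set()) for f, v in criteria.items()]
    if not posting_lists:
        return set()
    # Intersect starting from the smallest list to keep the work minimal.
    posting_lists.sort(key=len)
    result = set(posting_lists[0])
    for postings in posting_lists[1:]:
        result &= postings
    return result


# Hypothetical sample data: four log-like records with three fields each.
records = {
    1: {"level": "error", "host": "web-1", "service": "auth"},
    2: {"level": "info", "host": "web-1", "service": "auth"},
    3: {"level": "error", "host": "web-2", "service": "billing"},
    4: {"level": "error", "host": "web-1", "service": "billing"},
}
index = build_inverted_index(records)

print(sorted(filter_records(index, level="error", host="web-1")))       # [1, 4]
print(sorted(filter_records(index, service="billing", level="error")))  # [3, 4]
```

    <p>Note that the two queries filter on different field combinations, yet neither needed an index declared for that particular combination or field order. That is the contrast with the compound-key approach described above, where the index has to be designed around the queries you expect.</p>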