# Unreasonably slow MongoDB query, even though the query is simple and aligned to indexes
I'm running a dedicated MongoDB server (that's literally all it runs). The server has 64 GB of RAM and 16 cores, plus 2 TB of hard drive space to work with.

**The Document Structure**

The database has a collection `domains` with around 20 million documents. There is a decent amount of data in each document, but for our purposes the documents are structured like so:

```
{
    _id: "abcxyz.com",
    LastUpdated: <date>,
    ...
}
```

The `_id` field is the domain name referenced by the document. There is an ascending index on `LastUpdated`. `LastUpdated` is updated on hundreds of thousands of records per day: basically, every time new data becomes available for a document, the document is updated and the `LastUpdated` field is set to the current date/time.

**The Query**

I have a mechanism that extracts the data from the database so it can be indexed in a Lucene index. The `LastUpdated` field is the key driver for flagging changes made to a document. In order to search for documents that have been changed and page through those documents, I do the following:

```
{
    LastUpdated: { $gte: ISODate(<firstdate>), $lt: ISODate(<lastdate>) },
    _id: { $gt: <last_id_from_previous_page> }
}
sort: { _id: 1 }
```

When no documents are returned, the start and end dates move forward and the `_id` "anchor" field is reset. This setup is tolerant of documents from previous pages that have had their `LastUpdated` value changed in the meantime, i.e. the paging won't become incorrectly offset by the number of documents in previous pages that are technically no longer in those pages. (A simplified sketch of this loop is included at the end of this question.)

**The Problem**

Ideally I want to select about 25,000 documents at a time, but for some reason the query itself (even when selecting fewer than 500 documents) is *extremely* slow.

The query I ran was:

```
db.domains.find({
    "LastUpdated": { "$gte": ISODate("2011-11-22T15:01:54.851Z"),
                     "$lt":  ISODate("2011-11-22T17:39:48.013Z") },
    "_id": { "$gt": "1300broadband.com" }
}).sort({ _id: 1 }).limit(50).explain()
```

It is so slow, in fact, that the explain (at the time of writing) has been running for over 10 minutes and has not yet completed. I will update this question if it ever finishes, but the point of course is that the query is EXTREMELY slow.

What can I do? I don't have the faintest clue what the problem with the query might be.

**EDIT**

The explain finished after 55 minutes. Here it is:

```
{
    "cursor" : "BtreeCursor Lastupdated_-1__id_1",
    "nscanned" : 13112,
    "nscannedObjects" : 13100,
    "n" : 50,
    "scanAndOrder" : true,
    "millis" : 3347845,
    "nYields" : 5454,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "LastUpdated" : [
            [
                ISODate("2011-11-22T17:39:48.013Z"),
                ISODate("2011-11-22T15:01:54.851Z")
            ]
        ],
        "_id" : [
            [
                "1300broadband.com",
                { }
            ]
        ]
    }
}
```
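For completeness, the ascending index on `LastUpdated` described above would have been created with something along these lines (`ensureIndex` was the shell helper at the time; modern shells use `createIndex`):

```
// Single-field ascending index on LastUpdated, as described in
// "The Document Structure" above. (ensureIndex was current in 2011-era
// MongoDB; newer shells use db.domains.createIndex instead.)
db.domains.ensureIndex({ LastUpdated: 1 });
```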
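And to make the paging mechanism concrete, here is a simplified sketch of the extraction loop as it would look in the mongo shell. The real extractor lives in application code; the variable names, page size, and window-advance logic here are illustrative only:

```
// Illustrative sketch of the extraction/paging loop, not the actual
// extractor code. It pages through documents changed inside a time
// window, anchored on _id; when a page comes back empty, the window
// advances and the anchor resets.
var windowStart = ISODate("2011-11-22T15:01:54.851Z"); // example window
var windowEnd   = ISODate("2011-11-22T17:39:48.013Z");
var anchor   = "";    // _id anchor; "" sorts before every domain name
var pageSize = 25000; // ideal page size mentioned above

while (true) {
    var page = db.domains.find({
        LastUpdated: { $gte: windowStart, $lt: windowEnd },
        _id: { $gt: anchor }
    }).sort({ _id: 1 }).limit(pageSize).toArray();

    if (page.length === 0) {
        // No more changes in this window: move the window forward and
        // reset the anchor. (Real code picks the next window bounds.)
        windowStart = windowEnd;
        windowEnd = new Date();
        anchor = "";
        if (windowStart >= windowEnd) break; // caught up with the present
    } else {
        // Hand the page to the Lucene indexer (omitted here), then
        // anchor the next page on the last _id seen.
        anchor = page[page.length - 1]._id;
    }
}
```

Because each page is re-queried with `_id: { $gt: anchor }` rather than `skip()`, a document whose `LastUpdated` changes mid-extraction can shift between pages without throwing the offset off, which is the tolerance property described above.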