db.collection.count() returns a lot more documents for a sharded collection in MongoDB

<p>I have 2 shards, each backed by a replica set of 3 instances. When I run <code>count()</code> on a sharded collection, I get far more than the real number of documents (a difference of more than 2.5 million documents). The same happens when I run <code>find()</code> and increment a counter in a <code>forEach()</code> loop.</p>
<p>How do I know the real number of documents? First, I know the growth trend of the collection, and it cannot grow that sharply. Second, when I count documents with the following M/R script, I get what I assume is the real number. I use this script to find duplicate documents: the number of duplicates is in the thousands, not millions, and the count of <code>test_duplicate_collection</code> minus the duplicates gives me the real number of documents.</p>
<pre><code>// Emit one record per document, keyed by doc_id.
var map = function () {
    emit(this.doc_id, 1);
};

// Sum the emitted 1s: the result is how many documents share this doc_id.
var reduce = function (key, values) {
    var result = 0;
    values.forEach(function (value) {
        result += value;
    });
    return result;
};

// Write one output document per distinct doc_id into test_duplicate_collection.
db.test_collection.mapReduce(map, reduce, { out: "test_duplicate_collection" });
</code></pre>
<p>I understand that during balancing, a chunk that is being migrated to another shard may not yet have been deleted from the source shard. However, <code>sh.status()</code> shows that all chunks are evenly distributed. I also tried pausing write operations to see whether the count needed time to settle, but nothing changed.</p>
<p>You might say that deletion of migrated chunks is still in progress, and indeed, when I first started using sharding, I saw the count of the sharded collection decrease slightly while no writes were happening. Currently, however, it does not change over time at all. I also tried <code>orphanage.js</code> (the script from <a href="https://groups.google.com/forum/#!topic/mongodb-user/OKH5_KDO04I" rel="nofollow">https://groups.google.com/forum/#!topic/mongodb-user/OKH5_KDO04I</a>) in the hope of finding orphaned documents, but it found none.</p>
<p>My question: what could cause <code>count()</code> and <code>find().forEach()</code> to return more than the real number of documents (compared with the M/R script)?</p>
<p>I appreciate your help.</p>
<p><strong>EDIT1</strong></p>
<p>There was a problem with the replica set configuration in one of the shards: no master had been set in the configuration file. In the MMS dashboard, I always saw <code>Slave</code> instead of <code>Primary</code> for the host that the other replica set members were listening to. Once we fixed this, the <code>forEach</code> loop count matched the number of documents from the M/R script above. So the only remaining problem is with <code>count()</code> itself.</p>
<p>In the MongoDB JIRA I found the following unresolved bug about <code>count()</code> in a sharded environment: <a href="https://jira.mongodb.org/browse/SERVER-3645" rel="nofollow">https://jira.mongodb.org/browse/SERVER-3645</a>. However, it relates specifically to <code>count()</code> during balancing, i.e. <code>count()</code> may include documents in chunks that the balancer is currently moving. As a workaround, the issue suggests passing a query that is always true. I tried that too, but it still returns the same count as before.</p>
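<p>The gap between the three numbers in this question can be illustrated with a toy, in-memory model in plain JavaScript (run with Node.js, not against a real cluster). It is a sketch under stated assumptions, not a diagnosis: the shard names, chunk ranges, the <code>key</code> field standing in for the shard key, and the helpers <code>naiveCount</code>, <code>ownerOf</code>, <code>ownedDocs</code> and <code>mapReduceByDocId</code> are all hypothetical. It assumes that a predicate-free <code>count()</code> adds up what each shard physically stores, orphaned copies included; the MongoDB manual for later server versions warns that, on a sharded cluster, <code>count()</code> without a query predicate can be inaccurate if orphaned documents exist or a chunk migration is in progress. It also assumes that an ownership-filtered read, which is what <code>ownedDocs</code> models, skips those orphans.</p>

```javascript
// Toy, in-memory model (hypothetical data; no MongoDB driver involved) of how a
// sharded collection can report more documents than it logically contains.

// Hypothetical chunk map: each chunk covers [min, max) of the shard key `key`
// and is owned by exactly one shard.
const chunks = [
  { min: 0, max: 100, owner: "shard0" },
  { min: 100, max: 200, owner: "shard1" },
];

// Hypothetical shard contents. The [100, 200) chunk was migrated from shard0 to
// shard1, but shard0 still holds copies of two of its documents (orphans).
const shards = {
  shard0: [
    { key: 10, doc_id: "a" },
    { key: 20, doc_id: "b" },
    { key: 150, doc_id: "c" }, // orphaned copy
    { key: 160, doc_id: "d" }, // orphaned copy
  ],
  shard1: [
    { key: 150, doc_id: "c" },
    { key: 160, doc_id: "d" },
    { key: 170, doc_id: "e" },
    { key: 180, doc_id: "e" }, // application-level duplicate doc_id
  ],
};

// Models a predicate-free count(): every shard reports how many documents it
// physically stores, and the totals are simply added up (orphans included).
function naiveCount(shards) {
  return Object.values(shards).reduce((sum, docs) => sum + docs.length, 0);
}

// Returns the name of the shard whose chunk range contains `key`.
function ownerOf(key, chunks) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.owner : undefined;
}

// Models an ownership-filtered read: each shard returns only the documents
// whose shard key falls in a chunk it owns, so orphaned copies are skipped.
function ownedDocs(shards, chunks) {
  return Object.entries(shards).flatMap(([shard, docs]) =>
    docs.filter((doc) => ownerOf(doc.key, chunks) === shard)
  );
}

// Local re-implementation of the question's map/reduce:
// map emits (doc_id, 1) and reduce sums the emitted values per doc_id.
function mapReduceByDocId(docs) {
  const groups = new Map(); // doc_id -> list of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  const map = (doc) => emit(doc.doc_id, 1);
  const reduce = (key, values) => values.reduce((acc, v) => acc + v, 0);

  docs.forEach(map);
  const out = new Map();
  for (const [key, values] of groups) out.set(key, reduce(key, values));
  return out;
}

const owned = ownedDocs(shards, chunks);
const perDocId = mapReduceByDocId(owned);
const duplicated = [...perDocId.values()].filter((n) => n > 1).length;

console.log("naive count:", naiveCount(shards)); // 8
console.log("owned documents:", owned.length); // 6
console.log("distinct doc_id values:", perDocId.size); // 5
console.log("doc_id values seen more than once:", duplicated); // 1
```

<p>With this toy data, 8 documents are physically stored, 6 are owned by the shard holding them, and the map/reduce yields 5 distinct <code>doc_id</code> values, one of which appears twice. The difference between 8 and 6 comes entirely from the two orphaned copies left on <code>shard0</code>; the model does not show which mechanism applies to the cluster in the question.</p>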