MongoDB select count(distinct x) on an indexed column - count unique results for large data sets
    <p>I have gone through several articles and examples, and have yet to find an efficient way to do this SQL query in MongoDB (where there are millions of <del>rows</del> documents).</p>
    <p><strong>First attempt</strong></p>
    <p>(e.g. from this almost duplicate question - <a href="https://stackoverflow.com/questions/5236160/mongo-equivalent-of-sqls-select-distinct">Mongo equivalent of SQL&#39;s SELECT DISTINCT?</a>)</p>
    <pre><code>db.myCollection.distinct("myIndexedNonUniqueField").length </code></pre>
    <p>Unsurprisingly, I got this error because my dataset is huge:</p>
    <pre><code>Thu Aug 02 12:55:24 uncaught exception: distinct failed: { "errmsg" : "exception: distinct too big, 16mb cap", "code" : 10044, "ok" : 0 } </code></pre>
    <p><strong>Second attempt</strong></p>
    <p>I decided to try a <code>group</code>:</p>
    <pre><code>db.myCollection.group({key: {myIndexedNonUniqueField: 1}, initial: {count: 0}, reduce: function (obj, prev) { prev.count++;} } ); </code></pre>
    <p>But I got this error message instead:</p>
    <pre><code>exception: group() can't handle more than 20000 unique keys </code></pre>
    <p><strong>Third attempt</strong></p>
    <p>I haven't tried this yet, but there are several suggestions that involve <code>mapReduce</code></p>
    <p>e.g.</p>
    <ul>
    <li>this one <a href="https://stackoverflow.com/questions/6222811/how-to-do-distinct-and-group-in-mongodb">how to do distinct and group in mongodb?</a> (not accepted, answer author / OP didn't test it)</li>
    <li>this one <a href="https://stackoverflow.com/questions/8769323/mongodb-group-by-functionalities">MongoDB group by Functionalities</a> (seems similar to Second Attempt)</li>
    <li>this one <a href="http://blog.emmettshear.com/post/2010/02/12/Counting-Uniques-With-MongoDB" rel="noreferrer">http://blog.emmettshear.com/post/2010/02/12/Counting-Uniques-With-MongoDB</a></li>
    <li>this one <a href="https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/trDn3jJjqtE" rel="noreferrer">https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/trDn3jJjqtE</a></li>
    <li>this one <a href="http://cookbook.mongodb.org/patterns/unique_items_map_reduce/" rel="noreferrer">http://cookbook.mongodb.org/patterns/unique_items_map_reduce/</a></li>
    </ul>
    <p><strong>Also</strong></p>
    <p>It seems there is a pull request on GitHub that would let the <code>.distinct</code> method return only a count, but it's still open: <a href="https://github.com/mongodb/mongo/pull/34" rel="noreferrer">https://github.com/mongodb/mongo/pull/34</a></p>
    <p>At this point I thought it was worth asking here: what is the latest on the subject? Should I move to SQL or another NoSQL DB for distinct counts? Or is there an efficient way?</p>
    <p><strong>Update:</strong></p>
    <p>This comment on the official MongoDB docs is not encouraging. Is it accurate?</p>
    <p><a href="http://www.mongodb.org/display/DOCS/Aggregation#comment-430445808" rel="noreferrer">http://www.mongodb.org/display/DOCS/Aggregation#comment-430445808</a></p>
    <p><strong>Update2:</strong></p>
    <p>It seems the new Aggregation Framework addresses the above comment (MongoDB 2.1/2.2 and above; a development preview is available, not for production).</p>
    <p><a href="http://docs.mongodb.org/manual/applications/aggregation/" rel="noreferrer">http://docs.mongodb.org/manual/applications/aggregation/</a></p>
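    <p>For reference, a minimal sketch of how the distinct count might be expressed with the Aggregation Framework, assuming MongoDB 2.2 or later. The helper name <code>distinctCountPipeline</code> is hypothetical; the collection and field names are the ones used above. This has not been run against the dataset in question, and the first <code>$group</code> stage still has to track every distinct value, so memory limits may still apply for very large key sets.</p>

```javascript
// Sketch only: build an aggregation pipeline that counts distinct values.
// distinctCountPipeline is a hypothetical helper, not part of MongoDB.
function distinctCountPipeline(field) {
  return [
    // Stage 1: one output document per distinct value of the field.
    { $group: { _id: "$" + field } },
    // Stage 2: collapse those documents into a single one and count them.
    { $group: { _id: null, count: { $sum: 1 } } }
  ];
}

var pipeline = distinctCountPipeline("myIndexedNonUniqueField");

// In the mongo shell, the pipeline would then be passed to aggregate():
//   db.myCollection.aggregate(pipeline)
// The single resulting document carries the number of distinct values
// in its "count" field.
```

    <p>Because only one small document comes back, the 16 MB result cap that broke <code>distinct()</code> is not hit by the output itself.</p>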