    <p>The simplest (and most scalable) solution is probably to translate the filtering conditions into a MongoDB query and do the aggregation on the client side.</p>
<p>Taking your example above, let's break it down and construct a MongoDB query (I'll show this using <a href="http://api.mongodb.org/python/current/" rel="nofollow">PyMongo</a>, but you could do the same using Mongoengine or another ODM if you prefer):</p>
<blockquote> <p>WHERE col1=1 AND col2="foo" OR col3 &gt; "2012-01-01 00:00:00" OR col3 &lt; "2012-01-02 00:00:00" -- conditions</p> </blockquote>
<p>These conditions become the first argument to PyMongo's <code>find()</code> method. We have to explicitly build the logical AND/OR tree using the <code>$or</code> operator:</p>
<pre><code>from datetime import datetime
from bson.tz_util import utc

# AND binds tighter than OR, so the WHERE clause is three OR-ed branches;
# the keys inside a single branch are implicitly AND-ed
cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]})
</code></pre>
<p>Note that MongoDB does not convert strings to dates when comparing against date/time fields, so I've explicitly done so here using the Python <a href="http://docs.python.org/library/datetime.html" rel="nofollow"><code>datetime</code></a> module.
The <a href="http://docs.python.org/library/datetime.html#datetime-objects" rel="nofollow"><code>datetime</code></a> class in that module defaults any time arguments you omit (hour, minute, second, microsecond) to 0.</p>
<blockquote> <p>SELECT col1, col2 -- result columns</p> </blockquote>
<p>We can use <a href="http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields" rel="nofollow">field selection</a> to retrieve only the fields that we want:</p>
<pre><code>from datetime import datetime
from bson.tz_util import utc

# the second argument to find() lists the fields to return
cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, ['col1', 'col2'])
</code></pre>
<blockquote> <p>GROUP BY col4, col5 -- group by statement</p> </blockquote>
<p>This can't be done efficiently using standard MongoDB queries (though I'll show in a moment how you might use the new <a href="http://www.mongodb.org/display/DOCS/Aggregation+Framework" rel="nofollow">Aggregation Framework</a> to do this all on the server side).
Instead, knowing that we want to group by these columns, we can simplify the application code that does the grouping by having MongoDB sort on these fields:</p>
<pre><code>from datetime import datetime
from bson.tz_util import utc
from pymongo import ASCENDING

cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, ['col1', 'col2', 'col4', 'col5'])
cursor.sort([('col4', ASCENDING), ('col5', ASCENDING)])
</code></pre>
<blockquote> <p>ORDER BY col1 DESC, col2 ASC -- order by statement</p> </blockquote>
<p>This should be done in your application code after applying the aggregate functions you want (suppose we want to sum over col4, and take the max of col5):</p>
<pre><code>from datetime import datetime
from itertools import groupby

from bson.tz_util import utc
from pymongo import ASCENDING

cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, ['col1', 'col2', 'col4', 'col5'])
cursor.sort([('col4', ASCENDING), ('col5', ASCENDING)])

# groupby REQUIRES that the iterable be sorted by the grouping key
# to work correctly; we've asked Mongo to do this, so we don't
# need to do so explicitly here.
groups = groupby(cursor, key=lambda doc: (doc['col4'], doc['col5']))

out = []
for (col4, col5), docs in groups:
    col4sum = 0
    col5max = float('-inf')
    for doc in docs:
        col4sum += doc['col4']
        col5max = max(col5max, doc['col5'])
    out.append({
        'col4': col4,
        'col5': col5,
        'col4sum': col4sum,
        'col5max': col5max
    })
</code></pre>
<h1>Using the Aggregation Framework</h1>
<p>If you are using MongoDB 2.1 or later (2.1.x is the development series leading up to the 2.2.0 stable release expected soon), you can use the Aggregation Framework to do all of this on the server side.
To do so, use the <code>aggregate</code> command:</p>
<pre><code>from datetime import datetime

from bson.son import SON
from bson.tz_util import utc
from pymongo import ASCENDING, DESCENDING

group_key = SON([('col4', '$col4'), ('col5', '$col5')])
# $sort takes plain field names (no '$' prefix)
sort_key = SON([('col1', DESCENDING), ('col2', ASCENDING)])

db.command('aggregate', 'collection_name', pipeline=[
    # this is like the WHERE clause
    {'$match': {'$or': [
        {'col1': 1, 'col2': 'foo'},
        {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
        {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
    ]}},
    # SELECT sum(col4), max(col5) ... GROUP BY col4, col5
    {'$group': {
        '_id': group_key,
        'col4sum': {'$sum': '$col4'},
        'col5max': {'$max': '$col5'}}},
    # ORDER BY col1 DESC, col2 ASC
    {'$sort': sort_key}
])
</code></pre>
<p>The <code>aggregate</code> command returns a BSON document (i.e. a Python dictionary), which is subject to the usual restrictions from MongoDB: it will fail if the document to be returned is greater than 16 MB in size. Additionally, for in-memory sorts (as are required by the <code>$sort</code> at the end of this aggregation), the Aggregation Framework will fail if the sort requires more than 10% of the physical RAM on the server (this is to prevent costly aggregations from evicting all of the memory used by Mongo for data files).</p>
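<p>If you want to try the client-side grouping step without a MongoDB server, here is a minimal sketch that runs on a plain list of dictionaries standing in for the cursor. The helper name <code>group_sum_max</code>, the <code>qty</code> field and the sample values are assumptions made up for illustration; only <code>itertools.groupby</code> and the built-in <code>sorted</code> are real APIs. It shows why the input has to be sorted by the grouping key first, and how a final ORDER BY can be applied to the aggregated rows.</p>

```python
from itertools import groupby


def group_sum_max(docs, key_fields, value_field):
    """Group docs (assumed already sorted by key_fields) and compute
    the sum and max of value_field for each group."""
    def key(doc):
        return tuple(doc[f] for f in key_fields)

    rows = []
    for group_key, members in groupby(docs, key=key):
        values = [doc[value_field] for doc in members]
        row = dict(zip(key_fields, group_key))
        row['sum'] = sum(values)
        row['max'] = max(values)
        rows.append(row)
    return rows


# Hypothetical documents, in the arbitrary order a query might return them.
docs = [
    {'col4': 'b', 'col5': 1, 'qty': 3},
    {'col4': 'a', 'col5': 2, 'qty': 5},
    {'col4': 'b', 'col5': 1, 'qty': 4},
    {'col4': 'a', 'col5': 1, 'qty': 1},
    {'col4': 'a', 'col5': 2, 'qty': 2},
]
key_fields = ('col4', 'col5')

# Stand-in for cursor.sort([...]): order by the grouping key first.
sorted_docs = sorted(docs, key=lambda d: tuple(d[f] for f in key_fields))
grouped = group_sum_max(sorted_docs, key_fields, 'qty')

# groupby only merges adjacent equal keys, so unsorted input
# yields fragmented groups instead of one row per key.
fragmented = group_sum_max(docs, key_fields, 'qty')

# An ORDER BY on the aggregated rows: sum descending, then col4 ascending.
ordered = sorted(grouped, key=lambda r: (-r['sum'], r['col4']))
```

<p>Sorting once up front keeps the grouping pass streaming: each group is consumed as it arrives, so only one group's values are held in memory at a time.</p>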