StackOverflow2013


plurals

PO
primarykey
Id
10211855
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2012-04-18T14:43:47.497
FavoriteCount
0
LastActivityDate
2012-04-18T14:43:47.497
LastEditDate
LastEditorUserId
0
OwnerUserId
124745
ParentId
10209863
PostTypeId
2
Score
4
ViewCount
0
LastEditorDisplayName
text
Body
The simplest (and most scalable) solution is probably to translate the filtering conditions into a MongoDB query and do the aggregation on the client side. Taking your example above, let's break it down and construct a MongoDB query (I'll show this using <a href="http://api.mongodb.org/python/current/" rel="nofollow">PyMongo</a>, but you could do the same using Mongoengine or another ODM if you prefer):
<blockquote> WHERE col1=1 AND col2="foo" OR col3 > "2012-01-01 00:00:00" OR col3 < "2012-01-02 00:00:00" -- conditions </blockquote>
This is the first argument to PyMongo's <code>find()</code> method. We have to explicitly build the logical AND/OR tree using the <code>$or</code> operator:
<pre><code>from datetime import datetime
from bson.tz_util import utc

cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]})
</code></pre>
Note that MongoDB does not convert strings to dates when comparing against date/time fields, so I've explicitly done so here using the Python <a href="http://docs.python.org/library/datetime.html" rel="nofollow"><code>datetime</code></a> module. The <a href="http://docs.python.org/library/datetime.html#datetime-objects" rel="nofollow"><code>datetime</code></a> class in that module uses 0 as the default for any time argument you leave out (hour, minute, second, microsecond), so each value above means midnight UTC on that day.
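If the dates reach you as strings (for example inside the user's JSON), one way to handle the conversion is sketched below. This is only an illustration under some assumptions: the helper names <code>parse_utc</code> and <code>build_filter</code> are made up for this example, the input format is assumed to be <code>YYYY-MM-DD HH:MM:SS</code>, and I use the standard library's <code>timezone.utc</code> in place of <code>bson.tz_util.utc</code> so that it runs without PyMongo installed.

```python
from datetime import datetime, timezone


def parse_utc(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' into a timezone-aware UTC datetime.

    Hypothetical helper: the input format is an assumption for this sketch.
    """
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


def build_filter(col1, col2, after, before):
    """Build the $or filter for the WHERE clause above.

    `after` and `before` are date strings; they are converted to datetime
    objects because MongoDB will not compare strings against date fields.
    """
    return {'$or': [
        {'col1': col1, 'col2': col2},
        {'col3': {'$gt': parse_utc(after)}},
        {'col3': {'$lt': parse_utc(before)}},
    ]}


query = build_filter(1, 'foo', "2012-01-01 00:00:00", "2012-01-02 00:00:00")
```

The resulting <code>query</code> dictionary has the same shape as the first argument to <code>find()</code> shown above.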
<blockquote> SELECT col1, col2 -- result columns </blockquote>
We can use <a href="http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields" rel="nofollow">field selection</a> to only retrieve the fields that we want:
<pre><code>from datetime import datetime
from bson.tz_util import utc

cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, fields=['col1', 'col2'])
</code></pre>
<blockquote> GROUP BY col4, col5 -- group by statement </blockquote>
This can't be done efficiently using standard MongoDB queries (though I'll show in a moment how you might use the new <a href="http://www.mongodb.org/display/DOCS/Aggregation+Framework" rel="nofollow">Aggregation Framework</a> to do this all on the server side). Instead, knowing that we want to group by these columns, we can simplify the application code that does the grouping by asking MongoDB to sort on these fields:
<pre><code>from datetime import datetime
from bson.tz_util import utc
from pymongo import ASCENDING

cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, fields=['col1', 'col2', 'col4', 'col5'])
cursor.sort([('col4', ASCENDING), ('col5', ASCENDING)])
</code></pre>
<blockquote> ORDER BY col1 DESC, col2 ASC -- order by statement </blockquote>
This should be done in your application code after applying the aggregate functions you want (suppose we want to sum over col4, and take the max of col5):
<pre><code>from datetime import datetime
from itertools import groupby
from bson.tz_util import utc
from pymongo import ASCENDING

cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, fields=['col1', 'col2', 'col4', 'col5'])
cursor.sort([('col4', ASCENDING), ('col5', ASCENDING)])

# groupby REQUIRES that the iterable be sorted by the grouping key
# to work correctly; we've asked Mongo to sort by col4 and col5, so
# we don't need to do so explicitly here.
groups = groupby(cursor, key=lambda doc: (doc['col4'], doc['col5']))

out = []
for (col4, col5), docs in groups:
    col4sum = 0
    col5max = float('-inf')
    for doc in docs:
        col4sum += doc['col4']
        col5max = max(col5max, doc['col5'])
    out.append({
        'col4': col4,
        'col5': col5,
        'col4sum': col4sum,
        'col5max': col5max
    })
</code></pre>
<h1>Using the Aggregation Framework</h1>
If you are using MongoDB 2.1 or later (2.1.x is the development series leading up to the 2.2.0 stable release expected soon), you can use the Aggregation Framework to do all of this on the server side. To do so, use the <code>aggregate</code> command:
<pre><code>from datetime import datetime
from bson.son import SON
from bson.tz_util import utc
from pymongo import ASCENDING, DESCENDING

group_key = SON([('col4', '$col4'), ('col5', '$col5')])
# $sort keys are plain field names (no '$' prefix)
sort_key = SON([('col1', DESCENDING), ('col2', ASCENDING)])

db.command('aggregate', 'collection_name', pipeline=[
    # this is like the WHERE clause
    {'$match': {'$or': [
        {'col1': 1, 'col2': 'foo'},
        {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
        {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
    ]}},
    # SELECT sum(col4), max(col5) ... GROUP BY col4, col5
    {'$group': {
        '_id': group_key,
        'col4sum': {'$sum': '$col4'},
        'col5max': {'$max': '$col5'}}},
    # ORDER BY col1 DESC, col2 ASC
    # (only fields emitted by $group exist at this point, so add col1
    # and col2 to the group key if you need to sort on them)
    {'$sort': sort_key}
])
</code></pre>
The <code>aggregate</code> command returns a BSON document (i.e. a Python dictionary), which is subject to the usual restrictions from MongoDB: it will fail if the document to be returned is larger than 16MB. Additionally, for in-memory sorts (as required by the <code>$sort</code> at the end of this aggregation), the Aggregation Framework will fail if the sort requires more than 10% of the physical RAM on the server (this is to prevent costly aggregations from evicting all of the memory used by Mongo for data files).
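To compare the client-side approach with the <code>$group</code> stage above without needing a server, here is a small sketch that runs the same grouping over a plain list of dictionaries. The sample documents and the names <code>group_key</code> and <code>summarize</code> are invented for illustration; the output shape (<code>_id</code>, <code>col4sum</code>, <code>col5max</code>) is chosen to mirror the pipeline, not taken from any real result.

```python
from itertools import groupby

# Hypothetical sample documents standing in for the cursor's results.
docs = [
    {'col4': 2, 'col5': 7},
    {'col4': 1, 'col5': 3},
    {'col4': 2, 'col5': 7},
    {'col4': 1, 'col5': 5},
]


def group_key(doc):
    """Grouping key equivalent to GROUP BY col4, col5."""
    return (doc['col4'], doc['col5'])


def summarize(documents):
    """Group by (col4, col5), then sum col4 and take the max of col5."""
    out = []
    # groupby only merges ADJACENT items with equal keys, so sort first.
    ordered = sorted(documents, key=group_key)
    for (col4, col5), group in groupby(ordered, key=group_key):
        group = list(group)
        out.append({
            '_id': {'col4': col4, 'col5': col5},
            'col4sum': sum(d['col4'] for d in group),
            'col5max': max(d['col5'] for d in group),
        })
    return out


result = summarize(docs)

# Without sorting, equal keys that are not adjacent end up in separate groups.
unsorted_group_count = len(list(groupby(docs, key=group_key)))
```

Here the two <code>(2, 7)</code> documents are merged only because the list is sorted first; running <code>groupby</code> on the unsorted list splits them into separate groups, which is why the sort matters.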
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POBuild mongoDB queries based on JSON from a user using Python
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USdcrosta
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POBuild mongoDB queries based on JSON from a user using Python
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COAmaizing answer!
 singulars
 PostPostId
 PO
 UserUserId
 USKennyPowers
2. COThank you very much! it what the best answer ever made for me :)
 singulars
 PostPostId
 PO
 UserUserId
 USKennyPowers
