Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Although you can't do this real-time, you can run map-reduce multiple times to merge data together by using the "reduce" out option in MongoDB 1.8+ map/reduce (see <a href="http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Outputoptions" rel="noreferrer">http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Outputoptions</a>). You need to have some key in both collections that you can use as an _id.</p> <p>For example, let's say you have a <code>users</code> collection and a <code>comments</code> collection and you want to have a new collection that has some user demographic info for each comment.</p> <p>Let's say the <code>users</code> collection has the following fields:</p> <ul> <li>_id</li> <li>firstName</li> <li>lastName</li> <li>country</li> <li>gender</li> <li>age</li> </ul> <p>And then the <code>comments</code> collection has the following fields:</p> <ul> <li>_id</li> <li>userId</li> <li>comment</li> <li>created</li> </ul> <p>You would do this map/reduce:</p> <pre class="lang-js prettyprint-override"><code>var mapUsers, mapComments, reduce; db.users_comments.remove(); // setup sample data - wouldn't actually use this in production db.users.remove(); db.comments.remove(); db.users.save({firstName:"Rich",lastName:"S",gender:"M",country:"CA",age:"18"}); db.users.save({firstName:"Rob",lastName:"M",gender:"M",country:"US",age:"25"}); db.users.save({firstName:"Sarah",lastName:"T",gender:"F",country:"US",age:"13"}); var users = db.users.find(); db.comments.save({userId: users[0]._id, "comment": "Hey, what's up?", created: new ISODate()}); db.comments.save({userId: users[1]._id, "comment": "Not much", created: new ISODate()}); db.comments.save({userId: users[0]._id, "comment": "Cool", created: new ISODate()}); // end sample data setup mapUsers = function() { var values = { country: this.country, gender: this.gender, age: this.age }; emit(this._id, values); }; mapComments = function() { var values = { commentId: this._id, comment: this.comment, created: this.created }; emit(this.userId, values); }; reduce = function(k, values) { var result = {}, commentFields = { "commentId": '', "comment": '', "created": '' }; values.forEach(function(value) { var field; if ("comment" in value) { if (!("comments" in result)) { result.comments = []; } result.comments.push(value); } else if ("comments" in value) { if (!("comments" in result)) { result.comments = []; } result.comments.push.apply(result.comments, value.comments); } for (field in value) { if (value.hasOwnProperty(field) &amp;&amp; !(field in commentFields)) { result[field] = value[field]; } } }); return result; }; db.users.mapReduce(mapUsers, reduce, {"out": {"reduce": "users_comments"}}); db.comments.mapReduce(mapComments, reduce, {"out": {"reduce": "users_comments"}}); db.users_comments.find().pretty(); // see the resulting collection </code></pre> <p>At this point, you will have a new collection called <code>users_comments</code> that contains the merged data and you can now use that. These reduced collections all have <code>_id</code> which is the key you were emitting in your map functions and then all of the values are a sub-object inside the <code>value</code> key - the values aren't at the top level of these reduced documents.</p> <p>This is a somewhat simple example. You can repeat this with more collections as much as you want to keep building up the reduced collection. You could also do summaries and aggregations of data in the process. Likely you would define more than one reduce function as the logic for aggregating and preserving existing fields gets more complex.</p> <p>You'll also note that there is now one document for each user with all of that user's comments in an array. If we were merging data that has a one-to-one relationship rather than one-to-many, it would be flat and you could simply use a reduce function like this:</p> <pre class="lang-js prettyprint-override"><code>reduce = function(k, values) { var result = {}; values.forEach(function(value) { var field; for (field in value) { if (value.hasOwnProperty(field)) { result[field] = value[field]; } } }); return result; }; </code></pre> <p>If you want to flatten the <code>users_comments</code> collection so it's one document per comment, additionally run this:</p> <pre class="lang-js prettyprint-override"><code>var map, reduce; map = function() { var debug = function(value) { var field; for (field in value) { print(field + ": " + value[field]); } }; debug(this); var that = this; if ("comments" in this.value) { this.value.comments.forEach(function(value) { emit(value.commentId, { userId: that._id, country: that.value.country, age: that.value.age, comment: value.comment, created: value.created, }); }); } }; reduce = function(k, values) { var result = {}; values.forEach(function(value) { var field; for (field in value) { if (value.hasOwnProperty(field)) { result[field] = value[field]; } } }); return result; }; db.users_comments.mapReduce(map, reduce, {"out": "comments_with_demographics"}); </code></pre> <p>This technique should definitely not be performed on the fly. It's suited for a cron job or something like that which updates the merged data periodically. You'll probably want to run <code>ensureIndex</code> on the new collection to make sure queries you perform against it run quickly (keep in mind that your data is still inside a <code>value</code> key, so if you were to index <code>comments_with_demographics</code> on the comment <code>created</code> time, it would be <code>db.comments_with_demographics.ensureIndex({"value.created": 1});</code></p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload