<p>There are two ways to solve this in MongoDB.</p>
<ol>
<li><p>If your subsets are fairly small, you can query the subset collection to find all of its members and pass the result as the initial <a href="http://docs.mongodb.org/manual/reference/command/mapReduce/" rel="nofollow">query</a> of a map-reduce call.</p></li>
<li><p>However, if your subsets are very large, this may not be possible. In that case you can <a href="http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/" rel="nofollow">simulate a join using two map-reduce calls</a> with the 'reduce' output option, so that both calls reduce into the same target collection. This creates an intermediate collection whose documents look something like this:</p>
<pre><code>{Name: Jim, Age: 24, inSubset: true}
{Name: Bill, Age: 38, inSubset: false}
{Name: Mary, Age: 55, inSubset: true}
</code></pre>
<p>Finally, you can run a third map-reduce on this intermediate collection to average over all documents that have <code>inSubset: true</code>.</p></li>
</ol>
<p>Here is the code for the second option (the three map-reduces) in Python, using the pymongo driver:</p>
<pre><code>from pymongo import Connection
from bson import ObjectId, Code

con = Connection(port=30000)   # add host/port here if different from default
db = con['test']               # or the database name you are using

# insert documents
db.master.insert({'_id': ObjectId(), 'Name': 'Jim', 'Age': 24})
db.master.insert({'_id': ObjectId(), 'Name': 'Bill', 'Age': 38})
db.master.insert({'_id': ObjectId(), 'Name': 'Mary', 'Age': 55})

db.subset.insert({'_id': ObjectId(), 'Name': 'Jim'})
db.subset.insert({'_id': ObjectId(), 'Name': 'Mary'})

# map function for master collection
mapf_master = Code("""
function () {
    emit(this.Name, {'age': this.Age, 'inSubset': false});
}
""")

# map function for subset collection
mapf_subset = Code("""
function() {
    emit(this.Name, {'age': 0, 'inSubset': true});
}
""")

# reduce function for both master and subset
reducef = Code("""
function(key, values) {
    var result = {'age': 0, 'inSubset': false};
    values.forEach( function(value) {
        result.age += value.age;
        result.inSubset = result.inSubset || value.inSubset;
    });
    return result;
}
""")

# call map-reduce on master and subset (simulates a join)
db.master.map_reduce(mapf_master, reducef, out={'reduce': 'join'})
db.subset.map_reduce(mapf_subset, reducef, out={'reduce': 'join'})

# final map function for third map-reduce call
mapf_final = Code("""
function() {
    if (this.value.inSubset) {
        emit('total', {'age': this.value.age, 'count': 1});
    }
}
""")

# final reduce function for third map-reduce call
reducef_final = Code("""
function(key, values) {
    var result = {'age': 0, 'count': 0};
    values.forEach( function(value) {
        result.age += value.age;
        result.count += value.count;
    });
    return result;
}
""")

# final finalize function, calculates the average
finalizef_final = Code("""
function(key, value) {
    if (value.count &gt; 0) {
        value.averageAge = value.age / value.count;
    }
    return value;
}
""")

# call final map-reduce
db.join.map_reduce(mapf_final, reducef_final, finalize=finalizef_final, out={'merge': 'result'})
</code></pre>
<p>The result collection looks like this (queried from the mongo shell):</p>
<pre><code>&gt; db.result.find()
{ "_id" : "total", "value" : { "age" : 79, "count" : 2, "averageAge" : 39.5 } }
</code></pre>
<p>The final average is stored in the <code>value.averageAge</code> field.</p>
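<p>The first option is only described in words above, so here is a minimal sketch of its logic. It is not a tested implementation against a live server: the helper names <code>build_subset_query</code> and <code>average_age</code> are hypothetical, and <code>average_age</code> is a pure-Python stand-in for the filtered map-reduce so that the example runs without MongoDB. The filter document it builds is the kind of value you would hand to the map-reduce call as its initial query.</p>

```python
def build_subset_query(subset_docs):
    """Build a filter matching master documents whose Name is in the subset.

    Hypothetical helper: assumes the two collections are linked by 'Name',
    as in the sample documents of the answer.
    """
    names = sorted({doc['Name'] for doc in subset_docs})
    return {'Name': {'$in': names}}


def average_age(master_docs, query):
    """Pure-Python stand-in for a map-reduce restricted by `query`.

    Averages 'Age' over the master documents that match the filter and
    returns None when nothing matches.
    """
    wanted = set(query['Name']['$in'])
    ages = [doc['Age'] for doc in master_docs if doc['Name'] in wanted]
    return sum(ages) / len(ages) if ages else None


# Same sample data as in the answer's pymongo example.
master = [
    {'Name': 'Jim', 'Age': 24},
    {'Name': 'Bill', 'Age': 38},
    {'Name': 'Mary', 'Age': 55},
]
subset = [{'Name': 'Jim'}, {'Name': 'Mary'}]

query = build_subset_query(subset)
print(query)                       # {'Name': {'$in': ['Jim', 'Mary']}}
print(average_age(master, query))  # 39.5
```

<p>Because the whole list of subset names is embedded in one query document, this only works while the subset stays small, which is why the second option exists.</p>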