<p>MapReduce runs JavaScript in a separate thread and uses the code you provide to emit and reduce parts of your documents, aggregating on certain fields. You can certainly treat the exercise as aggregating over each "fieldValue". The aggregation framework can do this as well, and it would be much faster because the aggregation runs on the server in C++ rather than in a separate JavaScript thread. However, the aggregation framework may need to return more than 16MB of data, in which case you would have to do more complex partitioning of the data set.</p>
<p>But it seems like the problem is a lot simpler than this. You just want to find, for each profile, which other profiles share particular attributes with it. Without knowing the size of your dataset or your performance requirements, I'm going to assume that you have an index on fieldValues so that querying on it is efficient. You can then get the results you want with this simple loop:</p>
<pre><code>&gt; db.profiles.find().forEach( function(p) {
      print("Matching profiles for "+tojson(p));
      printjson(
          db.profiles.find(
              {"fieldValues": {"$in" : p.fieldValues}, "_id" : {$gt:p._id}}
          ).toArray()
      );
  } );
</code></pre>
<p>Output:</p>
<pre><code>Matching profiles for {
    "_id" : 1,
    "firstName" : "John",
    "lastName" : "Smith",
    "fieldValues" : [
        "favouriteColour|red",
        "food|pizza",
        "food|chinese"
    ]
}
[
    {
        "_id" : 2,
        "firstName" : "Sarah",
        "lastName" : "Jane",
        "fieldValues" : [
            "favouriteColour|blue",
            "food|pizza",
            "food|mexican",
            "pets|yes"
        ]
    },
    {
        "_id" : 3,
        "firstName" : "Rachel",
        "lastName" : "Jones",
        "fieldValues" : [
            "food|pizza"
        ]
    }
]
Matching profiles for {
    "_id" : 2,
    "firstName" : "Sarah",
    "lastName" : "Jane",
    "fieldValues" : [
        "favouriteColour|blue",
        "food|pizza",
        "food|mexican",
        "pets|yes"
    ]
}
[
    {
        "_id" : 3,
        "firstName" : "Rachel",
        "lastName" : "Jones",
        "fieldValues" : [
            "food|pizza"
        ]
    }
]
Matching profiles for {
    "_id" : 3,
    "firstName" : "Rachel",
    "lastName" : "Jones",
    "fieldValues" : [
        "food|pizza"
    ]
}
[ ]
</code></pre>
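<p>For intuition, the matching rule used by the loop above can be reproduced in plain JavaScript over the three sample documents, with no MongoDB involved. This is only a sketch: <code>matchesFor</code> is a hypothetical helper name, and the in-memory array stands in for the <code>profiles</code> collection.</p>

```javascript
// In-memory stand-in for the sample collection shown in the output above.
const profiles = [
  { _id: 1, firstName: "John", lastName: "Smith",
    fieldValues: ["favouriteColour|red", "food|pizza", "food|chinese"] },
  { _id: 2, firstName: "Sarah", lastName: "Jane",
    fieldValues: ["favouriteColour|blue", "food|pizza", "food|mexican", "pets|yes"] },
  { _id: 3, firstName: "Rachel", lastName: "Jones",
    fieldValues: ["food|pizza"] },
];

// Hypothetical helper that mimics the filter
// {"fieldValues": {"$in": p.fieldValues}, "_id": {$gt: p._id}}:
// keep profiles with a larger _id that share at least one fieldValue with p.
function matchesFor(p, all) {
  const wanted = new Set(p.fieldValues);
  return all.filter(
    (q) => q._id > p._id && q.fieldValues.some((v) => wanted.has(v))
  );
}

// For each profile, collect the _ids of its matches.
const matchedIds = profiles.map((p) => matchesFor(p, profiles).map((q) => q._id));
console.log(JSON.stringify(matchedIds)); // prints [[2,3],[3],[]]
```

<p>The <code>_id</code> comparison is what keeps each pair from being reported twice, which is why the last profile has no matches.</p>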
<p>Obviously you can tweak the query so that it does not exclude already matched profiles (by changing <code>{$gt:p._id}</code> to <code>{$ne:p._id}</code>), along with other tweaks. But I'm not sure what additional value you would get from using the aggregation framework or MapReduce, as this is not really aggregating a single collection on one of its fields (judging by the format of the output that you show). If your output format requirements are flexible, it's certainly possible that you could use one of the built-in aggregation options as well.</p>
<p>I did check what this would look like when aggregating around individual fieldValues, and it's not bad. It might help you if your output can match this:</p>
<pre><code>&gt; db.profiles.aggregate({$unwind:"$fieldValues"},
      {$group:{_id:"$fieldValues",
               matchedProfiles : {$push: { id:"$_id", name:{$concat:["$firstName"," ", "$lastName"]}}},
               num:{$sum:1}
      }},
      {$match:{num:{$gt:1}}});
{
    "result" : [
        {
            "_id" : "food|pizza",
            "matchedProfiles" : [
                {
                    "id" : 1,
                    "name" : "John Smith"
                },
                {
                    "id" : 2,
                    "name" : "Sarah Jane"
                },
                {
                    "id" : 3,
                    "name" : "Rachel Jones"
                }
            ],
            "num" : 3
        }
    ],
    "ok" : 1
}
</code></pre>
<p>This basically says: "For each fieldValue ($unwind), group by fieldValue into an array of matching profile _ids and names, counting how many matches each fieldValue accumulates ($group), and then exclude the ones that have only one profile matching them ($match)."</p>
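<p>To make the three stages concrete, here is roughly what the pipeline above computes, sketched in plain JavaScript over the same three sample documents. It is an illustration rather than how the server implements aggregation, and <code>groupByFieldValue</code> is a hypothetical name.</p>

```javascript
// In-memory stand-in for the sample profiles collection.
const profiles = [
  { _id: 1, firstName: "John", lastName: "Smith",
    fieldValues: ["favouriteColour|red", "food|pizza", "food|chinese"] },
  { _id: 2, firstName: "Sarah", lastName: "Jane",
    fieldValues: ["favouriteColour|blue", "food|pizza", "food|mexican", "pets|yes"] },
  { _id: 3, firstName: "Rachel", lastName: "Jones",
    fieldValues: ["food|pizza"] },
];

// Hypothetical helper mirroring $unwind, $group and $match in turn.
function groupByFieldValue(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // $unwind: visit one (document, fieldValue) pair per array element.
    for (const fv of doc.fieldValues) {
      // $group: one bucket per fieldValue, pushing {id, name} and counting.
      if (!groups.has(fv)) {
        groups.set(fv, { _id: fv, matchedProfiles: [], num: 0 });
      }
      const group = groups.get(fv);
      group.matchedProfiles.push({
        id: doc._id,
        name: doc.firstName + " " + doc.lastName,
      });
      group.num += 1;
    }
  }
  // $match: keep only fieldValues shared by more than one profile.
  return [...groups.values()].filter((g) => g.num > 1);
}

const result = groupByFieldValue(profiles);
console.log(JSON.stringify(result, null, 2));
```

<p>On the sample data only <code>food|pizza</code> survives the final filter, matching the single entry in the <code>result</code> array shown above.</p>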