<p>Since you have not provided a sample document (object) format, take this as a sample collection called <strong>'stories'</strong>.</p>
<pre><code>{ "_id" : ObjectId("4eafd693627b738f69f8f1e3"), "body" : "There was a king", "author" : "tom" }
{ "_id" : ObjectId("4eafd69c627b738f69f8f1e4"), "body" : "There was a queen", "author" : "tom" }
{ "_id" : ObjectId("4eafd72c627b738f69f8f1e5"), "body" : "There was a queen", "author" : "tom" }
{ "_id" : ObjectId("4eafd74e627b738f69f8f1e6"), "body" : "There was a jack", "author" : "tom" }
{ "_id" : ObjectId("4eafd785627b738f69f8f1e7"), "body" : "There was a humpty and dumpty . Humtpy was tall . Dumpty was short .", "author" : "jane" }
{ "_id" : ObjectId("4eafd7cc627b738f69f8f1e8"), "body" : "There was a cat called Mini . Mini was clever cat . ", "author" : "jane" }
</code></pre>
<p>For the given dataset, you can use the following JavaScript code to get to your solution. The collection "<strong>authors_unigrams</strong>" contains the result. All the code is meant to be run in the mongo console (http://www.mongodb.org/display/DOCS/mongo+-+The+Interactive+Shell).</p>
<p><strong>First</strong>, we mark all the documents that are new in the <strong>'stories'</strong> collection. The following command adds a new attribute called "mr_status" to each document that does not have one yet and assigns it the value "inprocess". As shown later, the map-reduce operation only takes into account documents whose "mr_status" field has the value "inprocess". This way, documents already handled by a previous run are not processed again, which makes the operation efficient, as asked.</p>
<pre><code>db.stories.update({mr_status:{$exists:false}},{$set:{mr_status:"inprocess"}},false,true);
</code></pre>
<p><strong>Second</strong>, we define both the <strong>map()</strong> and <strong>reduce()</strong> functions.</p>
<pre><code>var map = function () {
    // Return the words of a string with duplicates removed, in first-seen order.
    var uniqueWords = function (words) {
        var arrWords = words.split(" ");
        var arrNewWords = [];
        var seenWords = {};
        for (var i = 0; i &lt; arrWords.length; i++) {
            // hasOwnProperty avoids false hits on inherited keys such as "constructor".
            if (!seenWords.hasOwnProperty(arrWords[i])) {
                seenWords[arrWords[i]] = true;
                arrNewWords.push(arrWords[i]);
            }
        }
        return arrNewWords;
    };
    var unigrams = uniqueWords(this.body);
    emit(this.author, { unigrams: unigrams });
};

var reduce = function (key, values) {
    // Append to `base` the entries of `extra` that it does not already contain.
    var uniqueMerge = function (base, extra) {
        var nonDuplicates = [];
        for (var i = 0, l = extra.length; i &lt; l; ++i) {
            if (base.indexOf(extra[i]) === -1) {
                nonDuplicates.push(extra[i]);
            }
        }
        return base.concat(nonDuplicates);
    };
    var unigrams = [];
    values.forEach(function (value) {
        unigrams = uniqueMerge(unigrams, value.unigrams);
    });
    return { unigrams: unigrams };
};
</code></pre>
<p><strong>Third</strong>, we actually run the map-reduce operation. The <code>out: {reduce: ...}</code> option re-reduces the new results with any documents already stored under the same key in "authors_unigrams", so earlier results are extended rather than overwritten.</p>
<pre><code>var result = db.stories.mapReduce(map, reduce, {
    query: { author: { $exists: true }, mr_status: "inprocess" },
    out: { reduce: "authors_unigrams" }
});
</code></pre>
<p><strong>Fourth</strong>, we mark all the records that were considered in the last map-reduce run as processed by setting "mr_status" to "processed".</p>
<pre><code>db.stories.update({mr_status:"inprocess"},{$set:{mr_status:"processed"}},false,true);
</code></pre>
<p><strong>Optionally</strong>, you can inspect the result collection <strong>"authors_unigrams"</strong> with the following command.</p>
<pre><code>db.authors_unigrams.find();
</code></pre>
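<p>To see what the incremental merge amounts to without a running MongoDB instance, the steps above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the helper names <code>uniqueWords</code>, <code>mergeUnique</code> and <code>foldBatch</code> are hypothetical, the sketch splits on any whitespace and drops empty tokens (the shell code above splits on single spaces), and it models the effect of re-reducing into an existing result rather than MongoDB's actual implementation.</p>

```javascript
// Plain JavaScript sketch, runnable in Node without MongoDB.
// All helper names here are hypothetical and not part of any MongoDB API.

// Split a body into words, keeping only the first occurrence of each word.
function uniqueWords(text) {
  return [...new Set(text.split(/\s+/).filter(Boolean))];
}

// Merge several word lists into one list without duplicates,
// preserving first-seen order.
function mergeUnique(lists) {
  return [...new Set(lists.flat())];
}

// Fold a batch of new stories into an existing { author: [words] } result.
// This mirrors the idea of reducing new output into an existing collection:
// earlier words are kept and only unseen words are appended.
function foldBatch(existing, batch) {
  const result = { ...existing };
  for (const doc of batch) {
    const known = Object.prototype.hasOwnProperty.call(result, doc.author)
      ? result[doc.author]
      : [];
    result[doc.author] = mergeUnique([known, uniqueWords(doc.body)]);
  }
  return result;
}

// First run: two stories are "inprocess".
const firstRun = foldBatch({}, [
  { author: "tom", body: "There was a king" },
  { author: "tom", body: "There was a queen" },
]);

// Second run: only the newly added story is folded in.
const secondRun = foldBatch(firstRun, [
  { author: "tom", body: "There was a jack" },
]);

console.log(secondRun.tom);
```

<p>Only the new story is tokenized in the second call; the words collected in the first run are reused as they are, which is the same saving the "mr_status" flag provides in the shell version.</p>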