Note that there are some explanatory texts on larger screens.

plurals
  1. POMongoDB: Sharding on single machine. Does it make sense?
    text
    copied!<p>created a collection in MongoDB consisting of 11446615 documents.</p> <p>Each document has the following form:</p> <pre><code>{ "_id" : ObjectId("4e03dec7c3c365f574820835"), "httpReferer" : "http://www.somewebsite.pl/art.php?id=13321&amp;b=1", "words" : ["SEX", "DRUGS", "ROCKNROLL", "WHATEVER"], "howMany" : 3 } </code></pre> <p><strong>httpReferer</strong>: just an url</p> <p><strong>words</strong>: words parsed from the url above. Size of the list is between 15 and 90.</p> <p>I am planning to use this database to obtain list of webpages which have similar content.</p> <p>I 'll by querying this collection using words field so I created (or rather started creating) index on this field:</p> <pre><code>db.my_coll.ensureIndex({words: 1}) </code></pre> <p>Creating this collection takes very long time. I tried two approaches (tests below were done on my laptop):</p> <ol> <li><strong>Inserting and indexing</strong> Inserting took 5.5 hours mainly due to cpu intensive preprocessing of data. Indexing took 30 hours.</li> <li><strong>Indexing before inserting</strong> It would take a few days to insert all data to collection. </li> </ol> <p>My main focus it to decrease time of generating the collection. I don't need replication (at least for now). Querying also doesn't have to be light-fast.</p> <p>Now, time for a question:</p> <p>I have only one machine with one disk were I can run my app. Does it make sense to run more than one instance of the database and split my data between them? </p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload