You've got a complex problem, which means you need to break it down into smaller, more easily solvable issues.

Problems (as I see it):

1. You've got an application which is collecting data. You just need to store that data somewhere locally until it gets synced to the server.
2. You've received the data on the server and now you need to shove it into the database fast enough that it doesn't slow down.
3. You've got to report on that data, and this sounds hard and complex.

You probably want to write this as some sort of API. For simplicity (and since you've got loads of spare processing cycles on the clients), you'll want these chunks of data processed on the client side into JSON, ready to import into the database. Once you've got JSON you don't need Mongoid (you just throw the JSON into the database directly). You probably don't need Rails either, since you're just creating a simple API, so stick with plain Rack or Sinatra (possibly using something like [Grape](https://github.com/intridea/grape)). There's a rough sketch of such an endpoint at the end of this answer.

Now you need to solve the whole "this all seems to block and is ultimately too slow" issue. We've already removed Mongoid (so no need to convert from JSON -> Ruby objects -> JSON) and Rails. Before you get to running MapReduce on this data, you need to ensure it's getting loaded into the database quickly enough. Chances are you should architect the whole thing so that your MapReduce supports your reporting functionality. For syncing the data you shouldn't need to do anything but pass the JSON around. If your data isn't writing into your DB fast enough, consider [sharding your dataset](http://www.mongodb.org/display/DOCS/Sharding+Introduction). This will probably be done using some user-based key, but you know your data schema better than I do. Choose your shard key so that when multiple users are syncing at the same time they will probably be using different servers.

Once you've solved problems 1 and 2, you need to work on your reporting. This is probably supported by MapReduce functions inside Mongo. My first comment on this part is to make sure you're running at least Mongo 2.0. [In that release 10gen sped up MapReduce](http://www.mongodb.org/display/DOCS/2.0+Release+Notes#2.0ReleaseNotes-Performanceimprovements) (my tests indicate that it is substantially faster than 1.8). Beyond that, you can achieve further gains by sharding and by directing reads to the secondary servers in your replica set (you are using a replica set?). If this still isn't working, consider structuring your schema to support your reporting functionality. This lets you use more cycles on your clients to do work rather than loading your servers. But this optimisation should be left until after you've proven that conventional approaches won't work.

I hope that wall of text helps somewhat. Good luck!
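To make the API part concrete, here's a minimal sketch of a sync endpoint, assuming Sinatra and the official `mongo` Ruby driver (shown with the 2.x-style `Mongo::Client` API; the 1.x driver of the Mongo 2.0 era used `Mongo::Connection` and `Collection#insert` instead). The `myapp` database, `events` collection, and `/sync` route are illustrative names, not something from your setup:

```ruby
require 'sinatra'
require 'json'
require 'mongo'

# One shared connection; 'myapp' and 'events' are illustrative names.
MONGO = Mongo::Client.new(['127.0.0.1:27017'], database: 'myapp')

post '/sync' do
  # The client has already serialised its batch as a JSON array of documents,
  # so parse it and hand the hashes straight to the driver -- no Mongoid,
  # no intermediate Ruby model objects.
  docs = JSON.parse(request.body.read)
  result = MONGO[:events].insert_many(docs)

  content_type :json
  { inserted: result.inserted_count }.to_json
end
```

A Grape version would look much the same; the point is that the parsed JSON goes straight into the collection with no model layer in between.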
 

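If you do end up sharding, the setup is a one-off pair of admin commands run through a mongos router. A sketch, again with the illustrative `myapp.events` collection and an assumed user-based shard key of `user_id`; whether that key actually spreads concurrently syncing users across shards depends on your real schema:

```ruby
require 'mongo'

# Run these once against a mongos router, not a plain mongod.
admin = Mongo::Client.new(['mongos.example.com:27017'], database: 'admin')

# Enable sharding for the database, then shard the sync collection on a
# user-based key so that users syncing at the same time tend to land on
# different shards. (Mongo will want an index on the shard key if the
# collection already has data.)
admin.database.command(enableSharding: 'myapp')
admin.database.command(shardCollection: 'myapp.events', key: { user_id: 1 })
```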

 
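For the reporting side, here's a sketch of a map/reduce run from the Ruby driver that counts documents per user, with reads preferred on secondaries so reporting stays off the primary. The field names, the inline output, and the read preference are all assumptions to adapt rather than a recipe:

```ruby
require 'mongo'

client = Mongo::Client.new(
  ['127.0.0.1:27017'],
  database: 'myapp',
  read: { mode: :secondary_preferred } # keep reporting reads off the primary
)

# JavaScript functions executed inside Mongo: emit one per document, per user.
map    = 'function() { emit(this.user_id, 1); }'
reduce = 'function(key, values) { return Array.sum(values); }'

# Inline output returns the reduced documents straight back to the driver.
client[:events].find.map_reduce(map, reduce, out: { inline: 1 }).each do |doc|
  puts "#{doc['_id']}: #{doc['value']}"
end
```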
