<h2>Our Solution</h2>
<p>We do similar things on a large scale. If you're worried about scalability, keeping this code inside a Rails app backed by a basic Postgres database is not your best choice.</p>
<p>If you're going to be adding a bunch of metrics like this, and if you're going to be counting words and word diffs by user, you should consider setting up a stream processing or batch processing platform. These solutions are not trivial, but they are worth it if you're going to need scale.</p>
<p>Our solution uses Twitter Storm (<a href="http://storm-project.net" rel="nofollow">http://storm-project.net</a>) with the data counters in MongoDB. In fact, Storm's own example is a word count application. Redis, which you asked about, isn't a bad choice, actually. I disagree with @jokklan because Redis can implement counter storage with next-to-no effort.</p>
<p>We do select the data out of a SQL database, so to start, Postgres isn't a bad choice, but that will probably be the first thing you rip out when you start to really scale this thing.</p>
<p>We have also forked storm-deploy to help bring up Storm servers more reliably. <a href="https://github.com/korrelate/storm-deploy" rel="nofollow">https://github.com/korrelate/storm-deploy</a></p>
<h2>Other Options</h2>
<p>Obviously, though, there are a bunch of different platforms to choose from.
</p>
<ol>
<li>You can use Hadoop MapReduce (<a href="http://hadoop.apache.org/docs/stable/mapred_tutorial.html" rel="nofollow">http://hadoop.apache.org/docs/stable/mapred_tutorial.html</a>)</li>
<li>Pig, which we use for other stuff through Mortar Data (<a href="http://www.mortardata.com" rel="nofollow">http://www.mortardata.com</a>)</li>
<li>Amazon EMR, which would allow you to run basic MapReduce or Pig jobs, but this is more of a platform choice than a framework and implementation choice</li>
<li><p>Run some background jobs to compile this information using Sidekiq (<a href="https://github.com/mperham/sidekiq" rel="nofollow">https://github.com/mperham/sidekiq</a>), Resque (not really recommended given Sidekiq's advancements), or Iron Worker, which runs as a service (<a href="http://www.iron.io/worker" rel="nofollow">http://www.iron.io/worker</a>)</p>
<p>Here's a good article on some of the choices I've mentioned and probably some others (<a href="http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/" rel="nofollow">http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/</a>).</p></li>
</ol>
<h2>Recommendation</h2>
<p>I can't honestly give you a good recommendation without more information about what sort of scale you're talking about. With that information, I might be able to help narrow down your choices a little better. How many users? Are you serious about offering all that granularity (that's fine if you are; it just helps determine scale)? Are there other things you'll want to do besides counting and diffing?</p>
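<p>To make the "counting words and diffs by user" idea concrete, here is a minimal sketch in plain Ruby. It is not our Storm topology. The names <code>word_counts</code>, <code>word_diff</code>, and <code>UserWordStats</code> are hypothetical, and the in-memory hash is only a stand-in for whatever counter store you pick (MongoDB, Redis, or Postgres).</p>

```ruby
# Count how often each word occurs in a piece of text (case-insensitive).
def word_counts(text)
  text.downcase.scan(/[a-z0-9']+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

# Per-word change in frequency between two revisions of a text.
# Positive values mean the word was added, negative values mean it was removed.
def word_diff(old_text, new_text)
  before = word_counts(old_text)
  after  = word_counts(new_text)
  (before.keys | after.keys).each_with_object({}) do |word, diff|
    delta = after.fetch(word, 0) - before.fetch(word, 0)
    diff[word] = delta unless delta.zero?
  end
end

# Hypothetical in-memory stand-in for the counter store.
# Assumption: in a real setup each "+=" below would become an atomic
# increment in the store you choose.
class UserWordStats
  def initialize
    @totals = Hash.new { |hash, user| hash[user] = Hash.new(0) }
  end

  # Fold one edit by one user into that user's running totals.
  def record_edit(user, old_text, new_text)
    word_diff(old_text, new_text).each do |word, delta|
      @totals[user][word] += delta
    end
  end

  # Snapshot of the running totals for a user.
  def totals_for(user)
    @totals[user].dup
  end
end

stats = UserWordStats.new
stats.record_edit("alice", "the quick fox", "the quick brown fox jumps")
p stats.totals_for("alice")
```

<p>Each edit turns into a handful of independent increments, which is why a counter store fits so well. With Redis, for example, each increment could map to one <code>HINCRBY</code> call against a per-user hash, though the key layout would be up to you.</p>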