
<h1>Report Generation Design Patterns in Rails?</h1>
<p>I am building several reports in an application and have come across a few ways of building them. I wanted to get your take on the best/most common ways to build reports that are both scalable and as close to real-time as possible.</p>

<p>First, some conditions/limits/goals:</p>

<ol>
  <li>The report should be able to handle being real time (using node.js or ajax polling).</li>
  <li>The report should update in an optimized way.
    <ul>
      <li>If the report is about page views and you're getting thousands a second, it might not be best to update the report on every page view, but maybe every 10 or 100.</li>
      <li>But it should still be close to real-time (so a daily/hourly cron is not an acceptable alternative).</li>
    </ul>
  </li>
  <li>The report shouldn't recalculate things it has already calculated.
    <ul>
      <li>If it has counts, it increments a counter.</li>
      <li>If it has averages, maybe it can somehow update the average without grabbing all of the records it's averaging every second and recalculating (not sure how to do this yet).</li>
      <li>If it has counts/averages for a date range (<em>today</em>, <em>last_week</em>, <em>last_month</em>, etc.) and it's real-time, it shouldn't have to recalculate those averages every second/request; it should somehow do only the most minimal operation.</li>
    </ul>
  </li>
  <li>If the report is about a record whose "lifecycle" is complete (say a <code>Project</code> that lasted 6 months, had a bunch of activity, but is now over), the report should be permanently saved so subsequent retrievals just pull a pre-computed document.</li>
</ol>

<p>The reports don't need to be searchable, so once the data is in a document, we're just displaying the document. The client gets basically a JSON tree representing all the stats, charts, etc., so it can be rendered however in JavaScript.</p>

<p>My question arises because I am trying to figure out a way to do <strong>real-time reporting on huge datasets</strong>.</p>

<p>Say I am reporting about overall user signup and activity on a site. The site has 1 million users and averages 1000 page views per second. There is a <code>User</code> model and a <code>PageView</code> model, where <code>User has_many :page_views</code>. Say I have these stats:</p>

<pre><code>report = {
  :users =&gt; {
    :counts =&gt; {
      :all      =&gt; user_count,
      :active   =&gt; active_user_count,
      :inactive =&gt; inactive_user_count
    },
    :averages =&gt; {
      :daily   =&gt; average_user_registrations_per_day,
      :weekly  =&gt; average_user_registrations_per_week,
      :monthly =&gt; average_user_registrations_per_month
    }
  },
  :page_views =&gt; {
    :counts =&gt; {
      :all      =&gt; user_page_view_count,
      :active   =&gt; active_user_page_view_count,
      :inactive =&gt; inactive_user_page_view_count
    },
    :averages =&gt; {
      :daily   =&gt; average_user_page_view_registrations_per_day,
      :weekly  =&gt; average_user_page_view_registrations_per_week,
      :monthly =&gt; average_user_page_view_registrations_per_month
    }
  }
}
</code></pre>

<p>Things I have tried:</p>

<h3>1. Where <code>User</code> and <code>PageView</code> are both ActiveRecord objects, so everything is via SQL</h3>

<p>I grab all of the users in chunks, something like this:</p>

<pre><code>class User &lt; ActiveRecord::Base
  class &lt;&lt; self
    def report
      result = {}
      User.find_in_batches(:include =&gt; :page_views) do |users|
        # some calculations
        # result[:users]...
        users.each do |user|
          # result[:users][:counts][:active]...
          # some more calculations
        end
      end
      result
    end
  end
end
</code></pre>

<h3>2. Both records are <code>MongoMapper::Document</code> objects</h3>

<p>Map-reduce is really slow to calculate on the spot, and I haven't yet spent the time to figure out how to make it work real-time-esque (checking out <a href="http://projects.nuttnet.net/hummingbird/" rel="noreferrer">hummingbird</a>). Basically I do the same thing: chunk the records, add the results to a hash, and that's it.</p>

<h3>3. Each calculation is its own SQL/NoSQL query</h3>

<p>This is roughly the approach the Rails <a href="https://github.com/acatighera/statistics" rel="noreferrer">statistics gem</a> takes. The only thing I don't like about it is the number of queries this could make (I haven't benchmarked whether making 30 queries per request per report is better than chunking all the objects into memory and sorting in straight Ruby).</p>

<h2>Question</h2>

<p>The question, I guess, is: what's the best way, from your experience, to do real-time reporting on large datasets? With chunking/sorting the records in memory on every request (what I'm doing now, which I can somewhat optimize using an hourly cron, but that's not real-time), the reports take about a second to generate (complex date formulas and such), sometimes longer.</p>

<p>Besides traditional optimizations (a better date implementation, SQL/NoSQL best practices), where can I find some practical and tried-and-true articles on building reports? I can build reports no problem; the issue is, how do you make it fast, real-time, optimized, and <em>right</em>? I haven't found anything, really.</p>
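<p>To make goal 3 ("update the average without grabbing all records") concrete, here is a minimal sketch of an online mean, not code from the question: it persists only a count and a current mean, so folding in each new observation is an O(1) update rather than a re-scan of the table.</p>

```ruby
# Running (online) average: stores only the count and the current mean.
# Each new value updates the mean in O(1) without re-reading old records.
class RunningAverage
  attr_reader :count, :mean

  def initialize
    @count = 0
    @mean  = 0.0
  end

  # Fold one new observation into the average.
  def add(value)
    @count += 1
    @mean  += (value - @mean) / @count
    self
  end
end
```

In a report, the count/mean pair would live in a stats document or counter row, so a request only reads the pre-computed numbers instead of averaging a million records.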
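<p>Goal 2 ("update every 10 or 100 page views, not every one") can be sketched as a buffered counter. This is an assumed illustration, not part of the question: increments accumulate in memory and are flushed to persistent storage in one write every <code>flush_every</code> events, so the persisted number the report reads lags by at most one buffer.</p>

```ruby
# Buffered counter: batches increments and flushes them every N events.
# `@persisted` stands in for a durable counter (a counter-cache column,
# a stats row updated via update_counters, a Redis key, etc.).
class BufferedCounter
  attr_reader :persisted

  def initialize(flush_every: 100)
    @flush_every = flush_every
    @pending     = 0
    @persisted   = 0 # stand-in for the durable counter
  end

  def increment
    @pending += 1
    flush if @pending >= @flush_every
  end

  def flush
    # In a real app this would be one atomic write (e.g. a single SQL
    # UPDATE ... SET views = views + ?, or a Redis INCRBY).
    @persisted += @pending
    @pending    = 0
  end
end
```

The report then polls the persisted value, which stays within <code>flush_every</code> events of real time while doing 1/Nth of the writes.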