Note that there are some explanatory texts on larger screens.

plurals
  1. POWorking with large data sets and ruby
    primarykey
    data
    text
    <p>Could REALLY use some help here. Struggling with displaying a dashboard with large data.</p> <p>When working with @ 2k records average @ 2 sec. </p> <p>The query in MySql Console take less than 3.5 seconds to return 150k rows. Same query in Ruby takes over 4 + minutes from time query is performed until all objects and ready.</p> <p>Goal: Optimize data even further before adding cache server. Working with Ruby 1.9.2, Rails 3.0 and Mysql (Mysql2 gem)</p> <p>Questions:</p> <ul> <li>Does working with Hashes hurt performance? </li> <li>Should I first put everything in one primary hash then manipulate the data I need afterwards? </li> <li>Is there anything else I can do to help with performance?</li> </ul> <p>Rows in DB:</p> <ul> <li>GasStations and US Census has @ 150,000 records</li> <li>Person has @ 100,000 records</li> <li>Cars has @ 200,000 records </li> <li>FillUps has @ 2.3 Million </li> </ul> <p>Required for dashboard (query based on time periods of last 24 hours, last week, etc). All data returned in JSON format for JS.</p> <ul> <li>Gas Stations, with FillUps and US Census data (zip code, Name, City, Population)</li> <li>Top 20 cities with the most fill ups</li> <li>Top 10 cars with Fill Ups</li> <li>Cars grouped by how many times they filled up their tank</li> </ul> <p>Code (sample of 6 months. Returns @ 100k + records):</p> <pre><code># for simplicity, removed the select clause I had, but removing data I don't need like updated_at, gas_station.created_at, etc. instead of returning all the columns for each table. @primary_data = FillUp.includes([:car, :gas_staton, :gas_station =&gt; {:uscensus}]).where('fill_ups.created_at &gt;= ?', 6.months.ago) # This would take @ 4 + minutes # then tried @primary_data = FillUp.find_by_sql('some long sql query...') # took longer than before. # Note for others, sql query did some pre processing for me which added attributes to the return. Query in DB Console took &lt; 4 seconds. Because of these extra attributes, query took longer as if Ruby was checking each row for mapping attributes # then tried MY_MAP = Hash[ActiveRecord::Base.connection.select_all('SELECT thingone, thingtwo from table').map{|one| [one['thingone'], one['thingtwo']]}] as seen http://stackoverflow.com/questions/4456834/ruby-on-rails-storing-and-accessing-large-data-sets # that took 23 seconds and gained mapping of additional data that was processing later, so much faster # currently using below which takes @ 10 seconds # All though this is faster, query still only takes 3.5 seconds, but parsing it to the hashes does add overhead. cars = {} gasstations = {} cities = {} filled = {} client = Mysql2::Client.new(:host =&gt; "localhost", :username =&gt; "root") client.query("SELECT sum(fill_ups_grouped_by_car_id) as filled, fillups.car_id, cars.make as make, gasstations.name as name, ....", :stream =&gt; true, :as =&gt; :json).each do |row| # this returns fill ups gouged by car ,fill_ups.car_id, car make, gas station name, gas station zip, gas station city, city population if cities[row['city']] cities[row['city']]['fill_ups'] = (cities[row['city']]['fill_ups'] + row['filled']) else cities[row['city']] = {'fill_ups' =&gt; row['filled'], 'population' =&gt; row['population']} end if gasstations[row['name']] gasstations[row['name']]['fill_ups'] = (gasstations[row['name']]['fill_ups'] + row['filled']) else gasstations[row['name']] = {'city' =&gt; row['city'],'zip' =&gt; row['city'], 'fill_ups' =&gt; row['filled']} end if cars[row['make']] cars[row['make']] = (cars[row['make']] + row['filled']) else cars[row['make']] = row['filled'] end if row['filled'] filled[row['filled']] = (filled[row['filled']] + 1) else filled[row['filled']] = 1 end end </code></pre> <p>Have the following models: </p> <pre><code>def Person has_many :cars end def Car belongs_to :person belongs_to :uscensus, :foreign_key =&gt; :zipcode, :primary_key =&gt; :zipcode has_many :fill_ups has_many :gas_stations, :through =&gt; :fill_ups end def GasStation belongs_to :uscensus, :foreign_key =&gt; :zipcode, :primary_key =&gt; :zipcode has_many :fill_ups has_many :cars, :through =&gt; :fill_ups end def FillUp # log of every time a person fills up there gas belongs_to :car belongs_to :gas_station end def Uscensus # Basic data about area based on Zip code end </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload