Here is my first attempt. This one is dirty and relies on the following properties of the data:

- there are three 5-minute prices for each quarter-hour row in `quantities` (if the data violates this, the query will not work)
- neither the "for each" nor the cardinality of three is guaranteed by any data integrity checks, which is why I call it dirty
- it is also not flexible with respect to changes in the period lengths

Query 1:

```
SELECT sql_no_cache
       min(q.start_timestamp) as start,
       max(q.end_timestamp) as end,
       sum((p1.price + p2.price + p3.price) / 3 * q.quantity) as total
FROM quantities q
JOIN prices p1 ON q.start_timestamp = p1.timestamp
       AND p1.type_id = 1
JOIN prices p2 ON p2.timestamp = adddate(q.start_timestamp, INTERVAL 5 MINUTE)
       AND p2.type_id = 1
JOIN prices p3 ON p3.timestamp = adddate(q.start_timestamp, INTERVAL 10 MINUTE)
       AND p3.type_id = 1
WHERE q.start_timestamp BETWEEN '2010-07-01 00:00:00' AND '2010-07-01 23:59:59'
GROUP BY hour(q.start_timestamp);
```

This one returns results in 0.01 sec on my slow testing machine, where the original query runs in ~6 sec and gnarf's query in ~0.85 sec (all queries were always tested with the `SQL_NO_CACHE` keyword, which prevents reuse of cached results, but on a warm database).

EDIT: Here is a version that is not sensitive to missing rows on the price side.

Query 1a:

```
SELECT sql_no_cache
       min(q.start_timestamp) as start,
       max(q.end_timestamp) as end,
       -- COALESCE(pN.price - pN.price, 1) is 0 when the price exists and 1 when it
       -- is missing, so the denominator counts only the prices that are present
       sum( (COALESCE(p1.price, 0) + COALESCE(p2.price, 0) + COALESCE(p3.price, 0))
            / (3 - COALESCE(p1.price - p1.price, 1)
                 - COALESCE(p2.price - p2.price, 1)
                 - COALESCE(p3.price - p3.price, 1))
            * q.quantity) as total
FROM quantities q
LEFT JOIN prices p1 ON q.start_timestamp = p1.timestamp
       AND p1.type_id = 1
LEFT JOIN prices p2 ON p2.timestamp = adddate(q.start_timestamp, INTERVAL 5 MINUTE)
       AND p2.type_id = 1
LEFT JOIN prices p3 ON p3.timestamp = adddate(q.start_timestamp, INTERVAL 10 MINUTE)
       AND p3.type_id = 1
WHERE q.start_timestamp BETWEEN '2010-07-01 00:00:00' AND '2010-07-01 23:59:59'
GROUP BY hour(q.start_timestamp);
```

EDIT 2: Query 2: Here is a direct improvement, and a different approach, to your query with minimal changes that brings the execution time down to ~0.22 sec on my machine:

```
SELECT sql_no_cache
       MIN(`quantities`.`start_timestamp`) AS `start`,
       MAX(`quantities`.`end_timestamp`) AS `end`,
       SUM(`quantities`.`quantity` *
            (SELECT AVG(`prices`.`price`)
             FROM `prices`
             WHERE `prices`.`timestamp` >= '2010-07-01 00:00:00'
               AND `prices`.`timestamp` <  '2010-07-02 00:00:00'
               AND `prices`.`timestamp` >= `quantities`.`start_timestamp`
               AND `prices`.`timestamp` <  `quantities`.`end_timestamp`
               AND `prices`.`type_id` = 1
            )
       ) AS total
FROM `quantities`
WHERE `quantities`.`start_timestamp` >= '2010-07-01 00:00:00'
  AND `quantities`.`start_timestamp` <  '2010-07-02 00:00:00'
GROUP BY HOUR(`quantities`.`start_timestamp`);
```

That is MySQL 5.1; I think I have read that in 5.5 this kind of thing (merging indexes) will be available to the query planner.
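As a side note, here is a minimal sketch of the indexes these rewrites assume; the index names and exact column order are my assumptions rather than something taken from your schema, so adjust them to what you actually have:

```
-- Assumed indexes (names are hypothetical): the composite index lets the joins in
-- Query 1/1a and the correlated subquery in Query 2 resolve each price lookup by
-- (type_id, timestamp), and the quantities index supports the date-range filter.
CREATE INDEX idx_prices_type_ts   ON prices (type_id, timestamp);
CREATE INDEX idx_quantities_start ON quantities (start_timestamp);
```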
Also, if you could relate your `start_timestamp` and `timestamp` columns through a foreign key, that should allow these kinds of correlated queries to make use of indexes (but for this you would need to modify the design and establish some sort of timeline table, which could then be referenced by both `quantities` and `prices`).

Query 3: Finally, the last version, which does it in ~0.03 sec but should be as robust and flexible as Query 2:

```
SELECT sql_no_cache
       MIN(start),
       MAX(end),
       SUM(subtotal)
FROM
(
    SELECT sql_no_cache
           q.`start_timestamp` AS `start`,
           q.`end_timestamp` AS `end`,
           AVG(p.`price` * q.`quantity`) AS `subtotal`
    FROM `quantities` q
    LEFT JOIN `prices` p ON p.timestamp >= q.start_timestamp
          AND p.timestamp <  q.end_timestamp
          AND p.timestamp >= '2010-07-01 00:00:00'
          AND p.`timestamp` < '2010-07-02 00:00:00'
    WHERE q.`start_timestamp` >= '2010-07-01 00:00:00'
      AND q.`start_timestamp` <  '2010-07-02 00:00:00'
      AND p.type_id = 1
    GROUP BY q.`start_timestamp`
) forced_tmp
GROUP BY hour(start);
```

**NOTE:** Do not forget to remove the **sql_no_cache** keywords in production.

There are many counter-intuitive tricks applied in the above queries (sometimes conditions repeated in a join condition speed queries up, sometimes they slow them down). MySQL is a great little RDBMS, and really fast when it comes to relatively simple queries, but when the complexity increases it is easy to run into the above scenarios.

So, in general, I apply the following principle to set my expectations regarding the performance of a query:

- if the base result set has < 1,000 rows, then the query should do its business in ~0.01 sec (the base result set being the number of rows that functionally determine the resulting set)

In this particular case you start with fewer than 1,000 rows (all the prices and quantities in one day, with 15-minute precision) and from that you should be able to compute the final results.
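To illustrate the timeline-table idea mentioned before Query 3, a rough sketch follows; the table and constraint names are hypothetical, it assumes the timestamp columns are DATETIME, and it is only one way to set this up (both tables need to be InnoDB for the foreign keys to be enforced):

```
-- Hypothetical shared timeline: one row per 5-minute slot. Both tables reference
-- it, so the timestamps are guaranteed to line up and the optimizer gets an
-- indexed, declared relationship to work with.
CREATE TABLE timeline (
    ts DATETIME NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

ALTER TABLE prices
    ADD CONSTRAINT fk_prices_timeline
    FOREIGN KEY (`timestamp`) REFERENCES timeline (ts);

ALTER TABLE quantities
    ADD CONSTRAINT fk_quantities_timeline
    FOREIGN KEY (start_timestamp) REFERENCES timeline (ts);
```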
 
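And as a quick sanity check of that expectation on your own data (not part of the original answer, just a helper), you can count the rows that feed one day's computation:

```
-- With 5-minute prices and 15-minute quantities you would expect roughly
-- 288 + 96 rows for a single day, comfortably below the 1,000-row mark.
SELECT
    (SELECT COUNT(*) FROM prices
      WHERE type_id = 1
        AND timestamp >= '2010-07-01 00:00:00'
        AND timestamp <  '2010-07-02 00:00:00') AS price_rows,
    (SELECT COUNT(*) FROM quantities
      WHERE start_timestamp >= '2010-07-01 00:00:00'
        AND start_timestamp <  '2010-07-02 00:00:00') AS quantity_rows;
```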
