StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POImprove performance of count and sum when already indexed
text
Body
copied!<p>First, here is the query I have:</p> <pre><code>SELECT COUNT(*) as velocity_count, SUM(`disbursements`.`amount`) as summation_amount FROM `disbursements` WHERE `disbursements`.`accumulation_hash` = '40ad7f250cf23919bd8cc4619850a40444c5e90c978f88635a09ccf66a82ffb38e39ea51cdfd651b0ebdac5f5ca37cd7a17e0f60fea6cbce1397ccff5fa37346' AND `disbursements`.`caller_id` = 1 AND `disbursements`.`active` = 1 AND (version_hash != '86b4111677294b27a1805643d193b8d437b6ddb170b4ed5dec39aa89bf070d160cbbcd697dfc1988efea8429b1f1557625bf956180c65d3dcd3a318280e0d2da') AND (`disbursements`.`created_at` BETWEEN '2012-12-15 23:33:22' AND '2013-01-14 23:33:22') LIMIT 1 </code></pre> <p>Explain extended returns the following:</p> <pre><code>+----+-------------+---------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+---------+------+--------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+---------+------+--------+----------+--------------------------+ | 1 | SIMPLE | disbursements | range | unique_request_index,index_disbursements_on_caller_id,disbursement_summation_index,disbursement_velocity_index,disbursement_version_out_index | disbursement_summation_index | 1543 | NULL | 191422 | 100.00 | Using where; Using index | +----+-------------+---------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+---------+------+--------+----------+--------------------------+ </code></pre> <p>The actual query counts about 95,000 rows. If I explain another query that hits ~50 rows the explain is identical, just with fewer rows estimated.</p> <p>The index being chosen covers accumulation_hash, caller_id, active, version_hash, created_at, amount in that order.</p> <p>I've tried playing around with doing COUNT(id) or COUNT(caller_id) since these are non-null fields and return the same thing as count(*), but it doesn't have any impact on the plan or the run time of the actual query.</p> <p>This is also a heavy insert table, essentially every single query will have had a row inserted or updated since the last time it was run, so the mysql query cache isn't entirely useful. </p> <p>Before I go and make some sort of bucketed time sequence cache with something like memcache or redis, is there an obvious solution to getting this to work much faster? A normal ~50 row query returns in 5MS, the ones across 90k+ rows are taking 500-900MS and I really can't afford anything much past 100MS.</p> <p>I should point out the dates are a rolling 30 day window that needs to be essentially real time. Expiration could probably happen with ~1 minute granularity, but new items need to be seen immediately upon commit. I'm also on RDS, Read IOPS are essentially 0, and cpu is about 60-80%. When I'm not querying the giant 90,000+ record items, CPU typically stays below 10%.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload