
MySQL 5.5 "select distinct" is really slow
<p>One of the things my app does a fair amount is:</p>
<pre><code>select count(distinct id) from x;
</code></pre>
<p>with <code>id</code> the primary key for table <code>x</code>. With MySQL 5.1 (and 5.0), it looks like this:</p>
<pre><code>mysql&gt; explain SELECT count(distinct id) from x;
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
| id | select_type | table    | type  | possible_keys | key             | key_len | ref  | rows    | Extra       |
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
|  1 | SIMPLE      | x        | index | NULL          | ix_blahblahblah | 1       | NULL | 1234567 | Using index |
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
</code></pre>
<p>On InnoDB, this isn't exactly blazing, but it's not bad, either.</p>
<p>This week I'm trying out MySQL 5.5.11, and was surprised to see that the same query is many times slower. With the cache primed, it takes around 90 seconds, compared to 5 seconds before. The plan now looks like this:</p>
<pre><code>mysql&gt; explain select count(distinct id) from x;
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows    | Extra                               |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
|  1 | SIMPLE      | x        | range | NULL          | PRIMARY | 4       | NULL | 1234567 | Using index for group-by (scanning) |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
</code></pre>
<p>One way to make it go fast again is to use <code>select count(id) from x</code>, which is safe because <code>id</code> is a primary key, but I'm going through some abstraction layers (like NHibernate) that make this a non-trivial task.</p>
<p>I tried <code>analyze table x</code> but it didn't make any appreciable difference.</p>
<p>It looks kind of like <a href="http://bugs.mysql.com/bug.php?id=49111" rel="nofollow">this bug</a>, though it's not clear what versions that applies to, or what's happening (nobody's touched it in a year yet it's "serious/high/high").</p>
<p>Is there any way, besides simply changing my query, to get MySQL to be smarter about this?</p>
<p><strong>UPDATE:</strong></p>
<p>As requested, here's a way to reproduce it, more or less. I wrote this SQL script to generate 1 million rows of dummy data (takes 10 or 15 minutes to run):</p>
<pre><code>delimiter $$

drop table if exists x;

create table x (
  id integer unsigned not null auto_increment,
  a integer,
  b varchar(100),
  c decimal(9,2),
  primary key (id),
  index ix_a (a),
  index ix_b (b),
  index ix_c (c)
) engine=innodb;

drop procedure if exists fill;

create procedure fill()
begin
  declare i int default 0;
  while i &lt; 1000000 do
    insert into x (a,b,c) values (1,"one",1.0);
    set i = i+1;
  end while;
end$$

delimiter ;

call fill();
</code></pre>
<p>When it's done, I observe this behavior:</p>
<ul>
  <li>5.1.48
    <ul>
      <li><code>select count(distinct id) from x</code>
        <ul>
          <li>EXPLAIN is: key: ix_a, Extra: Using index</li>
          <li>takes under 1.0 sec to run</li>
        </ul></li>
      <li><code>select count(id) from x</code>
        <ul>
          <li>EXPLAIN is: key: ix_a, Extra: Using index</li>
          <li>takes under 0.5 sec to run</li>
        </ul></li>
    </ul></li>
  <li>5.5.11
    <ul>
      <li><code>select count(distinct id) from x</code>
        <ul>
          <li>EXPLAIN is: key: PRIMARY, Extra: Using index for group-by</li>
          <li>takes over 7.0 sec to run</li>
        </ul></li>
      <li><code>select count(id) from x</code>
        <ul>
          <li>EXPLAIN is: key: ix_a, Extra: Using index</li>
          <li>takes under 0.5 sec to run</li>
        </ul></li>
    </ul></li>
</ul>
<p><strong>EDIT:</strong></p>
<p>If I modify the query in 5.5 by saying</p>
<pre><code>select count(distinct id) from x force index (ix_a);
</code></pre>
<p>it runs much faster. Indexes b and c also work (to varying degrees), and even forcing index <code>PRIMARY</code> helps.</p>
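<p>For anyone who wants to compare the plans side by side, here is a minimal sketch. It assumes the table <code>x</code> and the indexes <code>ix_a</code>, <code>ix_b</code> and <code>ix_c</code> from the reproduction script above; the column aliases are illustrative names only, and the plans and timings you get will depend on your server version and data.</p>

```sql
-- Sketch only: assumes table x and indexes ix_a, ix_b, ix_c
-- created by the reproduction script in this post.

-- 1. Sanity check: id is the primary key (unique, NOT NULL),
--    so both counts should return the same number.
--    distinct_ids / all_ids are illustrative alias names.
SELECT COUNT(DISTINCT id) AS distinct_ids,
       COUNT(id)          AS all_ids
FROM x FORCE INDEX (ix_a);

-- 2. Plan chosen by the optimizer with no hint.
EXPLAIN SELECT COUNT(DISTINCT id) FROM x;

-- 3. Plans with each index forced, to see which
--    "key" and "Extra" values the server reports.
EXPLAIN SELECT COUNT(DISTINCT id) FROM x FORCE INDEX (ix_a);
EXPLAIN SELECT COUNT(DISTINCT id) FROM x FORCE INDEX (ix_b);
EXPLAIN SELECT COUNT(DISTINCT id) FROM x FORCE INDEX (ix_c);
EXPLAIN SELECT COUNT(DISTINCT id) FROM x FORCE INDEX (PRIMARY);

-- 4. The rewritten form that avoids DISTINCT altogether.
EXPLAIN SELECT COUNT(id) FROM x;
```

<p>Running the same statements on 5.1 and 5.5 and comparing the <code>key</code> and <code>Extra</code> columns should show where the two versions diverge.</p>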