StackOverflow2013

plurals

POMySQL 5.5 "select distinct" is really slow
primarykey
Id
5720811
data
AcceptedAnswerId
0
AnswerCount
6
ClosedDate
CommentCount
5
CommunityOwnedDate
CreationDate
2011-04-19T18:18:37.037
FavoriteCount
2
LastActivityDate
2012-01-11T11:21:56.983
LastEditDate
2011-04-19T22:37:16.697
LastEditorUserId
631447
OwnerUserId
631447
ParentId
0
PostTypeId
1
Score
5
ViewCount
3002
LastEditorDisplayName
text
Body
One of the things my app does a fair amount is:
<pre><code>select count(distinct id) from x;
</code></pre>
with <code>id</code> the primary key for table <code>x</code>. With MySQL 5.1 (and 5.0), it looks like this:
<pre><code>mysql> explain SELECT count(distinct id) from x;
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
| id | select_type | table    | type  | possible_keys | key             | key_len | ref  | rows    | Extra       |
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
|  1 | SIMPLE      | x        | index | NULL          | ix_blahblahblah | 1       | NULL | 1234567 | Using index |
+----+-------------+----------+-------+---------------+-----------------+---------+------+---------+-------------+
</code></pre>
On InnoDB, this isn't exactly blazing, but it's not bad, either.
This week I'm trying out MySQL 5.5.11, and was surprised to see that the same query is many times slower. With the cache primed, it takes around 90 seconds, compared to 5 seconds before. The plan now looks like this:
<pre><code>mysql> explain select count(distinct id) from x;
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows    | Extra                               |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
|  1 | SIMPLE      | x        | range | NULL          | PRIMARY | 4       | NULL | 1234567 | Using index for group-by (scanning) |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+-------------------------------------+
</code></pre>
One way to make it go fast again is to use <code>select count(id) from x</code>, which is safe because <code>id</code> is a primary key, but I'm going through some abstraction layers (like NHibernate) that make this a non-trivial task.
I tried <code>analyze table x</code> but it didn't make any appreciable difference.
It looks kind of like <a href="http://bugs.mysql.com/bug.php?id=49111" rel="nofollow">this bug</a>, though it's not clear what versions that applies to, or what's happening (nobody's touched it in a year yet it's "serious/high/high").
Is there any way, besides simply changing my query, to get MySQL to be smarter about this?
UPDATE: As requested, here's a way to reproduce it, more or less. I wrote this SQL script to generate 1 million rows of dummy data (takes 10 or 15 minutes to run):
<pre><code>delimiter $$
drop table if exists x;
create table x (
  id integer unsigned not null auto_increment,
  a integer,
  b varchar(100),
  c decimal(9,2),
  primary key (id),
  index ix_a (a),
  index ix_b (b),
  index ix_c (c)
) engine=innodb;
drop procedure if exists fill;
create procedure fill()
begin
  declare i int default 0;
  while i < 1000000 do
    insert into x (a,b,c) values (1,"one",1.0);
    set i = i+1;
  end while;
end$$
delimiter ;
call fill();
</code></pre>
When it's done, I observe this behavior:
<ul>
<li>5.1.48
  <ul>
  <li><code>select count(distinct id) from x</code>
    <ul>
    <li>EXPLAIN is: key: ix_a, Extra: Using index</li>
    <li>takes under 1.0 sec to run</li>
    </ul></li>
  <li><code>select count(id) from x</code>
    <ul>
    <li>EXPLAIN is: key: ix_a, Extra: Using index</li>
    <li>takes under 0.5 sec to run</li>
    </ul></li>
  </ul></li>
<li>5.5.11
  <ul>
  <li><code>select count(distinct id) from x</code>
    <ul>
    <li>EXPLAIN is: key: PRIMARY, Extra: Using index for group-by</li>
    <li>takes over 7.0 sec to run</li>
    </ul></li>
  <li><code>select count(id) from x</code>
    <ul>
    <li>EXPLAIN is: key: ix_a, Extra: Using index</li>
    <li>takes under 0.5 sec to run</li>
    </ul></li>
  </ul></li>
</ul>
EDIT: If I modify the query in 5.5 by saying
<pre><code>select count(distinct id) from x force index (ix_a);
</code></pre>
it runs much faster. Indexes b and c also work (to varying degrees), and even forcing index <code>PRIMARY</code> helps.
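The post's claim that <code>select count(id) from x</code> is a safe substitute for <code>select count(distinct id) from x</code> when <code>id</code> is the primary key can be checked with a small sketch. It rests on assumptions: it uses SQLite through Python's standard <code>sqlite3</code> module as a stand-in rather than MySQL, so it only checks that the two counts agree and says nothing about the 5.1 versus 5.5 plans or timings. The table is a cut-down version of <code>x</code> without the secondary indexes, and <code>counts_match</code> is a hypothetical helper name.

```python
import sqlite3


def counts_match(n_rows=1000):
    """Return (count(distinct id), count(id)) for a table keyed on id.

    Assumption: SQLite stands in for MySQL here, so this only checks
    that the two queries return the same number, not how fast they run.
    """
    conn = sqlite3.connect(":memory:")
    # Cut-down version of the post's table x: same columns, id as primary key.
    conn.execute(
        "create table x ("
        " id integer primary key autoincrement,"
        " a integer, b varchar(100), c decimal(9,2))"
    )
    # Same dummy row as the post's fill() procedure, repeated n_rows times.
    conn.executemany(
        "insert into x (a, b, c) values (?, ?, ?)",
        [(1, "one", 1.0)] * n_rows,
    )
    distinct_count = conn.execute(
        "select count(distinct id) from x"
    ).fetchone()[0]
    plain_count = conn.execute("select count(id) from x").fetchone()[0]
    conn.close()
    return distinct_count, plain_count


distinct_count, plain_count = counts_match()
print(distinct_count, plain_count)
```

Because a primary key is unique and non-null, both queries count every row exactly once, which is why dropping <code>distinct</code> does not change the result.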
Tags
<mysql><nhibernate><primary-key><innodb><distinct>
Title
MySQL 5.5 "select distinct" is really slow
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USKen
UserOwnerUserId
1. USKen
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POMySQL 5.5 "select distinct" is really slow
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POMySQL 5.5 "select distinct" is really slow
 UserUserId
 USJohan
 VoteTypeVoteTypeId
 VTFavorite
3. VO
 singulars
 PostPostId
 POMySQL 5.5 "select distinct" is really slow
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
