Note that there are some explanatory texts on larger screens.

plurals
  1. POIndices and using primary keys as indices in MySQL
    primarykey
    data
    text
    <p>I have a table <code>Assets</code> on an InnoDB engine defined as:</p> <pre class="lang-sql prettyprint-override"><code>CREATE TABLE Assets ( qid SMALLINT(5) NOT NULL, sid BIGINT(20) NOT NULL AUTO_INCREMENT, ... PRIMARY KEY (sid,qid), KEY sid (sid) ); </code></pre> <p>I'm running the following query:</p> <pre class="lang-sql prettyprint-override"><code>SELECT COUNT(*) FROM Assets WHERE sid&gt;10000; </code></pre> <p>On my machine, this query takes about 30 seconds with 2 millions entries in the table. Now, if I modify the query to use indices, the results vary widely:</p> <pre class="lang-sql prettyprint-override"><code>SELECT COUNT(*) FROM Assets USE INDEX(&lt;index&gt;) WHERE sid&gt;10000; </code></pre> <ul> <li><code>NO INDEX</code> : No explicit <code>USE INDEX</code>, i.e., the first <code>SELECT</code> query : <strong>30 secs</strong></li> <li><code>KEY sid (sid)</code> : <strong>1.5 secs</strong></li> <li><code>KEY cid (sid,qid)</code> : <strong>1.5 secs</strong></li> <li><code>PRIMARY</code> : I used <code>USE INDEX(PRIMARY)</code> within the query. : <strong>30 secs</strong></li> </ul> <p>So these are my questions:</p> <ol> <li><p>I thought a query would automatically use the primary key as its index, based on <a href="https://stackoverflow.com/questions/7485578/is-there-any-benefit-to-creating-and-index-on-a-primary-key">this</a>. And yet there's a major difference between <code>USE INDEX (cid)</code> and <code>NO INDEX</code>. What's the difference? Also, how I do explicitly give the primary key as the index?</p></li> <li><p>If <code>NO INDEX</code> doesn't actually use the primary key as an index, what does <code>USE INDEX(PRIMARY)</code> do that causes it to have the same running time as <code>NO INDEX</code>?</p></li> <li><p>Is there a difference (not just performance wise) between <code>USE INDEX(sid)</code> and <code>USE INDEX(cid)</code> in a query that only filters by <code>sid</code>?</p></li> </ol> <p>Forgive the long post, but I wanted to let it be open to discussion.</p> <hr> <p>Ok, here's what I have found out so far:</p> <p>First of all, I'm told the key setup should be either: <code>PRIMARY KEY(qid,sid), KEY(sid)</code> or <code>PRIMARY KEY(sid,qid), KEY(qid)</code>. I don't really understand the difference. If someone does, please let me know.</p> <p>Secondly, the KEY <code>sid</code>(<code>sid</code>) references far fewer index pages than a larger key, so it tends to be faster. As for the difference between using the PRIMARY KEY as the index and a proper KEY (even if they use the same fields), I've been told it's something like this:</p> <p>Primary keys index the entire table data with the fields of the primary key. That means that the PRIMARY KEY and the data are stored together. So a query using the PRIMARY KEY would have to go through the entire table data, which even indexed would bog down on large uncacheable tables.</p> <p>With discrete keys, the number of rows may be the same, but a much smaller index (consisting of the indicated fields) is scanned, which hits a smaller number of disk blocks, and hence runs much faster. I'm assuming this is also the reason for the difference in using <code>USE INDEX(cid)</code> and using the primary key as index, both of which have the same fields.</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload