<h1>Changing the table</h1>
<hr>
<p>Based on the advice in the post <a href="http://mysqldba.blogspot.com/2008/06/how-to-pick-indexes-for-order-by-and.html" rel="nofollow">How to pick indexes for order by and group by queries</a>, the tables now look like this:</p>
<pre><code>CREATE TABLE ClusterMatches
(
    cluster_index INT UNSIGNED,
    match_index INT UNSIGNED,
    id INT NOT NULL AUTO_INCREMENT,
    tfidf FLOAT,
    PRIMARY KEY (match_index,cluster_index,id,tfidf)
);

CREATE TABLE MatchLookup
(
    match_index INT UNSIGNED NOT NULL PRIMARY KEY,
    image_match TINYTEXT
);
</code></pre>
<h1>Eliminating Subquery</h1>
<p>The query without sorting the results by SUM(tfidf) looks like this:</p>
<pre><code>SELECT match_index, SUM(tfidf) FROM ClusterMatches
WHERE cluster_index in (1,2,3 ... 3000)
GROUP BY match_index LIMIT 10;</code></pre>
<p>This eliminates "Using temporary" and "Using filesort" from the plan:</p>
<pre><code>explain extended SELECT match_index, SUM(tfidf) FROM ClusterMatches
WHERE cluster_index in (1,2,3 ... 3000)
GROUP BY match_index LIMIT 10;
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | ClusterMatches | range | PRIMARY       | PRIMARY | 4       | NULL | 14938 | Using where; Using index |
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+</code></pre>
<h1>Sorting Problem</h1>
<p>However, if I add ORDER BY SUM(tfidf):</p>
<pre><code>SELECT match_index, SUM(tfidf) AS total FROM ClusterMatches
WHERE cluster_index in (1,2,3 ... 3000)
GROUP BY match_index
ORDER BY total DESC LIMIT 0,10;
+-------------+--------------------+
| match_index | total              |
+-------------+--------------------+
|         868 |   0.11126546561718 |
|        4182 | 0.0238558370620012 |
|        2162 | 0.0216601379215717 |
|        1406 | 0.0191618576645851 |
|        4239 | 0.0168981291353703 |
|        1437 | 0.0160425212234259 |
|        2599 | 0.0156466849148273 |
|         394 | 0.0155945559963584 |
|        3116 | 0.0151005545631051 |
|        4028 | 0.0149106932803988 |
+-------------+--------------------+
10 rows in set (0.03 sec)</code></pre>
<p>The result is suitably fast at this scale, BUT having the <strong>ORDER BY SUM(tfidf) means it uses temporary and filesort</strong>:</p>
<pre><code>explain extended SELECT match_index, SUM(tfidf) AS total FROM ClusterMatches
WHERE cluster_index IN (1,2,3 ... 3000)
GROUP BY match_index
ORDER BY total DESC LIMIT 0,10;
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows  | Extra                                                     |
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
|  1 | SIMPLE      | ClusterMatches | range | PRIMARY       | PRIMARY | 4       | NULL | 65369 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+</code></pre>
<h1>Possible Solutions?</h1>
<p>I'm looking for a solution that doesn't use temporary or filesort, along the lines of</p>
<pre><code>SELECT match_index, SUM(tfidf) AS total FROM ClusterMatches
WHERE cluster_index IN (1,2,3 ... 3000)
GROUP BY cluster_index, match_index
HAVING total>0.01
ORDER BY cluster_index;</code></pre>
<p>but where I don't need to hardcode a threshold for total. Any ideas?</p>
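<p>One direction that might be worth trying, sketched below in Python purely as an illustration: run the GROUP BY query without the ORDER BY and the LIMIT, and pick the top rows on the client with a bounded heap. This moves the sort out of MySQL rather than eliminating it, the client has to read every group, and whether it is actually faster has not been measured. The sketch uses an in-memory SQLite table with made-up rows as a stand-in for ClusterMatches; the helper name <code>top_matches</code> is hypothetical.</p>

```python
import heapq
import sqlite3


def top_matches(grouped_rows, k=10):
    """Return the k (match_index, total) pairs with the largest totals."""
    # nlargest keeps only k candidates in memory while scanning the rows once.
    return heapq.nlargest(k, grouped_rows, key=lambda row: row[1])


# Assumption: SQLite stands in for MySQL here, and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ClusterMatches "
    "(cluster_index INTEGER, match_index INTEGER, tfidf REAL)"
)
conn.executemany(
    "INSERT INTO ClusterMatches VALUES (?, ?, ?)",
    [(1, 868, 0.5), (2, 868, 0.25), (1, 4182, 0.125),
     (3, 2162, 0.25), (2, 2162, 0.25)],
)

# Same shape as the unsorted query: GROUP BY only, no ORDER BY, no LIMIT.
grouped = conn.execute(
    "SELECT match_index, SUM(tfidf) FROM ClusterMatches "
    "WHERE cluster_index IN (1, 2, 3) GROUP BY match_index"
)

top = top_matches(grouped, k=2)
print(top)
```

<p>With real data the query would run against MySQL through whatever driver is already in use; only the idea of selecting the top k rows outside the database carries over.</p>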