Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>MySQL 8.0 now supports windowing functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:</p> <pre><code>WITH ranked_messages AS ( SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn FROM messages AS m ) SELECT * FROM ranked_messages WHERE rn = 1; </code></pre> <p>Below is the original answer I wrote for this question in 2009:</p> <hr> <p>I write the solution this way:</p> <pre><code>SELECT m1.* FROM messages m1 LEFT JOIN messages m2 ON (m1.name = m2.name AND m1.id &lt; m2.id) WHERE m2.id IS NULL; </code></pre> <p>Regarding performance, one solution or the other can be better, depending on the nature of your data. So you should test both queries and use the one that is better at performance given your database.</p> <p>For example, I have a copy of the <a href="https://archive.org/details/stackexchange" rel="noreferrer">StackOverflow August data dump</a>. I'll use that for benchmarking. There are 1,114,357 rows in the <code>Posts</code> table. This is running on <a href="https://www.mysql.com/" rel="noreferrer">MySQL</a> 5.0.75 on my Macbook Pro 2.40GHz.</p> <p>I'll write a query to find the most recent post for a given user ID (mine).</p> <p><strong>First using the technique <a href="https://stackoverflow.com/questions/1313120/sql-retrieving-the-last-record-in-each-group/1313140#1313140">shown</a> by @Eric with the <code>GROUP BY</code> in a subquery:</strong></p> <pre><code>SELECT p1.postid FROM Posts p1 INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid FROM Posts pi GROUP BY pi.owneruserid) p2 ON (p1.postid = p2.maxpostid) WHERE p1.owneruserid = 20860; 1 row in set (1 min 17.89 sec) </code></pre> <p>Even the <a href="https://dev.mysql.com/doc/refman/5.7/en/using-explain.html" rel="noreferrer"><code>EXPLAIN</code> analysis</a> takes over 16 seconds:</p> <pre><code>+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ | 1 | PRIMARY | &lt;derived2&gt; | ALL | NULL | NULL | NULL | NULL | 76756 | | | 1 | PRIMARY | p1 | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY | 8 | p2.maxpostid | 1 | Using where | | 2 | DERIVED | pi | index | NULL | OwnerUserId | 8 | NULL | 1151268 | Using index | +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ 3 rows in set (16.09 sec) </code></pre> <p><strong>Now produce the same query result using <a href="https://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column/123481#123481">my technique</a> with <code>LEFT JOIN</code>:</strong></p> <pre><code>SELECT p1.postid FROM Posts p1 LEFT JOIN posts p2 ON (p1.owneruserid = p2.owneruserid AND p1.postid &lt; p2.postid) WHERE p2.postid IS NULL AND p1.owneruserid = 20860; 1 row in set (0.28 sec) </code></pre> <p>The <code>EXPLAIN</code> analysis shows that both tables are able to use their indexes:</p> <pre><code>+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ | 1 | SIMPLE | p1 | ref | OwnerUserId | OwnerUserId | 8 | const | 1384 | Using index | | 1 | SIMPLE | p2 | ref | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8 | const | 1384 | Using where; Using index; Not exists | +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ 2 rows in set (0.00 sec) </code></pre> <hr> <p>Here's the DDL for my <code>Posts</code> table:</p> <pre><code>CREATE TABLE `posts` ( `PostId` bigint(20) unsigned NOT NULL auto_increment, `PostTypeId` bigint(20) unsigned NOT NULL, `AcceptedAnswerId` bigint(20) unsigned default NULL, `ParentId` bigint(20) unsigned default NULL, `CreationDate` datetime NOT NULL, `Score` int(11) NOT NULL default '0', `ViewCount` int(11) NOT NULL default '0', `Body` text NOT NULL, `OwnerUserId` bigint(20) unsigned NOT NULL, `OwnerDisplayName` varchar(40) default NULL, `LastEditorUserId` bigint(20) unsigned default NULL, `LastEditDate` datetime default NULL, `LastActivityDate` datetime default NULL, `Title` varchar(250) NOT NULL default '', `Tags` varchar(150) NOT NULL default '', `AnswerCount` int(11) NOT NULL default '0', `CommentCount` int(11) NOT NULL default '0', `FavoriteCount` int(11) NOT NULL default '0', `ClosedDate` datetime default NULL, PRIMARY KEY (`PostId`), UNIQUE KEY `PostId` (`PostId`), KEY `PostTypeId` (`PostTypeId`), KEY `AcceptedAnswerId` (`AcceptedAnswerId`), KEY `OwnerUserId` (`OwnerUserId`), KEY `LastEditorUserId` (`LastEditorUserId`), KEY `ParentId` (`ParentId`), CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`) ) ENGINE=InnoDB; </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload