StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>Your query, and the strategies below will benefit from an index <code>ON log(device_id,when)</code>. That index can replace the index <code>ON log(device_id)</code>, since that index would be redundant.</p> <hr> <p>If you have a whole boatload of log entries for each a device, the JOIN in your query is going to generate a good sized intermediate result set, which will get filtered down to one row per device. I don't believe the MySQL optimizer has any "shortcuts" for that anti-join operation (at least not in 5.1)... but your query might be the most efficient.</p> <p><b>Q:</b> Can I get the job done with a different strategy?</p> <p>Yes, there are other strategies that, but I don't know that any of these is any "better" than your query.</p> <hr> <p><b>UPDATE:</b></p> <p>One strategy you might consider is adding another table to your schema, one that holds the most recent log entry for each device. This could be maintained by TRIGGERs defined on the <code>log</code> table. If you are only performing inserts (no UPDATE and no DELETE of the most recent log entry, this is fairly straightforward. Whenever an insert is performed against the <code>log</code> table, an <code>AFTER INSERT FOR EACH ROW</code> trigger is fired, which compares the <code>when</code> value being inserted into the log table for a device_id, to the current <code>when</code> value in the <code>log_latest</code> table, and inserts/updates the row in the <code>log_latest</code> table so that the most recent row is always there. You could also (redundantly) store the device name in the table. (Alternatively, you could add a <code>latest_when</code> and a <code>latest_message</code> columns to the device table, and maintain them there.)</p> <p>But this strategy goes beyond your original question... but it is a workable strategy to consider if you need to frequently run a "latest log message for all devices" query. The downside is that you have an extra table, and a performance hit when performing inserts to the <code>log</code> table. This table could be entirely refreshed using a query like your original one, or the alternatives below.</p> <hr> <p>One approach is a query that does a simple join of the <code>device</code> and <code>log</code> tables, gets the rows ordered by device and by descending <code>when</code>. Then use a memory variable to process the rows, to filter out all but the "latest" log entry. Note that this query returns an extra column. (This extra column can be removed by wrapping the whole query as an inline view, but you'll likely get better performance if you can live with an extra column being returned:</p> <pre><code>SELECT IF(s.id = @prev_device_id,0,1) AS latest_flag , @prev_device_id := s.id AS id , s.name , s.message FROM (SELECT d.id , d.name , l.message FROM device d LEFT JOIN log l ON l.device_id = d.id WHERE d.active = 1 ORDER BY d.id, l.when DESC ) s JOIN (SELECT @prev_device_id := NULL) i HAVING latest_flag = 1 </code></pre> <p>What the first expression in the SELECT list is doing is "marking" a row whenever the device id value on that row is DIFFERS from the device id on the PREVIOUS row. The HAVING clause filters out all of the rows that aren't marked with a 1. (You can omit the HAVING clause to see how that expression works.)</p> <p>(I didn't test this for syntax errors, if you get an error, let me know, and I will take a closer look. My desk checking says it's fine... but it's possible I missed a paren or comma,) </p> <p>(You can "get rid of" that extra column by wrapping that in another query</p> <pre><code>SELECT r.id,r.name,r.message FROM ( /* query from above */ ) r </code></pre> <p>(but again, this may impact performance, you'd likely get better performance if you can live with the extra column.) </p> <p>Of course, add an ORDER BY on the outermost query to guarantee that your resultset is ordered the way you need it.</p> <p>This approach would work fairly well for a whole bunch of devices, and only a couple of related rows in log. Otherwise, this is going to generate a huge mess of a intermediate result set (on the order of the number of rows in the log table) which is going to have to be spun out to a temporary MyISAM table.</p> <p><b> UPDATE: </b></p> <p>If you are getting essentially all of the rows from <code>device</code> (where the predicate is not very selective), you can probably get better performance by getting the latest log entry for every device_id in the <code>log</code> table, and postpone the join to the <code>device</code> table. (But note that an index will not be available on that intermediate result set to do the join, so it would really need to be tested to gauge performance.)</p> <pre><code>SELECT d.id , d.name , t.message FROM device d LEFT JOIN (SELECT IF(s.device_id = @prev_device_id,0,1) AS latest_flag , @prev_device_id := s.device_id AS device_id , s.messsage FROM (SELECT l.device_id , l.message FROM log l ORDER BY l.device_id DESC, l.when DESC ) s JOIN (SELECT @prev_device_id := NULL) i HAVING latest_flag = 1 ) t ON t.device_id = d.id </code></pre> <p><b>NOTE:</b> We specify descending order on both the <code>device_id</code> and <code>when</code> columns in the ORDER BY clause of the inline view aliased as <code>s</code>, not because we need the rows in descending device_id order, but to allow to avoid a filesort operation by allowing MySQL to perform a "reverse scan" operation on an index with leading columns (device_id, when).</p> <p><b>NOTE:</b> This query still going to spool off intermediate result set as a temporary MyISAM tables, and there won't be any index on those. So its likely this won't perform as well as your original query.</p> <hr> <p>Another strategy is to use a correlated subquery in the SELECT list. You are only returning a single column from the log table, so this is fairly easy query to understand:</p> <pre><code>SELECT d.id , d.name , ( SELECT l.message FROM log l WHERE l.device_id = d.id ORDER BY l.when DESC LIMIT 1 ) AS message FROM device d WHERE d.active = 1 ORDER BY d.id ASC; </code></pre> <p><b>NOTE:</b> Since <code>id</code> is the PRIMARY KEY (or a UNIQUE KEY) in the <code>device</code> table, and because you aren't doing any JOIN that will generate extra rows, you can omit the <code>GROUP BY</code> clause.</p> <p><b>NOTE:</b> This query will use a "nested loops" operation. That is, for each row returned from the <code>device</code> table, (essentially) a separate query needs to be run to get the related row from log. For only a few <code>device</code> rows (as would be returned with a more selective predicate on the <code>device</code> table), and with a boatload of log entries for each device, performance will not be too bad. But for a lot of devices which each only have a few log messages, other approaches are very likely going to be much more efficient.)</p> <p>Also note, with this approach, note that you can easily extend it to also return the second latest log message as a separate column, by adding another subquery (just like that first one) to the SELECT list, just changing the LIMIT clause to skip the first row, and get the second row instead.</p> <pre><code> , ( SELECT l.message FROM log l WHERE l.device_id = d.id ORDER BY l.when DESC LIMIT 1,1 ) AS message_2 </code></pre> <hr> <p>For getting basically all the rows from device, you'll likely get the best performance using JOIN operations. The one drawback of this approach is that it has the potential to return multiple rows for a device, when there are two (or more) rows that have a matching latest <code>when</code> value for a device. (Basically, this approach is guaranteed to return a "correct" resultset when we have a guarantee that <code>log(device_id,when)</code> is unique.</p> <p>With this query as an inline view, to get the "latest" when value:</p> <pre><code>SELECT l.device_id , MAX(l.when) FROM log l GROUP BY l.device_id </code></pre> <p>We can join this to the log and device tables.</p> <pre><code>SELECT d.id , d.name , m.messsage FROM device d LEFT JOIN ( SELECT l.device_id , MAX(l.when) AS `when` FROM log l GROUP BY l.device_id ) k ON k.device_id = d.id LEFT JOIN log m ON m.device_id = d.id AND m.device_id = k.device_id AND m.when = k.when ORDER BY d.id </code></pre> <hr> <p>All of these are alternate strategies (which I believe is the question you asked), but I'm not sure either of those is going to be better for your particular needs. (But it's always good to have a couple of different tools in the tool belt, to use as appropriate.)</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload