StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POSub-query Optimization Talk with an example case
text
Body
copied!<p>I need advises and want to share my experience about Query Optimization. This week, I found myself stuck in an interesting dilemma. I'm a novice person in mySql (2 years theory, less than one practical)</p> <p><strong>Environment :</strong></p> <p>I have a table that contains articles with a column 'type', and another table article_version that contain a date where an article is added in the DB, and a third table that contains all the article types along with types label and stuffs...</p> <p>The 2 first tables are huge (800000+ fields and growing daily), the 3rd one is naturally small sized. The article tables have a lot of column, but we will only need 'ID' and 'type' in articles and 'dateAdded' in article_version to simplify things...</p> <p><strong>What I want to do :</strong></p> <p>A Query that, for a specified 'dateAdded', returns the number of articles for each types (there is ~ 50 types to scan). What was already in place is 50 separate count, one for each document types oO ( not efficient, long(~ 5sec in general), ).</p> <p>I wanted to do it all in one query and I came up with that :</p> <pre><code>SELECT type, (SELECT COUNT(DISTINCT articles.ID) FROM articles INNER JOIN article_version ON article_version.ARTI_ID = legi_arti.ID WHERE type = td.NEW_ID AND dateAdded = '2009-01-01 00:00:00') AS nbrArti FROM type_document td WHERE td.NEW_ID != '' GROUP BY td.NEW_ID; </code></pre> <p>The external select (type_document) allow me to get the 55 types of documents I need. The sub-Query is counting the articles for each type_document for the given date '2009-01-01'.</p> <p>A common result is like :</p> <pre> * type * nbrArti * ************************* * 123456 * 23 * * 789456 * 5 * * 16578 * 98 * * .... * .... * * .... * .... * ************************* </pre> <p>This query get the job done, but the join in the sub-query is making this extremely slow, The reason, if I'm right, is that a join is made by the server for each types, so 50+ times, this solution is even more slower than doing the 50 queries independently for each types, awesome :/</p> <p><strong>A Solution</strong></p> <p>I came up with a solution myself that drastically improve the performance with the same result, I just created a view corresponding to the subQuery, making the join on ids for each types... And Boom, it's f.a.s.t.</p> <p>I think, correct me if I'm wrong, that the reason is the server only runs the JOIN statement once.</p> <p>This solution is ~5 time faster than the solution that was already there, and ~20 times faster than my first attempt. Sweet</p> <p><strong>Questions / thoughts</strong></p> <ul> <li>With yet another view, I'll now need to check if I don't loose more than win when documents get inserted...</li> <li>Is there a way to improve the original Query, by getting the JOIN statement out of the sub-query? (And getting rid of the view)</li> <li>Any other tips/thoughts? (In Server Optimizing for example...)</li> </ul> <hr> <p>Apologies for my approximating English, it'is not my primary language.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload