I'm an MSSQL guy and haven't used MySQL, but the concepts should be the same.

Firstly, can you remove the GROUP BY and ORDER BY and comment out all tables except for the first one, "user"? Also comment out any columns of the removed tables, as I have below.

```sql
SELECT
    s.id AS s__id,
    s.dname AS s__dname,
    s.gender AS s__gender,
    s.height AS s__height,
    s.dob AS s__dob
    -- s3.id AS s3__id,
    -- s3.code AS s3__code,
    -- s3.name_geo AS s3__name_geo,
    -- s4.id AS s4__id,
    -- s4.name_geo AS s4__name_geo,
    -- s5.id AS s5__id,
    -- s6.id AS s6__id,
    -- s7.id AS s7__id,
    -- s8.id AS s8__id,
    -- s9.id AS s9__id
FROM user s
-- LEFT JOIN country s3 ON s.country_id = s3.id
-- LEFT JOIN city s4 ON s.city_id = s4.id
-- LEFT JOIN block s5 ON (s.id = s5.blocked_id AND s5.blocker_id = '1')
-- LEFT JOIN favorite s6 ON (s.id = s6.favorited_id AND s6.favoriter_id = '1')
-- LEFT JOIN favorite s7 ON (s.id = s7.favoriter_id AND s7.favorited_id = '1')
-- LEFT JOIN message s8 ON (s.id = s8.sender_id AND s8.receiver_id = '1' AND s8.read_at IS NULL)
-- LEFT JOIN message s9 ON ((s.id = s9.sender_id AND s9.receiver_id = '1') OR (s.id = s9.receiver_id AND s9.sender_id = '1'))
WHERE s.id IN ('22', '36', '53', '105', '152', '156', '169', '182', '186', '192', '201', '215', '252', '287', '288', '321', '330', '351', '366', '399')
```

Run the query and record the time. Then add one table and its columns back in at a time and run it, until you find which one causes it to slow down significantly.

```sql
SELECT
    s.id AS s__id,
    s.dname AS s__dname,
    s.gender AS s__gender,
    s.height AS s__height,
    s.dob AS s__dob,
    s3.id AS s3__id,
    s3.code AS s3__code,
    s3.name_geo AS s3__name_geo
    -- s4.id AS s4__id,
    -- s4.name_geo AS s4__name_geo,
    -- s5.id AS s5__id,
    -- s6.id AS s6__id,
    -- s7.id AS s7__id,
    -- s8.id AS s8__id,
    -- s9.id AS s9__id
FROM user s
LEFT JOIN country s3 ON s.country_id = s3.id
-- LEFT JOIN city s4 ON s.city_id = s4.id
-- LEFT JOIN block s5 ON (s.id = s5.blocked_id AND s5.blocker_id = '1')
-- LEFT JOIN favorite s6 ON (s.id = s6.favorited_id AND s6.favoriter_id = '1')
-- LEFT JOIN favorite s7 ON (s.id = s7.favoriter_id AND s7.favorited_id = '1')
-- LEFT JOIN message s8 ON (s.id = s8.sender_id AND s8.receiver_id = '1' AND s8.read_at IS NULL)
-- LEFT JOIN message s9 ON ((s.id = s9.sender_id AND s9.receiver_id = '1') OR (s.id = s9.receiver_id AND s9.sender_id = '1'))
WHERE s.id IN ('22', '36', '53', '105', '152', '156', '169', '182', '186', '192', '201', '215', '252', '287', '288', '321', '330', '351', '366', '399')
```

My guess is that it is the block join, the two favorite joins, and the message joins that are giving you the performance hit (the one with the most rows will be the biggest hit).

For the block table, can you remove one of the indexes and change the other to be something along the lines of the following (I am not sure of the exact MySQL syntax, but you'll get the point; see the sketch below this section):

```sql
INDEX blocker_id_idx (blocker_id, blocked_id),
```

and try it with the column order swapped around to find which order is best for your query:

```sql
INDEX blocker_id_idx (blocked_id, blocker_id),
```

For the favorite table, change the indexes to

```sql
INDEX favoriter_id_idx (favoriter_id, favorited_id),
INDEX favorited_id_idx (favorited_id, favoriter_id),
```

Again, try it with the columns swapped around to find which gives better performance. Do the same for the message indexes.
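In MySQL, changing an existing index means dropping it and adding the new one. Here is a minimal sketch for the block table; the existing index names (`blocker_id_idx`, `blocked_id_idx`) are my assumption, so adjust them to match your actual schema:

```sql
-- Drop the two single-column indexes and replace them with one
-- composite index that satisfies the join condition in a single lookup.
-- NOTE: existing index names below are assumed, not taken from your schema.
ALTER TABLE block
    DROP INDEX blocker_id_idx,
    DROP INDEX blocked_id_idx,
    ADD INDEX blocker_id_idx (blocker_id, blocked_id);
```

The same pattern applies when reworking the favorite and message indexes.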
Do that and let me know if things improved. There are a few other things that can be done to improve it further.

EDIT: It seems I lied about the few other things; what I had intended would not have made any difference. But I can speed up your first query, which is below.

## EDIT: This is for your first select query

This one is a bit long, but I wanted to show you how indexes work so you can make your own.

Let's say the table contains 100,000 rows.

When you select from it, this is the general process the engine will take:

- Are there any indexes that cover, or mostly cover, the columns that I need? (In your case, no, there aren't.)
- So use the primary index and scan through every row in the table to check for a match.
- Every row in the table will need to be read from disk to find which rows match your criteria. So to return the approx 10,000 rows (this is a guess) that match, the database engine has read all 100,000 rows.

You do have a LIMIT 20 in your query, so it will cap the number of rows the engine reads from disk. Example:

- read row 1: match, so add to result
- read row 2: no match, skip
- read row 3: no match, skip
- read row 4: match, so add to result
- stop after 20 matching rows have been identified

Even so, you can end up reading thousands of rows from disk just to return 20, depending on where the matching rows sit in the table.

We need to create an index that will help us read as few records as possible from the table/disk but still get the rows we are after. So here goes.

Your query uses four filters to get to the data:

```sql
s.gender = 'female' AND
s.marital_status = 'single' AND
s.smokes = 'no' AND
s.deleted_at IS NULL
```

What we need to do now is identify which filter by itself will return the least number of rows. I can't tell, as I don't have your data, but this is what I would guess to be in your table.

The gender column supports two values, and it would be fair to estimate that half of the records in your database are male and the other half female, so that filter will return approx 50,000 rows.

Marital status supports four values, so if we assume an equal spread, we would get roughly 25,000 rows back. Of course, it depends on the actual data; I would say there are not too many widowed in the data, so a better estimate may be a 30% share for each of the other three. So let's say 30,000 records marked as single.

Now for the smokes column. I have read that here in Australia about 10% of people smoke, which is a fairly low number compared to other countries. So let's say 25% either smoke or smoke sometimes. That leaves us with approx 75,000 non-smokers.

Now for the last column, deleted_at. A fair guess on my part, but let's say 5% are marked as deleted. That leaves us with approx 95,000 rows.
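Rather than guessing, you can measure the real spread of each filter column. A quick sketch, assuming the table is named `user` as in the joins above:

```sql
-- Count how many rows each value of a filter column matches.
-- The column whose matching group is smallest is the most
-- selective and should go first in the composite index.
SELECT gender, COUNT(*) FROM user GROUP BY gender;
SELECT marital_status, COUNT(*) FROM user GROUP BY marital_status;
SELECT smokes, COUNT(*) FROM user GROUP BY smokes;
SELECT (deleted_at IS NULL) AS not_deleted, COUNT(*) FROM user GROUP BY not_deleted;
```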
So in summary (remember, this is all pure guesswork on my part; your data may be different):

- Gender: 50,000 rows, or 50%
- Marital status: 30,000 rows, or 30%
- Smokes: 75,000 rows, or 75%
- Deleted: 95,000 rows, or 95%

So if we create an index with the four columns, putting the one that returns the least number of rows first, we get the following:

```sql
INDEX index01_idx (marital_status, gender, smokes, deleted_at),
```

Now this is what will happen when we run the select:

- The server will find an index that covers all the columns in the WHERE clause.
- It will narrow the result set down to the 30,000 "single" records.
- Of those 30,000, 50% will be female; that leaves 15,000 records.
- Of those 15,000, 75% will not smoke; that leaves 11,250 records.
- Of those 11,250, 95% will not be deleted.

That leaves us with just over 10,000 records out of 100,000 total that we have identified as the records we want but not yet read from disk. You also have a LIMIT 20 in the query, so the database engine just needs to read the first 20 of those 10,000 and return the result. It's super quick, the hard disk will love you, and the scary DBA will even mumble and grunt with approval.
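To try this out, the index can be added to the existing table with `ALTER TABLE` and checked with `EXPLAIN`. A minimal sketch, assuming the table and filter columns from the query above:

```sql
-- Add the four-column composite index, most selective column first
ALTER TABLE user
    ADD INDEX index01_idx (marital_status, gender, smokes, deleted_at);

-- EXPLAIN should now show index01_idx under "key" instead of
-- a full table scan (type: ALL)
EXPLAIN
SELECT s.id, s.dname, s.gender, s.height, s.dob
FROM user s
WHERE s.gender = 'female'
  AND s.marital_status = 'single'
  AND s.smokes = 'no'
  AND s.deleted_at IS NULL
LIMIT 20;
```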