  1. Optimizing a simple query on two large tables
    <p>I'm trying to offer a feature that shows the pages most viewed by a user's friends. My friendships table has 5.7M rows and the views table has 5.3M rows. At the moment I just want to run a query on these two tables and find the 20 page IDs most viewed by a person's friends.</p>
    <p>Here's the query as I have it now:</p>
<pre><code>SELECT page_id
FROM `views`
INNER JOIN `friendships` ON friendships.receiver_id = views.user_id
WHERE (`friendships`.`creator_id` = 143416)
GROUP BY page_id
ORDER BY COUNT(views.user_id) DESC
LIMIT 20
</code></pre>
    <p>And here's how an EXPLAIN looks:</p>
<pre><code>+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-------------------------+------+----------------------------------------------+
| id | select_type | table       | type | possible_keys                           | key                             | key_len | ref                     | rows | Extra                                        |
+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | friendships | ref  | PRIMARY,index_friendships_on_creator_id | index_friendships_on_creator_id | 4       | const                   |  271 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | views       | ref  | PRIMARY                                 | PRIMARY                         | 4       | friendships.receiver_id |   11 | Using index                                  |
+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-------------------------+------+----------------------------------------------+
</code></pre>
    <p>The views table has a primary key of (user_id, page_id), and you can see this is being used. The friendships table has a primary key of (receiver_id, creator_id), and a secondary index on (creator_id).</p>
    <p>If I run this query without the GROUP BY and LIMIT, it returns about 25,000 rows for this particular user, which is typical.</p>
    <p>On the most recent real run, this query took 7 seconds to execute, which is way too long for a decent response in a web app.</p>
    <p>One thing I'm wondering is whether I should change the secondary index to (creator_id, receiver_id). I'm not sure that will give much of a performance gain, though. I'll likely try it today depending on the answers to this question.</p>
    <p>Can you see any way the query can be rewritten to make it lightning fast?</p>
    <p>Update: I need to do more testing, but it appears my nasty query works out better if I don't do the grouping and sorting in the database and instead do it in Ruby afterwards. The overall time is much shorter, by about 80% it seems. Perhaps my early testing was flawed, but this definitely warrants more investigation. If it's true, then wtf is MySQL doing?</p>
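    <p>For reference, a minimal sketch of the Ruby-side grouping and sorting mentioned in the update. It assumes the query is run without GROUP BY, ORDER BY and LIMIT, and that the resulting page_id values are already loaded into an array. The method name <code>top_pages</code>, the tie-breaking rule, and the sample data are hypothetical and not taken from the actual app.</p>

```ruby
# Hypothetical helper: counts how often each page_id occurs and returns
# the `limit` most frequent ids, most viewed first.
# Ties are broken by ascending page_id (an assumption, chosen only to make
# the result deterministic).
def top_pages(page_ids, limit = 20)
  counts = Hash.new(0)
  page_ids.each { |page_id| counts[page_id] += 1 }

  counts.sort_by { |page_id, count| [-count, page_id] }
        .first(limit)
        .map { |page_id, _count| page_id }
end

# Made-up sample standing in for the ~25,000 page_id values that the
# ungrouped query returns for one user.
sample_page_ids = [7, 3, 7, 9, 3, 7]
top_two = top_pages(sample_page_ids, 2)
```

    <p>This does the same counting that the GROUP BY / ORDER BY / LIMIT does, just on the client, so it only makes sense while the ungrouped result set stays small enough to transfer and hold in memory.</p>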
 
