Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<h2>Note: The solution is the last query at the bottom of this answer.</h2> <p>Test Schema and Data</p> <pre><code>create table adverts ( advert_id int primary key, title varchar(20), url varchar(20), user_id int, advert_type varchar(10)) ; create table advert_categories ( advert_id int, category_id int, primary key(category_id, advert_id)) ; create table website_categories ( website_id int, category_id int, primary key(website_id, category_id)) ; insert website_categories values (8,1),(8,3),(8,5), (1,1),(2,3),(4,5) ; insert adverts (advert_id, title, user_id) values (1, 'StackExchange', 1), (2, 'StackOverflow', 1), (3, 'SuperUser', 1), (4, 'ServerFault', 1), (5, 'Programming', 1), (6, 'C#', 2), (7, 'Java', 2), (8, 'Python', 2), (9, 'Perl', 2), (10, 'Google', 3) ; update adverts set advert_type = 'text' ; insert advert_categories values (1,1),(1,3), (2,3),(2,4), (3,1),(3,2),(3,3),(3,4), (4,1), (5,4), (6,1),(6,4), (7,2), (8,1), (9,3), (10,3),(10,5) ; </code></pre> <p>Data properties</p> <ul> <li>each website can belong to multiple categories</li> <li>for simplicity, all adverts are of type 'text'</li> <li>each advert can belong to multiple categories. If a website has multiple categories that are matched multiple times in advert_categories for the same user_id, this causes the advert_id's to show twice when using a straight join between 3 tables in the next query.</li> </ul> <p>This query joins the 3 tables together (notice that ids 1, 3 and 10 each appear twice)</p> <pre><code>select * from website_categories wc inner join advert_categories ac on wc.category_id = ac.category_id inner join adverts a on a.advert_id = ac.advert_id and a.advert_type = 'text' where wc.website_id='8' order by a.advert_id </code></pre> <p>To make each website show only once, this is the core query to show all eligible ads, each only once</p> <pre><code> select * from adverts a where a.advert_type = 'text' and exists ( select * from website_categories wc inner join advert_categories ac on wc.category_id = ac.category_id where wc.website_id='8' and a.advert_id = ac.advert_id) </code></pre> <p>The next query retrieves all the advert_id's to be shown</p> <pre><code>select advert_id, user_id from ( select advert_id, user_id, @r := @r + 1 r from (select @r:=0) r cross join ( # core query -- vvv select a.advert_id, a.user_id from adverts a where a.advert_type = 'text' and exists ( select * from website_categories wc inner join advert_categories ac on wc.category_id = ac.category_id where wc.website_id='8' and a.advert_id = ac.advert_id) # core query -- ^^^ order by rand() ) EligibleAdsAndUserIDs ) RowNumbered group by user_id order by r limit 2 </code></pre> <p>There are 3 levels to this query</p> <ol> <li>aliased <code>EligibleAdsAndUserIDs</code>: core query, sorted randomly using <code>order by rand()</code></li> <li>aliased <code>RowNumbered</code>: row number added to core query, using MySQL side-effecting @variables</li> <li>the outermost query forces mysql to collect rows <code>as numbered</code> randomly in the inner queries, and <code>group by user_id</code> causes it to retain only the first row for each user_id. <code>limit 2</code> causes the query to stop as soon as two distinct user_id's have been encountered.</li> </ol> <p>This is the final query which takes the advert_id's from the previous query and joins it back to table <code>adverts</code> to retrieve the required columns.</p> <ol> <li>only once per user_id</li> <li>feature user's with more ads proportionally (statistically) to the number of eligible ads they have</li> </ol> <p><em><strong>Note:</strong> Point (2) works because the more ads you have, the more likely you will hit the top placings in the row numbering subquery</em></p> <pre><code>select a.advert_id, a.title, a.url, a.user_id from ( select advert_id from ( select advert_id, user_id, @r := @r + 1 r from (select @r:=0) r cross join ( # core query -- vvv select a.advert_id, a.user_id from adverts a where a.advert_type = 'text' and exists ( select * from website_categories wc inner join advert_categories ac on wc.category_id = ac.category_id where wc.website_id='8' and a.advert_id = ac.advert_id) # core query -- ^^^ order by rand() ) EligibleAdsAndUserIDs ) RowNumbered group by user_id order by r limit 2 ) Top2 inner join adverts a on a.advert_id = Top2.advert_id; </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload