Note that there are some explanatory texts on larger screens.

plurals
  1. POMySQL query to determine records that are related to category A and B
    primarykey
    data
    text
    <p>Here is a simplified version of my data:</p> <pre><code>products: +----+-----------+ | id | name | +----+-----------+ | 1 | Product X | | 2 | Product Y | | 3 | Product Z | +----+-----------+ categories: +----+---------------+ | id | name | +----+---------------+ | 1 | Hotel | | 2 | Accommodation | +----+---------------+ category_product +----+------------+-------------+ | id | product_id | category_id | +----+------------+-------------+ | 1 | 1 | 1 | | 2 | 1 | 2 | | 3 | 2 | 1 | | 4 | 3 | 2 | +----+------------+-------------+ </code></pre> <p>How do I construct an efficient query that will only retrieve <code>products</code> that have <strong>both</strong> categories "Hotel" and "Accommodation" related (eg. Product X)?</p> <p>I first tried a join approach</p> <pre class="lang-sql prettyprint-override"><code>SELECT * FROM products p JOIN category_product cp ON p.id = cp.product_id WHERE cp.category_id = 1 OR cp.category_id = 2 </code></pre> <p>^ This doesn't work because it doesn't contrain the query to <strong>containing both</strong>.</p> <p>I have found an approach using sub-queries that works... but I've been warned against sub-queries for performance reasons:</p> <pre class="lang-sql prettyprint-override"><code>SELECT * FROM products p WHERE ( SELECT id FROM category_product WHERE product_id = p.id AND category_id = 1 ) AND ( SELECT id FROM category_product WHERE product_id = p.id AND category_id = 2 ) </code></pre> <p>Are there any better solutions (or how about alternatives)? I've considered de-normalizing categories to an extra column on products but would ideally like to avoid that. Hoping for a magic bullet solution!</p> <p><strong>UPDATE</strong></p> <p>I've run some of the (great) solutions provided in the answers: My data is 235 000 category_product rows and 58 000 products and obviously benchmarks are always dependent on environment and indexes etc.</p> <p>"Relational division" @podiluska</p> <pre><code>2 categories: 2826 rows ~ 20ms 5 categories: 46 rows ~ 25-30 ms 8 categories: 1 rows ~ 25-30 ms </code></pre> <p>"Where exists" @Tim Schmelter</p> <pre><code>2 categories: 2826 rows ~ 5-7ms 5 categories: 46 rows ~ 30 ms 8 categories: 1 rows ~ 300 ms </code></pre> <p>One can see the results start to diverge with having a greater number of categories thrown in. I'll look at using "relational division" as it provides consistent results but implementation might cause me to look at "where exists" too (long format <a href="http://pastebin.com/6NRX0QbJ" rel="nofollow">http://pastebin.com/6NRX0QbJ</a>)</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload