Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Let me get this straight.</p> <p>You have a table <code>posts</code>. You have a table <code>posts_types</code>. The two have a many to many join on <code>posts_types_assignment</code>. And you have some query like this that is slow:</p> <pre><code>SELECT count(*) FROM posts p JOIN posts_types_assigment pta1 ON p.id = pta1.post_id JOIN posts_types pt1 ON pt1.id = pta1.post_type_id AND pt1.type = 'language' AND pt1.name = 'English' JOIN posts_types_assigment pta2 ON p.id = pta2.post_id JOIN posts_types pt2 ON pt2.id = pta2.post_type_id AND pt2.type = 'tag' AND pt2.name = 'awesome' </code></pre> <p>And you would like to know why it is painfully slow.</p> <p>My first note is that PostgreSQL would have to do a lot less work if you had the identifiers in the <code>posts</code> table rather than in the joins. But that is a moot issue, the decision has been made.</p> <p>My more useful note is that I believe that PostgreSQL has a similar query optimizer to Oracle. In which case to limit the combinatorial explosion of possible query plans that it has to consider, it only considers plans that start with some table, and then repeatedly joins on one more data set at a time. However no such query plan will work here. You can start with <code>pt1</code>, get 1 record, then go to <code>pta1</code>, get a bunch of records, join <code>p</code>, wind up with the same number of records, then join <code>pta2</code>, and now you get a huge number of records, then join to <code>pt2</code>, get just a few records. Joining to <code>pta2</code> is the slow step, because the database has no idea which records you want, and therefore has to create a temporary result set for every combination of a post and a piece of metadata (type, language or tag) on it.</p> <p>If this is indeed your problem, then the <em>right</em> plan looks like this. Join <code>pt1</code> to <code>pta1</code>, put an index on it. Join <code>pt2</code> to <code>pta2</code>, then join to the result of the first query, then join to <code>p</code>. Then count. This means that we don't get huge result sets.</p> <p>If this the case, there is no way to tell the query optimizer that this once you want it to think up a new type of execution plan. But there is a way to force it.</p> <pre><code>CREATE TEMPORARY TABLE t1 AS SELECT pta* FROM posts_types pt JOIN posts_types_assignment pta ON pt.id = pta.post_type_id WHERE pt.type = 'language' AND pt.name = 'English'; CREATE INDEX idx1 ON t1 (post_id); CREATE TEMPORARY TABLE t2 AS SELECT pta* FROM posts_types pt JOIN posts_types_assignment pta ON pt.id = pta.post_type_id JOIN t1 ON t1.post_id = pta.post_id WHERE pt.type = 'language' AND pt.name = 'English'; SELECT COUNT(*) FROM posts p JOIN t1 ON p.id = t1.post_id; </code></pre> <p>Barring random typos, etc, this is likely to perform somewhat better. If it doesn't, double check the indexes on your tables.</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload