Note that there are some explanatory texts on larger screens.

plurals
  1. POPostgreSQL Multi-Column index join with comparison ("<" and ">") operators
    text
    copied!<p>I'm trying to take advantage of a multi-column btree index in PostgreSQL to perform an annoying join between two tables. </p> <pre><code> Table "revision_main" Column | Type | Modifiers ----------------+------------------------+----------- revision_id | integer | page_id | integer | Indexes: "revision_main_pkey" UNIQUE, btree (revision_id) "revision_main_cluster_idx" btree (page_id, "timestamp") CLUSTER </code></pre> <p>This table contains revisions (~ 300 Million rows) to pages in a wiki. There are more columns in my table, but I've discarded them for this example because they shouldn't matter. </p> <pre><code> Table "revert" Column | Type | Modifiers --------------------+---------+----------- page_id | integer | revision_id | integer | reverted_to | integer | Indexes: "revert_page_between_idx" btree (page_id, reverted_to, revision_id) CLUSTER </code></pre> <p>This table contains reverting revisions (~22 Million rows). If a revisions has been reverted, that revision_id will have a row in the revision_main table and its revision_id will be between reverted_to and revision_id as well as share the same page_id. (See <a href="http://en.wikipedia.org/wiki/Wikipedia:Revert" rel="nofollow">http://en.wikipedia.org/wiki/Wikipedia:Revert</a> if you are curious.)</p> <p>Joining these two tables to get reverted revisions seems straightforward. Here is what I've come up with:</p> <pre><code>explain SELECT r.revision_id, rvt.revision_id FROM revision_main r INNER JOIN revert rvt ON r.page_id = rvt.page_id AND r.revision_id &gt; rvt.reverted_to AND r.revision_id &lt; rvt.revision_id; QUERY PLAN ---------------------------------------------------------------------------------------------------- Merge Join (cost=4202878.87..15927491478.57 rows=88418194298 width=8) Merge Cond: (r.page_id = rvt.page_id) Join Filter: ((r.revision_id &gt; rvt.reverted_to) AND (r.revision_id &lt; rvt.revision_id)) -&gt; Index Scan using revision_main_page_id_idx on revision_main r (cost=0.00..9740790.61 rows=223163392 width=8) -&gt; Materialize (cost=4201592.06..4536465.21 rows=26789852 width=12) -&gt; Sort (cost=4201592.06..4268566.69 rows=26789852 width=12) Sort Key: rvt.page_id -&gt; Seq Scan on revert rvt (cost=0.00..438534.52 rows=26789852 width=12) </code></pre> <p>Even though the clustered index on revert should be a Btree index (and thus support comparison operators like "&lt;" and ">"), the query optimizer does not use the index for the join and "explain" predicts a total cost of over 15 billion (might be done next year).</p> <p>Are comparison operators impossible to use with multi-column (btree) indexes? Am I just doing it wrong?</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload