Note that there are some explanatory texts on larger screens.

plurals
  1. POImprove query performance for union of CTE + normal select
    primarykey
    data
    text
    <p>On the basis of <a href="https://stackoverflow.com/questions/13391457/convert-recursive-function-to-view/">Convert recursive function to view</a>, I'd like to speed up the retrieval of the path from any arbitrary node in the graph to its root by creating a snapshot of a node's parents. The idea is that the recursive tree walk is bounded by having intermediate snapshots which avoid any further recursion and hence speed up execution time. I haven't performed a load test, so I don't know how this performs beyond this simple example, but early trials already indicate some bottlenecks. I'd be happy for comments on how to speed things up/simplify queries. I'm using Postgres 9.2.2.0 (20).</p> <pre><code>DROP TABLE IF EXISTS revision CASCADE; CREATE TABLE revision ( id serial primary key, parent_revision_id int references revision(id), username varchar(128), ts timestamp without time zone ); DROP TABLE IF EXISTS revision_snapshot CASCADE; CREATE TABLE revision_snapshot ( id serial primary key, revision_id int, parent_revision_id int, depth int ); CREATE OR REPLACE FUNCTION create_revision_snapshot(_rev int) RETURNS void AS $func$ DELETE FROM revision_snapshot WHERE revision_id=$1; INSERT INTO revision_snapshot (revision_id, parent_revision_id, depth) (SELECT $1, id, depth FROM revision_tree($1)); $func$ LANGUAGE sql; -- Recursively return path from '_rev' to root CREATE OR REPLACE FUNCTION revision_tree(_rev int) RETURNS TABLE(id int, parent_revision_id int, depth int) AS $func$ WITH RECURSIVE rev_list(id, parent_revision_id, depth) AS ( SELECT t.id, t.parent_revision_id, 1 FROM revision t WHERE t.id = $1 UNION ALL SELECT t.id, t.parent_revision_id, r.depth + 1 FROM rev_list r JOIN revision t ON t.id = r.parent_revision_id ) SELECT t.id, t.parent_revision_id, t.depth FROM rev_list t ORDER BY t.id; $func$ LANGUAGE sql; -- Fast version of 'revision_tree' (to be). This version will return the -- revision tree making use of snapshots (recursively returning the path from -- specified revision id to last snapshot of the path to the root + the snapshot) CREATE OR REPLACE FUNCTION revision_tree_perf(_rev int) RETURNS TABLE(parent_revision_id int) AS $func$ BEGIN CREATE TEMP TABLE graph_result ON COMMIT DROP AS WITH RECURSIVE rev_list(id, parent_revision_id, depth) AS ( SELECT t.id, t.parent_revision_id, 1 FROM revision t WHERE t.id = $1 UNION ALL SELECT t.id, t.parent_revision_id, r.depth + 1 FROM rev_list r JOIN revision t ON t.id = r.parent_revision_id WHERE not(t.id in (select revision_id from revision_snapshot)) ) SELECT t.id, t.parent_revision_id, t.depth FROM rev_list t ORDER BY t.id; RETURN QUERY SELECT g.parent_revision_id FROM graph_result AS g WHERE g.parent_revision_id IS NOT NULL UNION SELECT s.parent_revision_id FROM revision_snapshot AS s WHERE s.revision_id = (SELECT min(q.parent_revision_id) FROM graph_result as q) ORDER BY parent_revision_id; END; $func$ LANGUAGE 'plpgsql'; -- Example tree -- -- +-- &lt;10&gt; -- / -- +-- &lt;4&gt; -- &lt;8&gt; --- &lt;9&gt; -+- &lt;11&gt; --- &lt;15&gt; --- &lt;16&gt; --- &lt;17&gt; -- / -- &lt;1&gt; --- &lt;2&gt; --- &lt;3&gt; -+ -- \ -- +-- &lt;5&gt; --- &lt;6&gt; --- &lt;7&gt; --- &lt;12&gt; -+- &lt;14&gt; --- &lt;18&gt; -- \ -- \ -- \ -- \ -- +-- &lt;13&gt; --- &lt;19&gt; --- &lt;20&gt; --- &lt;21&gt; -- INSERT INTO revision (username, ts, parent_revision_id) VALUES ('someone', now(), null) -- 1 ,('someone', now(), 1) -- 2 ,('someone', now(), 2) -- 3 ,('someone', now(), 3) -- 4 ,('someone', now(), 3) -- 5 ,('someone', now(), 5) -- 6 ,('someone', now(), 6) -- 7 ,('someone', now(), 4) -- 8 ,('someone', now(), 8) -- 9 ,('someone', now(), 9) -- 10 ,('someone', now(), 9) -- 11 ,('someone', now(), 7) -- 12 ,('someone', now(), 12) -- 13 ,('someone', now(), 12) -- 14 ,('someone', now(), 11) -- 15 ,('someone', now(), 15) -- 16 ,('someone', now(), 16) -- 17 ,('someone', now(), 14) -- 18 ,('someone', now(), 13) -- 19 ,('someone', now(), 19) -- 20 ,('someone', now(), 20); -- 21 -- Create a revision snapsnot select create_revision_snapshot(13); -- This query is meant to be faster ... select * from revision_tree_perf(21); -- ... than this one select * from revision_tree(21); </code></pre> <p>The above example</p> <pre><code>select create_revision_snapshot(13); select * from revision_tree_perf(21); </code></pre> <p>is meant to yield a record set that denotes the path from 21 to the root, i.e. <code>(21, 20, 19, 13, 12, 7, 6, 5, 3, 2, 1)</code>. Part of the solution is taken by walking the tree (21 to 13, as there's a snapshot for 13, so there's no need to walk the tree any further) and by using the already 'cached' path from 13 to the root (taken from revision_snapshot). Hope that makes it a bit easier to understand...</p> <p>Update:<br> I've come up with a potential improvement. This is just a stab in the dark, but I can imagine that <code>exists</code> clauses are quite expensive. I now marked the existence of a snapshot in the revision table:</p> <pre><code>CREATE TABLE revision ( id serial primary key, parent_revision_id int references revision(id), username varchar(128), has_snapshot boolean default false, ts timestamp without time zone ); CREATE OR REPLACE FUNCTION create_revision_snapshot(_rev int) RETURNS void AS $func$ DELETE FROM revision_snapshot WHERE revision_id=$1; INSERT INTO revision_snapshot (revision_id, parent_revision_id, depth) (SELECT $1, id, depth FROM revision_tree($1)); -- Mark revision table to avoid costly exists/in clause UPDATE revision SET has_snapshot = true WHERE id=$1; $func$ LANGUAGE sql; </code></pre> <p>This changes the CTE part of the <code>revision_tree_perf</code> SP to</p> <pre><code>WITH RECURSIVE rev_list(id, parent_revision_id, depth) AS ( SELECT t.id, t.parent_revision_id, 1 -- AS depth FROM revision t WHERE t.id = $1 UNION ALL SELECT t.id, t.parent_revision_id, r.depth + 1 FROM rev_list r JOIN revision t ON t.id = r.parent_revision_id WHERE t.has_snapshot = false ) SELECT t.id, t.parent_revision_id, t.depth FROM rev_list t ORDER BY t.id; </code></pre> <p>This should execute quite quickly. The other part of the puzzle would be to return the content of revision_snapshot from the revision id where <code>has_snapshot=true</code> and join these two results. The problem is how to get this revision id from the CTE. I could store the query result of the CTE in a temp table and query for the revision id, or perhaps it's advisable not to write a CTE and write it as a loop instead. This way, one could keep track of the revision id at which the loop would exit (when <code>has_snapshot = true</code>). But I'm not sure how this would compare against a CTE.</p> <p>What do people think about this approach?</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload