<p>Thank you, this helps. Here's what I did:</p>
<ul>
<li>I created the urls table as you suggested.</li>
<li>I added a vid column of type integer to it.</li>
<li>I inserted 1000000 rows from T2 into its full_url column.</li>
<li>I enabled timing and set the hostname column to full_url for URLs that contain no slash (so no 'http://' prefix) and do not start with 'www.': <code>update urls set hostname=full_url where full_url not like '%/%' and full_url not like 'www\.%';</code></li>
</ul>
<p><code>Time: 112435.192 ms</code></p>
<p>Then I ran this query:</p>
<pre><code>mydb=&gt; explain analyse update urls set vid=vid from T1 where hostname=stxt1;
                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Update on urls (cost=21.93..37758.76 rows=864449 width=124) (actual time=767.793..767.793 rows=0 loops=1)
   -&gt; Hash Join (cost=21.93..37758.76 rows=864449 width=124) (actual time=102.324..430.448 rows=94934 loops=1)
         Hash Cond: ((urls.hostname)::text = (T1.stxt1)::text)
         -&gt; Seq Scan on urls (cost=0.00..25612.52 rows=927952 width=114) (actual time=0.009..265.962 rows=927952 loops=1)
         -&gt; Hash (cost=15.30..15.30 rows=530 width=34) (actual time=0.444..0.444 rows=530 loops=1)
               Buckets: 1024 Batches: 1 Memory Usage: 35kB
               -&gt; Seq Scan on T1 (cost=0.00..15.30 rows=530 width=34) (actual time=0.002..0.181 rows=530 loops=1)
 Total runtime: 767.860 ms
</code></pre>
<p>I was really surprised by the total runtime: less than 1 second, which confirms what you said about updates with exact matches.
Next, I looked for exact matches between stxt1 and stxt2 like this:</p>
<pre><code>mydb=&gt; select count(*) from T2 where vid is null and exists(select null from T1 where stxt1=stxt2);
 count
--------
 308486
(1 row)
</code></pre>
<p>So I tried the update on table T2 and got this:</p>
<pre><code>mydb=&gt; explain analyse update T2 set vid = T1.vid from T1 where T2.vid is null and stxt2=stxt1;
                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Update on T2 (cost=21.93..492023.13 rows=2106020 width=131) (actual time=252395.118..252395.118 rows=0 loops=1)
   -&gt; Hash Join (cost=21.93..492023.13 rows=2106020 width=131) (actual time=1207.897..4739.515 rows=308486 loops=1)
         Hash Cond: ((T2.stxt2)::text = (T1.stxt1)::text)
         -&gt; Seq Scan on T2 (cost=0.00..455452.09 rows=4130377 width=121) (actual time=158.773..3915.379 rows=4103865 loops=1)
               Filter: (vid IS NULL)
         -&gt; Hash (cost=15.30..15.30 rows=530 width=34) (actual time=0.293..0.293 rows=530 loops=1)
               Buckets: 1024 Batches: 1 Memory Usage: 35kB
               -&gt; Seq Scan on T1 (cost=0.00..15.30 rows=530 width=34) (actual time=0.005..0.121 rows=530 loops=1)
 Total runtime: 252395.204 ms
(9 rows)

Time: 255389.704 ms
</code></pre>
<p>Actually, 255 seconds seems like a very good time for such a query. I'll try extracting the hostname part from all URLs and then running the update. I still need to make sure that updating with exact matches is fast, because I've had bad experiences with it.</p>
<p>Thank you for your support.</p>
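The planned next step (extract the hostname from every URL, then update on exact matches) might look roughly like the PostgreSQL sketch below. It is not the poster's code: the demo tables `t1_demo` and `urls_demo`, their sample rows, and the hostname regular expression are hypothetical, and the pattern assumes URLs are either `scheme://host...` or a bare `host...`. It also keeps any leading `www.`, so it only matches T1 entries written the same way.

```sql
-- Hypothetical demo tables standing in for T1 and urls (names and rows are assumptions).
CREATE TEMP TABLE t1_demo (vid integer, stxt1 text);
CREATE TEMP TABLE urls_demo (full_url text, hostname text, vid integer);

INSERT INTO t1_demo (vid, stxt1) VALUES
  (1, 'example.com'),
  (2, 'www.example.org');

INSERT INTO urls_demo (full_url) VALUES
  ('http://example.com/index.html'),
  ('HTTPS://www.example.org/a?b=c'),
  ('example.com'),
  ('example.com:8080/path'),
  ('ftp://unknown.net/file');

-- Step 1: keep only the host part of each URL.
-- The optional non-capturing group (?:...) skips a scheme such as "http://";
-- substring(... FROM pattern) returns the text matched by the first capturing
-- group, i.e. everything up to the first '/', ':', '?' or '#'.
UPDATE urls_demo
SET hostname = lower(substring(full_url FROM '^(?:[A-Za-z][A-Za-z0-9+.-]*://)?([^/:?#]+)'));

-- Step 2: exact-match update with the same shape as the fast query in the post,
-- so the planner can hash the small lookup table and probe it once per URL row.
UPDATE urls_demo AS u
SET vid = t.vid
FROM t1_demo AS t
WHERE u.hostname = t.stxt1;

-- Inspect the chosen plan without executing it; on tiny demo tables the planner
-- may pick a different join method than it would on the real data.
EXPLAIN UPDATE urls_demo AS u SET vid = t.vid FROM t1_demo AS t WHERE u.hostname = t.stxt1;
```

Storing the extracted hostname in its own column turns the later join into a plain string equality, which is the kind of condition the sub-second update above ran with; whether the real tables get a hash join still depends on their statistics.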