How to optimize this long running sqlite3 query for finding duplicates?
<p>I've got this rather insane query for finding <strong>all but the FIRST</strong> record with a duplicate value. It takes a substantially long time to run on 38000 records; about 50 seconds.</p>
<pre><code>UPDATE exr_exrresv
SET mh_duplicate = 1
WHERE exr_exrresv._id IN (
    SELECT F._id
    FROM exr_exrresv AS F
    WHERE Exists (
        SELECT PHONE_NUMBER, Count(_id)
        FROM exr_exrresv
        WHERE exr_exrresv.PHONE_NUMBER = F.PHONE_NUMBER
          AND exr_exrresv.PHONE_NUMBER != ''
          AND mh_active = 1
          AND mh_duplicate = 0
        GROUP BY exr_exrresv.PHONE_NUMBER
        HAVING Count(exr_exrresv._id) &gt; 1
    )
)
AND exr_exrresv._id NOT IN (
    SELECT Min(_id)
    FROM exr_exrresv AS F
    WHERE Exists (
        SELECT PHONE_NUMBER, Count(_id)
        FROM exr_exrresv
        WHERE exr_exrresv.PHONE_NUMBER = F.PHONE_NUMBER
          AND exr_exrresv.PHONE_NUMBER != ''
          AND mh_active = 1
          AND mh_duplicate = 0
        GROUP BY exr_exrresv.PHONE_NUMBER
        HAVING Count(exr_exrresv._id) &gt; 1
    )
    GROUP BY PHONE_NUMBER
);
</code></pre>
<p>Any tips on how to optimize it or how I should begin to go about it? I've checked out the query plan but I'm really not sure how to begin improving it. Temp tables? Better query?</p>
<p>Here is the explain query plan output:</p>
<pre><code>0|0|0|SEARCH TABLE exr_exrresv USING INTEGER PRIMARY KEY (rowid=?) (~12 rows)
0|0|0|EXECUTE LIST SUBQUERY 0
0|0|0|SCAN TABLE exr_exrresv AS F (~500000 rows)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE exr_exrresv USING AUTOMATIC COVERING INDEX (PHONE_NUMBER=? AND mh_active=? AND mh_duplicate=?) (~7 rows)
1|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|EXECUTE LIST SUBQUERY 2
2|0|0|SCAN TABLE exr_exrresv AS F (~500000 rows)
2|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 3
3|0|0|SEARCH TABLE exr_exrresv USING AUTOMATIC COVERING INDEX (PHONE_NUMBER=? AND mh_active=? AND mh_duplicate=?) (~7 rows)
3|0|0|USE TEMP B-TREE FOR GROUP BY
2|0|0|USE TEMP B-TREE FOR GROUP BY
</code></pre>
<p>Any tips would be much appreciated. :)</p>
<p>Also, I am using Ruby to make the sql query so if it makes more sense for the logic to leave SQL and be written in Ruby, that's possible.</p>
<p>The schema is as follows, and you can use sqlfiddle here: <a href="http://sqlfiddle.com/#!2/2c07e" rel="nofollow">http://sqlfiddle.com/#!2/2c07e</a></p>
<pre><code>_id INTEGER PRIMARY KEY
OPPORTUNITY_ID varchar(50)
CREATEDDATE varchar(50)
FIRSTNAME varchar(50)
LASTNAME varchar(50)
MAILINGSTREET varchar(50)
MAILINGCITY varchar(50)
MAILINGSTATE varchar(50)
MAILINGZIPPOSTALCODE varchar(50)
EMAIL varchar(50)
CONTACT_PHONE varchar(50)
PHONE_NUMBER varchar(50)
CallFromWeb varchar(50)
OPPORTUNITY_ORIGIN varchar(50)
PROJECTED_LTV varchar(50)
MOVE_IN_DATE varchar(50)
mh_processed_date varchar(50)
mh_control INTEGER
mh_active INTEGER
mh_duplicate INTEGER
</code></pre>
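<p>One possible direction, offered as a rough sketch and not a verified fix: the plan above shows two correlated subqueries, each re-run for every scanned row, and an automatic (throwaway) index on PHONE_NUMBER. The sketch below replaces them with a single uncorrelated <code>Min(_id) ... GROUP BY</code> subquery and a persistent index. It uses Python's standard <code>sqlite3</code> module only as a self-contained harness; the same SQL string could be sent from Ruby. Assumptions: the index name <code>idx_exrresv_phone</code> and the function name <code>flag_duplicates</code> are made up for illustration, and the query is deliberately narrower than the original, since it only considers rows that are themselves active, unflagged, and have a non-empty PHONE_NUMBER.</p>

```python
import sqlite3

# Assumption: only rows that are active, not yet flagged, and have a
# non-empty phone number take part. The original query could also flag
# inactive rows sharing a duplicated number; this sketch does not.
FLAG_DUPLICATES_SQL = """
UPDATE exr_exrresv
SET mh_duplicate = 1
WHERE PHONE_NUMBER != ''
  AND mh_active = 1
  AND mh_duplicate = 0
  AND _id NOT IN (
      -- Uncorrelated: evaluated once, keeps the first _id per number.
      SELECT Min(_id)
      FROM exr_exrresv
      WHERE PHONE_NUMBER != ''
        AND mh_active = 1
        AND mh_duplicate = 0
      GROUP BY PHONE_NUMBER
  )
"""

# Hypothetical index name; a persistent index lets the GROUP BY walk
# PHONE_NUMBER in order instead of building a temporary structure.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS idx_exrresv_phone
ON exr_exrresv (PHONE_NUMBER)
"""


def flag_duplicates(conn):
    """Flag all but the lowest _id per phone number; return rows changed."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(CREATE_INDEX_SQL)
        cursor = conn.execute(FLAG_DUPLICATES_SQL)
    return cursor.rowcount
```

<p>Calling <code>flag_duplicates(conn)</code> on an open connection returns the number of rows it flagged. Whether this is actually faster on the 38000-row table would need to be measured, and the narrower matching rule should be checked against the intended behaviour before relying on it.</p>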