StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POOptimizing this MySQL Query
text
Body
copied!<p><strong>Table Schema</strong></p> <p>For the two tables, the CREATE queries are given below:</p> <p><strong><em>Table1:</em></strong> (file_path_key, dir_path_key)</p> <pre><code>create table Table1( file_path_key varchar(500), dir_path_key varchar(500), primary key(file_path_key)) engine = innodb; </code></pre> <p><strong><em>Table2:</em></strong> (file_path_key, hash_key) </p> <pre><code>create table Table2( file_path_key varchar(500) not null, hash_key bigint(20) not null, foreign key (file_path_key) references Table1(file_path_key) on update cascade on delete cascade) engine = innodb; </code></pre> <p><strong><em>Objective</em>:</strong></p> <p>Given a file_path <em>F</em> and it's dir_path string <em>D</em>, I need to find all those file names which have at least one hash in the set of hashes of <em>F</em>, but don't have their directory names as <em>D</em>. If a file <em>F1</em> shares multiple hashes with <em>F</em>, then it should be repeated that many times.</p> <p>Note that the file_path_key column in Table1 and the hash_key column in Table2 are indexed.</p> <p>In this particular case, Table1 has around 350,000 entries and Table2 has 31,167,119 entries, which makes my current query slow:</p> <pre><code>create table temp as select hash_key from Table2 where file_path_key = F; select s1.file_path_key from Table1 as s1 join Table2 as s2 on s1.file_path_key join temp on temp.hash_key = s2.hash_key where s1.dir_path_key != D </code></pre> <p>How can I speed up this query? </p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload