What is a fast way to preview a MySQL join?
<p>I'm working on a project involving joins between datasets, and we have a requirement to allow previews of arbitrary joins between arbitrary datasets. Which is crazy, but that's why it's fun. This is user-facing, so given a join I want to show ~10 rows of results quickly.</p>
<p>I've been basing my experimentation around different ways to sub-sample the tables so that I get at least a few result rows, while keeping the samples small enough that the join is fast and the sampling itself is not expensive.</p>
<p>Here are the methods I've found that pass the smell test. I would like to know a few things about them:</p>
<ol>
<li>What types of joins or datasets would these fail at?</li>
<li>How could I identify those datasets?</li>
<li>If both of these are bad at the same thing, how could they be improved?</li>
<li>Is there a type of sampling I have not listed here that is better?</li>
</ol>
<h3>Subselect with a limit</h3>
<p>Takes an arbitrary sample of one dataset (the first # rows MySQL happens to return) to reduce the overall size.</p>
<pre><code>SELECT table1.col1, sample2.col2
FROM table1
JOIN (SELECT col1, col2 FROM table2 LIMIT #) AS sample2
  ON table1.col1 = sample2.col1
LIMIT 10;
</code></pre>
<p>I like this because it's easy and there is potential in the future to be smart about which table to sample from. The downside is that the sample may contain only rows where table1.col1 never equals sample2.col1, so no results are returned.</p>
<h3>Find equal values of col1 and sample them</h3>
<p>More complicated, multi-query approach.
Here I would do a distinct select of the columns to join on, compare the results to find common values, and then do a subselect limiting the results to the common values.</p>
<pre><code>SELECT DISTINCT col1 FROM table1;
SELECT DISTINCT col1 FROM table2;

-- in application code: commonVals = intersection of the two results above

SELECT table1.col1, sample2.col2
FROM table1
JOIN (SELECT col1, col2 FROM table2 WHERE col1 IN (commonVals) LIMIT #) AS sample2
  ON table1.col1 = sample2.col1
LIMIT 10;
</code></pre>
<p>This gets us a good sample of table2, but the SELECT DISTINCT queries may be more expensive than the join itself. I believe there may be a way to determine whether this method is faster if you knew something about how long the distinct calls would take, but at this point we don't have that much knowledge of the datasets.</p>
<h3>Slap a LIMIT on the join</h3>
<p>This is the easiest and the one I'm leaning towards.</p>
<pre><code>SELECT table1.col1, table2.col2
FROM table1
JOIN table2 ON table1.col1 = table2.col1
LIMIT #;
</code></pre>
<p>Assuming the join produces any matching rows, this will always return data, and for at least a large set of cases it will do it fast.</p>
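<p>For illustration, the multi-query approach could be sketched in application code roughly as follows. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, the function name preview_join and its parameters are hypothetical, and the table and column names are the placeholders from the question. With a real MySQL driver the placeholder style would differ, and a very large set of common values would need chunking or a temporary table instead of one IN list.</p>

```python
import sqlite3


def preview_join(conn, sample_size=100, preview_rows=10):
    """Preview table1 JOIN table2 on col1, sampling only keys present in both tables.

    Hypothetical helper: names and defaults are illustrative, not from the question.
    """
    cur = conn.cursor()

    # Steps 1 and 2: distinct join keys from each table.
    keys1 = {row[0] for row in cur.execute("SELECT DISTINCT col1 FROM table1")}
    keys2 = {row[0] for row in cur.execute("SELECT DISTINCT col1 FROM table2")}

    # Step 3: commonVals = intersection; NULL keys never match in a join, so drop them.
    common_vals = [k for k in keys1 & keys2 if k is not None]
    if not common_vals:
        return []  # the join cannot produce any rows

    # Step 4: sample table2 only among rows whose key is known to match.
    # Assumption: the key list is small enough for a single IN (...) list.
    placeholders = ",".join("?" * len(common_vals))
    sql = (
        "SELECT table1.col1, sample2.col2 "
        "FROM table1 "
        "JOIN (SELECT col1, col2 FROM table2 "
        f"WHERE col1 IN ({placeholders}) LIMIT ?) AS sample2 "
        "ON table1.col1 = sample2.col1 "
        "LIMIT ?"
    )
    return cur.execute(sql, (*common_vals, sample_size, preview_rows)).fetchall()


# Tiny in-memory demo with made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE table2 (col1 INTEGER, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
    """
)
rows = preview_join(conn)
print(rows)
```

<p>An empty intersection is reported immediately, which is the case where the plain subselect-with-limit approach silently returns nothing. The cost concern from the question still applies: the two DISTINCT scans run before any preview row is produced.</p>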