Note that there are some explanatory texts on larger screens.

plurals
  1. PORemove redundant SQL code
    primarykey
    data
    text
    <p><strong>Code</strong></p> <p>The following code calculates the slope and intercept for a linear regression against a slathering of data. It then applies the equation <code>y = mx + b</code> against the same result set to calculate the value of the regression line for each row.</p> <p>How can the two queries be joined so that the data and its slope/intercept are calculated without executing the <code>WHERE</code> clause twice?</p> <p>The general form of the problem is:</p> <pre><code>SELECT a.group, func(a.group, avg_avg) FROM a (SELECT AVG(field1_avg) as avg_avg FROM (SELECT a.group, AVG(field1) as field1_avg FROM a WHERE (SOME_CONDITION) GROUP BY a.group) as several_lines -- potentially ) as one_line -- always WHERE (SOME_CONDITION) GROUP BY a.group -- again, potentially several lines </code></pre> <p>I have <code>SOME_CONDITION</code> executing twice. This is shown below (updated with a <code>STRAIGHT_JOIN</code> optimization):</p> <pre><code>SELECT STRAIGHT_JOIN AVG(D.AMOUNT) as AMOUNT, Y.YEAR * ymxb.SLOPE + ymxb.INTERCEPT as REGRESSION_LINE, Y.YEAR as YEAR, MAKEDATE(Y.YEAR,1) as AMOUNT_DATE, ymxb.SLOPE, ymxb.INTERCEPT, ymxb.CORRELATION, ymxb.MEASUREMENTS FROM CITY C, STATION S, STATION_DISTRICT SD, YEAR_REF Y, MONTH_REF M, DAILY D, (SELECT SUM(MEASUREMENTS) as MEASUREMENTS, ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE, ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) - (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT, ((avg(t.AMOUNT * t.YEAR)) - avg(t.AMOUNT) * avg(t.YEAR)) / (stddev( t.AMOUNT ) * stddev( t.YEAR )) as CORRELATION FROM ( SELECT STRAIGHT_JOIN COUNT(1) as MEASUREMENTS, AVG(D.AMOUNT) as AMOUNT, Y.YEAR as YEAR FROM CITY C, STATION S, STATION_DISTRICT SD, YEAR_REF Y, MONTH_REF M, DAILY D WHERE -- For a specific city ... -- $X{ IN, C.ID, CityCode } AND -- Find all the stations within a specific unit radius ... -- 6371.009 * SQRT( POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) + (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) * POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) &lt;= $P{Radius} AND SD.ID = S.STATION_DISTRICT_ID AND -- Gather all known years for that station ... -- Y.STATION_DISTRICT_ID = SD.ID AND -- The data before 1900 is shaky; insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = $P{CategoryCode} AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID &lt;&gt; 'M' GROUP BY Y.YEAR ) t ) ymxb WHERE -- For a specific city ... -- $X{ IN, C.ID, CityCode } AND -- Find all the stations within a specific unit radius ... -- 6371.009 * SQRT( POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) + (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) * POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) &lt;= $P{Radius} AND SD.ID = S.STATION_DISTRICT_ID AND -- Gather all known years for that station ... -- Y.STATION_DISTRICT_ID = SD.ID AND -- The data before 1900 is shaky; insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = $P{CategoryCode} AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID &lt;&gt; 'M' GROUP BY Y.YEAR </code></pre> <p><strong>Question</strong></p> <p>How do I execute the duplicate bits only once per query, instead of twice? The duplicate code:</p> <pre><code> $X{ IN, C.ID, CityCode } AND 6371.009 * SQRT( POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) + (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) * POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) &lt;= $P{Radius} AND SD.ID = S.STATION_DISTRICT_ID AND Y.STATION_DISTRICT_ID = SD.ID AND Y.YEAR BETWEEN 1900 AND 2009 AND M.YEAR_REF_ID = Y.ID AND M.CATEGORY_ID = $P{CategoryCode} AND M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID &lt;&gt; 'M' GROUP BY Y.YEAR </code></pre> <p><strong>Update 1</strong></p> <p>Using variables and splitting the query seems to allow the cache to kick in as this now runs in 3.5 seconds, whereas it used to run in 7. Still, if there is any way to remove the duplicate code, I'd be grateful for any help.</p> <p><strike> <strong>Update 2</strong></p> <p>The above code does not run in JasperReports, and a VIEW, while a possible fix, would probably be extremely inefficient (because the WHERE clauses are parameterized). </strike></p> <p><strong>Update 3</strong></p> <p>Validating distance using Unreason's suggestion of the Pythagorean formula with converging meridians:</p> <pre><code> 6371.009 * SQRT( POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) + (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) * POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) </code></pre> <p>(This is unrelated to the question, but should someone else want to know ...)</p> <p><strong>Update 4</strong></p> <p>The code, as shown, works in JasperReports, running against a MySQL database. JasperReports does not allow variables or multiple queries.</p> <p><strong>Update 5</strong></p> <p>Am looking for a solution that executes cleanly. ;-) I have written a number of partially working solutions, but MySQL, sadly, does not understand <em>partially correct</em>. See the discussions with Unreason for answers that almost work.</p> <p><strong>Update 6</strong></p> <p>I might be able to reuse variables from the first <code>WHERE</code> clause and compare them to the second (thereby eliminating <em>some</em> duplication -- the checks against <code>$P{}</code> values), but I'd really like the duplication eliminated.</p> <p><strong>Update 7</strong></p> <p>Comparing the <code>YEAR</code> clause, as hypothesized in the previous update, to eliminate the duplicate <code>BETWEEN</code>, does not work.</p> <p><strong>Related</strong></p> <p><a href="https://stackoverflow.com/questions/1595659/how-to-eliminate-duplicate-calculation-in-sql">How to eliminate duplicate calculation in SQL?</a></p> <p>Thank you!</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload