Optimal two variable linear regression calculation
<p><strong>Problem</strong></p>
<p>I am looking to apply the <code>y = mx + b</code> equation (where m is <code>SLOPE</code> and b is <code>INTERCEPT</code>) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are:</p>
<pre><code>SLOPE = 0.0276653965651912
INTERCEPT = -57.2338357550468
</code></pre>
<p><strong>SQL Code</strong></p>
<pre><code>SELECT
  ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) /
  (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE,

  ((sum(t.YEAR) * sum(t.YEAR * t.AMOUNT)) - (sum(t.AMOUNT) * sum(power(t.YEAR, 2)))) /
  (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT
FROM
  (SELECT
    D.AMOUNT,
    Y.YEAR
  FROM
    CITY C, STATION S, YEAR_REF Y, MONTH_REF M, DAILY D
  WHERE
    -- For a specific city ...
    --
    C.ID = 8590 AND

    -- Find all the stations within a 15 unit radius ...
    --
    SQRT( POW( C.LATITUDE - S.LATITUDE, 2 ) + POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) &lt; 15 AND

    -- Gather all known years for that station ...
    --
    S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND

    -- The data before 1900 is shaky; insufficient after 2009.
    --
    Y.YEAR BETWEEN 1900 AND 2009 AND

    -- Filtered by all known months ...
    --
    M.YEAR_REF_ID = Y.ID AND

    -- Whittled down by category ...
    --
    M.CATEGORY_ID = '001' AND

    -- Into the valid daily climate data.
    --
    M.ID = D.MONTH_REF_ID AND
    D.DAILY_FLAG_ID &lt;&gt; 'M'
  GROUP BY
    Y.YEAR
  ORDER BY
    Y.YEAR
  ) t
</code></pre>
<p><strong>Question</strong></p>
<p>The following results (to calculate the start and end points of the line) appear incorrect. Why are the results off by ~10 degrees (e.g., outliers skewing the data)?</p>
<blockquote>
<p>(1900 * 0.0276653965651912) + (-57.2338357550468) = -4.66958228</p>
<p>(2009 * 0.0276653965651912) + (-57.2338357550468) = -1.65405406</p>
</blockquote>
<p>(Note that the data no longer match the image; the code.)</p>
<p>I would have expected the 1900 result to be around 10 (not -4.67) and the 2009 result to be around 11.50 (not -1.65).</p>
<p><strong>Related Sites</strong></p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Least_absolute_deviations" rel="nofollow noreferrer">Least absolute deviations</a></li>
<li><a href="http://en.wikipedia.org/wiki/Robust_regression" rel="nofollow noreferrer">Robust regression</a></li>
</ul>
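<p>To check the formulas independently of the database, here is a minimal sketch that applies the same closed-form least-squares expressions as the <code>SLOPE</code> and <code>INTERCEPT</code> columns to a list of (year, amount) pairs. The function names <code>fit_line</code> and <code>predict</code> are hypothetical and not part of the original query; the sketch assumes the inner query's rows can be exported as plain pairs. If it gives different values for the same rows, the inner query (rather than the formulas) is the more likely place to look.</p>

```python
def fit_line(points):
    """Return (slope, intercept) of the ordinary least-squares line.

    `points` is a sequence of (x, y) pairs, e.g. (YEAR, AMOUNT).
    The expressions mirror the SLOPE and INTERCEPT columns of the SQL.
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)

    # Same denominator as the SQL: power(sum(x), 2) - count(1) * sum(power(x, 2))
    denom = sum_x ** 2 - n * sum_xx
    if denom == 0:
        raise ValueError("need at least two distinct x values")

    slope = (sum_x * sum_y - n * sum_xy) / denom
    intercept = (sum_x * sum_xy - sum_y * sum_xx) / denom
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate y = m*x + b at x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Synthetic data on a known line (y = 0.5x + 2), not real climate data.
    sample = [(year, 0.5 * year + 2.0) for year in range(1900, 1910)]
    m, b = fit_line(sample)
    print(m, b)

    # End points of the line using the values reported in the question.
    print(predict(1900, 0.0276653965651912, -57.2338357550468))
    print(predict(2009, 0.0276653965651912, -57.2338357550468))
```

<p>On synthetic points that lie exactly on a line, the fit should recover that line, which suggests the arithmetic in the outer <code>SELECT</code> is a standard least-squares fit.</p>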