Is there a way to calculate correlation in TSQL using OVER clauses instead of CTEs?
<p>Let's say you have a table with the columns Date, GroupID, X and Y.</p>
<pre><code>CREATE TABLE #sample
(
    [Date]  DATETIME,
    GroupID INT,
    X       FLOAT,
    Y       FLOAT
)

DECLARE @date DATETIME = GETDATE()

INSERT INTO #sample VALUES(@date, 1, 1, 3)
INSERT INTO #sample VALUES(DATEADD(d, 1, @date), 1, 1, 1)
INSERT INTO #sample VALUES(DATEADD(d, 2, @date), 1, 4, 2)
INSERT INTO #sample VALUES(DATEADD(d, 3, @date), 1, 3, 3)
INSERT INTO #sample VALUES(DATEADD(d, 4, @date), 1, 6, 4)
INSERT INTO #sample VALUES(DATEADD(d, 5, @date), 1, 7, 5)
INSERT INTO #sample VALUES(DATEADD(d, 6, @date), 1, 1, 6)
</code></pre>
<p>You want to calculate the correlation of X and Y for each group. Currently I use CTEs, which get a little messy:</p>
<pre><code>;WITH DataAvgStd AS
(
    -- Per-group means, sample standard deviations and row count
    SELECT GroupID,
           AVG(X)   AS XAvg,
           AVG(Y)   AS YAvg,
           STDEV(X) AS XStdev,
           STDEV(Y) AS YSTDev,
           COUNT(*) AS SampleSize
    FROM   #sample
    GROUP  BY GroupID
),
ExpectedVal AS
(
    -- Per-group sum of products of deviations from the means
    SELECT s.GroupID,
           SUM(( X - XAvg ) * ( Y - YAvg )) AS ExpectedValue
    FROM   #sample s
           JOIN DataAvgStd das
             ON s.GroupID = das.GroupID
    GROUP  BY s.GroupID
)
SELECT das.GroupID,
       ev.ExpectedValue / ( das.SampleSize - 1 ) / ( das.XStdev * das.YSTDev ) AS Correlation
FROM   DataAvgStd das
       JOIN ExpectedVal ev
         ON das.GroupID = ev.GroupID

DROP TABLE #sample
</code></pre>
<p>It seems like there should be a way to use OVER and PARTITION BY to do this in one fell swoop, without any subqueries. Ideally TSQL would have a function so you could write:</p>
<pre><code>SELECT GroupID,
       CORR(X, Y) OVER(PARTITION BY GroupID)
FROM   #sample
GROUP  BY GroupID
</code></pre>
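<p>For comparison, here is a minimal sketch of one possible single-query form. It is not the OVER-based syntax asked about above: it assumes the algebraically equivalent sum-of-products form of the Pearson correlation, (n&Sigma;xy &minus; &Sigma;x&Sigma;y) / sqrt((n&Sigma;x&sup2; &minus; (&Sigma;x)&sup2;)(n&Sigma;y&sup2; &minus; (&Sigma;y)&sup2;)), and uses plain GROUP BY aggregates. It has to run before the DROP TABLE statement, and the NULLIF guard (an added assumption, not part of the CTE version) returns NULL for groups where X or Y is constant.</p>
<pre><code>-- Sketch only: per-group Pearson correlation from running sums,
-- with no CTEs, joins or subqueries.
SELECT GroupID,
       ( COUNT(*) * SUM(X * Y) - SUM(X) * SUM(Y) )
       / NULLIF(SQRT(( COUNT(*) * SUM(X * X) - SQUARE(SUM(X)) )
                   * ( COUNT(*) * SUM(Y * Y) - SQUARE(SUM(Y)) )), 0) AS Correlation
FROM   #sample
GROUP  BY GroupID
</code></pre>
<p>This form can lose precision when the values are large relative to their spread, because it subtracts two nearly equal sums. The two-step CTE version, which subtracts the means first, is likely the safer choice for such data.</p>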
 
