Possible bug with RANGE option of window aggregates and parallel plans in SQL Server 2012?
<p>I’m getting some interesting behaviour in SQL Server 2012 when using the RANGE option with window aggregate functions, and am not sure whether this is a bug or a ‘feature’ of SQL Server 2012. I have a table defined as follows:</p>
<pre><code>CREATE TABLE [Test].[Trades](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Member] [varchar](20) NOT NULL,
    [TradeDate] [date] NOT NULL,
    [Fund] [varchar](4) NOT NULL,
    [Units] [decimal](28, 8) NOT NULL,
    PRIMARY KEY CLUSTERED ( [ID] ASC )
);
</code></pre>
<p>This table stores the trades that a member makes in a fund on a particular trade date. A member can make more than one trade in a given fund on a given date. In addition to the clustered index, I have a non-clustered index defined as follows:</p>
<pre><code>CREATE NONCLUSTERED INDEX [Ix_TradesIndex] ON [Test].[Trades]
(
    [Member] ASC,
    [Fund] ASC,
    [TradeDate] ASC
)
INCLUDE ([Units]);
</code></pre>
<p>If I wish to query the data set for the running total of units that each member has in each fund, then using the extensions to the window aggregates in SQL Server 2012 I can answer the question as follows:</p>
<pre><code>SELECT T.Member,
       T.Fund,
       T.TradeDate,
       SUM(T.Units) OVER(PARTITION BY T.Member, T.Fund
                         ORDER BY T.TradeDate
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TotalShares
FROM Test.Trades AS T;
</code></pre>
<p>This gives me a data set similar to the one below (the example shows a member who made more than one trade in Fund2 on 2005-02-03):</p>
<p>....</p>
<p>Member1, Fund1, 2005-03-31, 0.00</p>
<p>Member1, Fund2, 2005-02-03, 3256.50</p>
<p>Member1, Fund2, 2005-02-03, 3256.50</p>
<p>....</p>
<p>The RANGE option ensures that where the ordering clause is not unique (i.e. a given member has made more than one trade in a given fund on a particular trade date), the window includes all duplicate rows at the top of the range. This is working as expected. However, if I wish to say ‘now give me only the distinct rows from this set’ (i.e. get rid of the duplicate entries), one way to ask this question is as follows:</p>
<pre><code>SELECT DISTINCT T.Member, T.Fund, T.TradeDate, T.TotalShares
FROM (
    SELECT T.Member,
           T.Fund,
           T.TradeDate,
           SUM(T.Units) OVER(PARTITION BY T.Member, T.Fund
                             ORDER BY T.TradeDate
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TotalShares
    FROM Test.Trades AS T
) AS T;
</code></pre>
<p>Here things get interesting: what I am seeing is that with large data sets, <strong>if the plan goes parallel</strong> then the result set is non-deterministic (i.e. the query gives the wrong answer, and the number of rows it returns can change between runs). <strong>If the plan does not go parallel</strong> (which I can force by specifying OPTION(MAXDOP 1)) then the query always returns the same number of rows, and the result set is the ‘correct’ result set. To me this feels like a bug in SQL Server 2012.</p>
<p>My question is ‘<strong>does anyone have an alternative explanation for this behaviour, or is this a bug?</strong>’</p>
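<p>As a point of comparison, the same one-row-per-member, fund and trade date result can be written without DISTINCT by summing the units per day first and then taking the running total over those daily sums. The following is only a sketch under that assumption: the alias G and the column name DayUnits are introduced here for illustration and are not part of the original schema. Because each trade date is unique within a partition after the GROUP BY, ROWS and RANGE would produce the same running total. Whether this form avoids the parallel-plan behaviour described above is not established here.</p>
<pre><code>-- Sketch: aggregate per (Member, Fund, TradeDate) first, then accumulate.
-- G and DayUnits are illustrative names, not part of the original schema.
SELECT G.Member,
       G.Fund,
       G.TradeDate,
       SUM(G.DayUnits) OVER(PARTITION BY G.Member, G.Fund
                            ORDER BY G.TradeDate
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TotalShares
FROM (
    -- One row per member, fund and trade date, so no duplicates remain.
    SELECT T.Member,
           T.Fund,
           T.TradeDate,
           SUM(T.Units) AS DayUnits
    FROM Test.Trades AS T
    GROUP BY T.Member, T.Fund, T.TradeDate
) AS G;
</code></pre>
<p>Comparing the row count of this query with that of the DISTINCT query under OPTION(MAXDOP 1) would be one way to check which result is the expected one.</p>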