StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
235956
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2008-10-25T05:31:23.223
FavoriteCount
0
LastActivityDate
2008-10-25T14:02:29.370
LastEditDate
2008-10-25T14:02:29.370
LastEditorUserId
15168
OwnerUserId
15168
ParentId
235418
PostTypeId
2
Score
3
ViewCount
0
LastEditorDisplayName
Jonathan Leffler
text
Body
If you read the book "Developing Time-Oriented Database Applications in SQL" by <a href="http://www.cs.arizona.edu/~rts/" rel="nofollow noreferrer">R T Snodgrass</a> (the pdf of which is available from his web site under publications), and get as far as Figure 6.25 on p165-166, you will find the non-trivial SQL which can be used in the current example to group the various rows with the same ID value and continuous time intervals. The query development below is close to correct, but there is a problem spotted right at the end, that has its source in the first SELECT statement. I've not yet tracked down why the incorrect answer is being given. [If someone can test the SQL on their DBMS and tell me whether the first query works correctly there, it would be a great help!] It looks something like: <pre><code>-- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented -- Database Applications in SQL" CREATE TABLE Data ( Start DATE, Finish DATE, ID CHAR(2), Amount INT ); INSERT INTO Data VALUES('2008-10-01', '2008-10-02', '01', 10); INSERT INTO Data VALUES('2008-10-02', '2008-10-03', '02', 20); INSERT INTO Data VALUES('2008-10-03', '2008-10-04', '01', 38); INSERT INTO Data VALUES('2008-10-04', '2008-10-05', '01', 23); INSERT INTO Data VALUES('2008-10-05', '2008-10-06', '03', 14); INSERT INTO Data VALUES('2008-10-06', '2008-10-07', '02', 3); INSERT INTO Data VALUES('2008-10-07', '2008-10-08', '02', 8); INSERT INTO Data VALUES('2008-10-08', '2008-11-08', '03', 19); SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish))); </code></pre> The output from that query is: <pre><code>01 2008-10-01 2008-10-02 01 2008-10-03 2008-10-05 02 2008-10-02 2008-10-03 02 2008-10-06 2008-10-08 03 2008-10-05 2008-10-06 03 2008-10-05 2008-11-08 03 2008-10-08 2008-11-08 </code></pre> Edited: There's a problem with the penultimate row - it should not be there. And I'm not clear (yet) where it is coming from. Now we need to treat that complex expression as a query expression in the FROM clause of another SELECT statement, which will sum the amount values for a given ID over the entries that overlap with the maximal ranges shown above. <pre><code>SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) FROM Data AS D, (SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))) AS M WHERE D.ID = M.ID AND M.Start <= D.Start AND M.Finish >= D.Finish GROUP BY M.ID, M.Start, M.Finish ORDER BY M.ID, M.Start; </code></pre> This gives: <pre><code>ID Start Finish Amount 01 2008-10-01 2008-10-02 10 01 2008-10-03 2008-10-05 61 02 2008-10-02 2008-10-03 20 02 2008-10-06 2008-10-08 11 03 2008-10-05 2008-10-06 14 03 2008-10-05 2008-11-08 33 -- Here be trouble! 03 2008-10-08 2008-11-08 19 </code></pre> Edited: This is almost the correct data set on which to do the COUNT and SUM aggregation requested by the original question, so the final answer is: <pre><code>SELECT I.ID, COUNT(*) AS Number, SUM(I.Amount) AS Amount FROM (SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) AS Amount FROM Data AS D, (SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish))) ) AS M WHERE D.ID = M.ID AND M.Start <= D.Start AND M.Finish >= D.Finish GROUP BY M.ID, M.Start, M.Finish ) AS I GROUP BY I.ID ORDER BY I.ID; id number amount 01 2 71 02 2 31 03 3 66 </code></pre> Review: Oh! Drat...the entry for 3 has twice the 'amount' that it should have. Previous 'edited' parts indicate where things started to go wrong. It looks as though either the first query is subtly wrong (maybe it is intended for a different question), or the optimizer I'm working with is misbehaving. Nevertheless, there should be an answer closely related to this that will give the correct values. For the record: tested on IBM Informix Dynamic Server 11.50 on Solaris 10. However, should work fine on any other moderately standard-conformant SQL DBMS.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POAggregate adjacent only records with T-SQL
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USJonathan Leffler
UserOwnerUserId
1. USJonathan Leffler
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTDownMod
CommentsPostId
1. This table or related slice is empty.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.