Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>If you read the book "Developing Time-Oriented Database Applications in SQL" by <a href="http://www.cs.arizona.edu/~rts/" rel="nofollow noreferrer">R T Snodgrass</a> (the pdf of which is available from his web site under publications), and get as far as Figure 6.25 on p165-166, you will find the non-trivial SQL which can be used in the current example to group the various rows with the same ID value and continuous time intervals.</p> <p><em>The query development below is close to correct, but there is a problem spotted right at the end, that has its source in the first SELECT statement. I've not yet tracked down why the incorrect answer is being given.</em> [If someone can test the SQL on their DBMS and tell me whether the first query works correctly there, it would be a great help!]</p> <p>It looks something like:</p> <pre><code>-- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented -- Database Applications in SQL" CREATE TABLE Data ( Start DATE, Finish DATE, ID CHAR(2), Amount INT ); INSERT INTO Data VALUES('2008-10-01', '2008-10-02', '01', 10); INSERT INTO Data VALUES('2008-10-02', '2008-10-03', '02', 20); INSERT INTO Data VALUES('2008-10-03', '2008-10-04', '01', 38); INSERT INTO Data VALUES('2008-10-04', '2008-10-05', '01', 23); INSERT INTO Data VALUES('2008-10-05', '2008-10-06', '03', 14); INSERT INTO Data VALUES('2008-10-06', '2008-10-07', '02', 3); INSERT INTO Data VALUES('2008-10-07', '2008-10-08', '02', 8); INSERT INTO Data VALUES('2008-10-08', '2008-11-08', '03', 19); SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start &lt; L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish &lt; M.Start AND M.Start &lt; L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start &lt; M.Start AND M.Start &lt;= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start &lt; F.Start AND F.Start &lt;= T2.Finish) OR (T2.Start &lt;= L.Finish AND L.Finish &lt; T2.Finish))); </code></pre> <p>The output from that query is:</p> <pre><code>01 2008-10-01 2008-10-02 01 2008-10-03 2008-10-05 02 2008-10-02 2008-10-03 02 2008-10-06 2008-10-08 03 2008-10-05 2008-10-06 03 2008-10-05 2008-11-08 03 2008-10-08 2008-11-08 </code></pre> <p><strong>Edited</strong>: There's a problem with the penultimate row - it should not be there. And I'm not clear (yet) where it is coming from.</p> <p>Now we need to treat that complex expression as a query expression in the FROM clause of another SELECT statement, which will sum the amount values for a given ID over the entries that overlap with the maximal ranges shown above.</p> <pre><code>SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) FROM Data AS D, (SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start &lt; L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish &lt; M.Start AND M.Start &lt; L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start &lt; M.Start AND M.Start &lt;= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start &lt; F.Start AND F.Start &lt;= T2.Finish) OR (T2.Start &lt;= L.Finish AND L.Finish &lt; T2.Finish)))) AS M WHERE D.ID = M.ID AND M.Start &lt;= D.Start AND M.Finish &gt;= D.Finish GROUP BY M.ID, M.Start, M.Finish ORDER BY M.ID, M.Start; </code></pre> <p>This gives:</p> <pre><code>ID Start Finish Amount 01 2008-10-01 2008-10-02 10 01 2008-10-03 2008-10-05 61 02 2008-10-02 2008-10-03 20 02 2008-10-06 2008-10-08 11 03 2008-10-05 2008-10-06 14 03 2008-10-05 2008-11-08 33 -- Here be trouble! 03 2008-10-08 2008-11-08 19 </code></pre> <p><strong>Edited</strong>: This is <em>almost</em> the correct data set on which to do the COUNT and SUM aggregation requested by the original question, so the final answer is:</p> <pre><code>SELECT I.ID, COUNT(*) AS Number, SUM(I.Amount) AS Amount FROM (SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) AS Amount FROM Data AS D, (SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start &lt; L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish &lt; M.Start AND M.Start &lt; L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start &lt; M.Start AND M.Start &lt;= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start &lt; F.Start AND F.Start &lt;= T2.Finish) OR (T2.Start &lt;= L.Finish AND L.Finish &lt; T2.Finish))) ) AS M WHERE D.ID = M.ID AND M.Start &lt;= D.Start AND M.Finish &gt;= D.Finish GROUP BY M.ID, M.Start, M.Finish ) AS I GROUP BY I.ID ORDER BY I.ID; id number amount 01 2 71 02 2 31 03 3 66 </code></pre> <p><strong>Review</strong>: Oh! Drat...the entry for 3 has twice the 'amount' that it should have. Previous 'edited' parts indicate where things started to go wrong. It looks as though either the first query is subtly wrong (maybe it is intended for a different question), or the optimizer I'm working with is misbehaving. Nevertheless, there should be an answer closely related to this that will give the correct values.</p> <p>For the record: tested on IBM Informix Dynamic Server 11.50 on Solaris 10. However, should work fine on any other moderately standard-conformant SQL DBMS.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload