StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POfinding consecutive date pairs in SQL
text
Body
copied!<p>I have a question here that looks a little like some of the ones that I found in search, but with solutions for slightly different problems and, importantly, ones that don't work in SQL 2000.</p> <p>I have a very large table with a lot of redundant data that I am trying to reduce down to just the useful entries. It's a history table, and the way it works, if two entries are essentially duplicates and consecutive when sorted by date, the latter can be deleted. The data from the earlier entry will be used when historical data is requested from a date between the effective date of that entry and the next non-duplicate entry.</p> <p>The data looks something like this:</p> <pre><code>id user_id effective_date important_value useless_value 1 1 1/3/2007 3 0 2 1 1/4/2007 3 1 3 1 1/6/2007 NULL 1 4 1 2/1/2007 3 0 5 2 1/5/2007 12 1 6 3 1/1/1899 7 0 </code></pre> <p>With this sample set, we would consider two consecutive rows duplicates if the <code>user_id</code> and the <code>important_value</code> are the same. From this sample set, we would only delete row with <code>id</code>=2, preserving the information from 1-3-2007, showing that the <code>important_value</code> changed on 1-6-2007, and then showing the relevant change again on 2-1-2007.</p> <p>My current approach is awkward and time-consuming, and I know there must be a better way. I wrote a script that uses a cursor to iterate through the <code>user_id</code> values (since that breaks the huge table up into manageable pieces), and creates a temp table of just the rows for that user. Then to get consecutive entries, it takes the temp table, joins it to itself on the condition that there are no other entries in the temp table with a date between the two dates. In the pseudocode below, <code>UDF_SameOrNull</code> is a function that returns 1 if the two values passed in are the same or if they are both NULL.</p> <pre><code>WHILE (@@fetch_status <> -1) BEGIN SELECT * FROM History INTO #history WHERE user_id = @UserId --return entries to delete SELECT h2.id INTO #delete_history_ids FROM #history h1 JOIN #history h2 ON h1.effective_date < h2.effective_date AND dbo.UDF_SameOrNull(h1.important_value, h2.important_value)=1 WHERE NOT EXISTS (SELECT 1 FROM #history hx WHERE hx.effective_date > h1.effective_date and hx.effective_date < h2.effective_date) DELETE h1 FROM History h1 JOIN #delete_history_ids dh ON h1.id = dh.id FETCH NEXT FROM UserCursor INTO @UserId END </code></pre> <p>It also loops over the same set of duplicates until there are none, since taking out rows creates new consecutive pairs that are potentially dupes. I left that out for simplicity.</p> <p>Unfortunately, I must use SQL Server 2000 for this task and I am pretty sure that it does not support ROW_NUMBER() for a more elegant way to find consecutive entries.</p> <p>Thanks for reading. I apologize for any unnecessary backstory or errors in the pseudocode.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload