Note that there are some explanatory texts on larger screens.

plurals
  1. POSelect duplicate records created within a few minutes of the first
    primarykey
    data
    text
    <p>My company has been receiving duplicate records in a database that are created up to 4 minutes after the first. Logically speaking, a group of records consists of the original record plus any subsequent records that were created within that 4 minute time frame. The initial record gets a <code>TO_DELETE</code> value of <code>'N'</code>, while each duplicate record gets a <code>TO_DELETE</code> value of <code>'Y'</code>. Each new group starts over with an <code>'N'</code> value.</p> <p>With the help of <a href="https://stackoverflow.com/questions/11839893/deleting-invalid-duplicate-rows-in-sql">Deleting Invalid Duplicate Rows in SQL</a>, I've put together a query to select them but it's been running for over 2 hours and has yet to return a result set so I'm not sure if it's caught in an infinite loop. Any help identifying issues with it would be appreciated!</p> <pre><code>with LEAD_CTE as ( select *, ROW_NUMBER() over (partition by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE order by CREATEDDATE) as ROWNUMBER from LEAD where DELETE_FLAG &lt;&gt; 'Y' and CREATEDDATE &gt;= (GETDATE() - 7) ), CTE as ( select ROWNUMBER, 'N' as TO_DELETE, CREATEDDATE, 0 as TOTAL_MINUTES, LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE from LEAD_CTE where ROWNUMBER = 1 union all select l.ROWNUMBER, case when ((c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) &gt; 4) then 'N' else 'Y' end as TO_DELETE, l.CREATEDDATE, case when ((c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) &gt; 4) then 0 else (c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) end as TOTAL_MINUTES, l.EMAIL, l.FIRSTNAME, l.LASTNAME, l.PRIMARY_PHONE, l.PROGRAMX, l.TERM_CODE, l.INQ_TYPE, l.LEADSOURCE from LEAD_CTE l inner join CTE c on l.ROWNUMBER = (c.ROWNUMBER + 1) ) select ROWNUMBER, TO_DELETE, CREATEDDATE, TOTAL_MINUTES, LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE from CTE order by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE, CREATEDDATE </code></pre> <p>Sample data:</p> <pre><code>CREATEDDATE | LASTNAME | FIRSTNAME | EMAIL | PRIMARY_PHONE | PROGRAMX | TERM_CODE | INQ_TYPE | LEADSOURCE --------------------------------------------------------------------------------------------------------------------------------------------- 2013-09-24 00:06:01.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2013-09-24 00:18:47.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2013-09-24 00:18:50.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2013-09-24 00:18:52.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2013-09-24 00:18:52.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2013-09-24 00:18:54.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2013-09-24 00:18:55.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2013-09-24 00:18:56.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2013-09-24 00:18:56.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform </code></pre> <p>New CTE with self-join:</p> <pre><code>with LEAD_CTE as ( select *, ROW_NUMBER() over (partition by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE order by CREATEDDATE) as ROWNUMBER from LEAD where DELETE_FLAG &lt;&gt; 'Y' and CREATEDDATE &gt;= (GETDATE() - 7) ) select l1.ROWNUMBER, l1.CREATEDDATE, l2.CREATEDDATE, DATEDIFF(MINUTE, l1.CREATEDDATE, l2.CREATEDDATE), l1.LASTNAME, l1.FIRSTNAME, l1.EMAIL, l1.PRIMARY_PHONE, l1.PROGRAMX, l1.TERM_CODE, l1.INQ_TYPE, l1.LEADSOURCE from LEAD_CTE l1 left join LEAD_CTE l2 on l1.ROWNUMBER = (l2.ROWNUMBER + 1) and l1.LASTNAME = l2.LASTNAME and l1.FIRSTNAME = l2.FIRSTNAME and l1.EMAIL = l2.EMAIL and l1.PRIMARY_PHONE = l2.PRIMARY_PHONE and l1.PROGRAMX = l2.PROGRAMX and l1.TERM_CODE = l2.TERM_CODE and l1.INQ_TYPE = l2.INQ_TYPE and l1.LEADSOURCE = l2.LEADSOURCE order by l1.ROWNUMBER </code></pre> <p>Actual output:</p> <pre><code>ROWNUMBER | CREATEDDATE | CREATEDDATE | (no column name) | LASTNAME | FIRSTNAME | EMAIL | PRIMARY_PHONE | PROGRAMX | TERM_CODE | INQ_TYPE | LEADSOURCE ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 1 | 2013-09-24 00:06:01.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 2 | 2013-09-24 00:18:47.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 3 | 2013-09-24 00:18:50.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 4 | 2013-09-24 00:18:52.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 5 | 2013-09-24 00:18:52.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 6 | 2013-09-24 00:18:54.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 7 | 2013-09-24 00:18:55.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 8 | 2013-09-24 00:18:56.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform 9 | 2013-09-24 00:18:56.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform </code></pre> <p>What's interesting is all l2 fields in every record come in as <code>NULL</code>, which I found as a result of the <code>DATEDIFF()</code> calculations returning <code>NULL</code> as well. My expected output would be that all l2 fields would have the values of the next l1 record, with the exception of the last record's l2 fields, which would be <code>NULL</code>.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload