In SQL Server, what is the most efficient way to compare records to other records for duplicates within a given range of values?
<p>We have an SQL Server that gets daily imports of data files from clients. This data is interrelated, and we are always scrubbing it and looking for suspect duplicate records between these files.</p>
<p>Finding and tagging suspect records can get pretty complicated. We use logic that requires some field values to be the same, allows some field values to differ, and allows a range to be specified for how different certain field values can be. The only way we've found to do it is with a cursor-based process, and it places a heavy burden on the database.</p>
<p>So I wanted to ask if there's a more efficient way to do this. I've heard it said that cursors can almost always be replaced with clever JOINs, which are more efficient. But I have to admit I'm having a lot of trouble with this one.</p>
<p>For a concrete example, suppose we have one table, an "orders" table, with the following six fields.</p>
<pre><code>(order_id, customer_id, product_id, quantity, sale_date, price) </code></pre>
<p>We want to look through the records to find suspect duplicates based on the following example criteria. These get progressively harder.</p>
<ol>
<li>Records that have the same product_id, sale_date, and quantity but different customer_id values should be marked as suspect duplicates for review.</li>
<li>Records that have the same customer_id, product_id, and quantity, and have sale_dates within five days of each other, should be marked as suspect duplicates for review.</li>
<li>Records that have the same customer_id and product_id, but different quantities within 20 units of each other and sale_dates within five days of each other, should be considered suspect.</li>
</ol>
<p>Is it possible to satisfy each one of these criteria with a single SQL query that uses JOINs? Is this the most efficient way to do this?</p>
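The three criteria above can be written as one self-join on the orders table. What follows is only a minimal sketch, not a tuned SQL Server solution: it runs the query against an in-memory SQLite database through Python's sqlite3 module so that it can be executed anywhere. The sample rows, the `find_suspects` helper, the `rule_no` column, and the reading of "within five days" and "within 20 units" as inclusive bounds are all assumptions made for illustration. In SQL Server the `julianday` difference would presumably become `ABS(DATEDIFF(day, a.sale_date, b.sale_date)) <= 5`.

```python
import sqlite3

# Hypothetical sample rows for the question's "orders" table:
# (order_id, customer_id, product_id, quantity, sale_date, price)
ROWS = [
    (1, 10, 100, 5, "2024-01-01", 9.99),
    (2, 11, 100, 5, "2024-01-01", 9.99),   # criterion 1 with order 1
    (3, 10, 200, 7, "2024-01-10", 4.50),
    (4, 10, 200, 7, "2024-01-13", 4.50),   # criterion 2 with order 3
    (5, 12, 300, 40, "2024-02-01", 2.00),
    (6, 12, 300, 55, "2024-02-04", 2.00),  # criterion 3 with order 5
    (7, 13, 400, 1, "2024-03-01", 1.00),   # matches nothing
]

# Self-join: every pair (a, b) of orders for the same product, with
# a.order_id < b.order_id so each pair is reported once and never
# compared with itself. Criteria 2 and 3 share one branch, because
# "same quantity" is simply a quantity difference of zero.
SUSPECT_SQL = """
SELECT a.order_id AS first_id,
       b.order_id AS second_id,
       CASE
         WHEN a.customer_id <> b.customer_id THEN 1
         WHEN a.quantity = b.quantity THEN 2
         ELSE 3
       END AS rule_no
FROM orders AS a
JOIN orders AS b
  ON a.product_id = b.product_id
 AND a.order_id < b.order_id
WHERE (a.customer_id <> b.customer_id
       AND a.sale_date = b.sale_date
       AND a.quantity = b.quantity)
   OR (a.customer_id = b.customer_id
       AND ABS(julianday(a.sale_date) - julianday(b.sale_date)) <= 5
       AND ABS(a.quantity - b.quantity) <= 20)
ORDER BY first_id, second_id
"""


def find_suspects(rows):
    """Load rows into an in-memory table and return suspect pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "product_id INTEGER, quantity INTEGER, "
        "sale_date TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(SUSPECT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for first_id, second_id, rule_no in find_suspects(ROWS):
        print(f"orders {first_id} and {second_id}: criterion {rule_no}")
```

Each returned tuple names the two order ids and the criterion that flagged the pair. Whether a set-based query like this beats the cursor depends on the data volume and indexing; an index that leads with product_id (for example on product_id, customer_id, sale_date) is one plausible way to keep the join from scanning every pair, but that would need to be checked against the real execution plan.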