Note that there are some explanatory texts on larger screens.

plurals
  1. POSpeed up sql JOIN
    primarykey
    data
    text
    <p>First of all, some background.</p> <p>We have an order processing system, where staff enter billing data about orders in an app that stores it in a sql server 2000 database. This database isn't the real billing system: it's just a holding location so that the records can be run into a mainframe system via a nightly batch process. </p> <p>This batch process is a canned third party package provided by an outside vendor. Part of what it's supposed to do is provide a report for any records that were rejected. The reject report is worked manually.</p> <p>Unfortunately, it turns out the third party software doesn't catch all the errors. We have separate processes that pull back the data from the mainframe into another table in the database and load the rejected charges into yet another table. </p> <p>An audit process then runs to make sure everything that was originally entered by the staff can be accounted for somewhere. This audit takes the form of an sql query we run, and it looks something like this:</p> <pre><code>SELECT * FROM [StaffEntry] s with (nolock) LEFT JOIN [MainFrame] m with (nolock) ON m.ItemNumber = s.ItemNumber AND m.Customer=s.Customer AND m.CustomerPO = s.CustomerPO -- purchase order AND m.CustPORev = s.CustPORev -- PO revision number LEFT JOIN [Rejected] r with (nolock) ON r.OrderID = s.OrderID WHERE s.EntryDate BETWEEN @StartDate AND @EndDate AND r.OrderID IS NULL AND m.MainFrameOrderID IS NULL </code></pre> <p>That's heavily modified, of course, but I believe the important parts are represented. The problem is that this query is starting to take too long to run, and I'm trying to figure out how to speed it up.</p> <p>I'm pretty sure the problem is the JOIN from the <code>StaffEntry</code> table to the <code>MainFrame</code> table. Since both hold data for every order since the beginning of time (2003 in this system), they tend to be a little large. The <code>OrderID</code> and <code>EntryDate</code> values used in the <code>StaffEntry</code> table are not preserved when imported to the mainframe, which is why that join is a little more complicated. And finally, since I'm looking for records in the <code>MainFrame</code> table that don't exist, after doing the JOIN we have that ugly <code>IS NULL</code> in the where clause.</p> <p>The <code>StaffEntry</code> table is indexed by EntryDate (clustered) and separately on Customer/PO/rev. <code>MainFrame</code> is indexed by customer and the mainframe charge number (clustered, this is needed for other systems) and separately by customer/PO/Rev. <code>Rejected</code> is not indexed at all, but it's small and testing shows it's not the problem. </p> <p>So, I'm wondering if there is another (hopefully faster) way I can express that relationship?</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload