<p>I always default to <code>NOT EXISTS</code>.</p>
<p>The execution plans may be the same at the moment, but if either column is altered in the future to allow <code>NULL</code>s, the <code>NOT IN</code> version will need to do more work (even if no <code>NULL</code>s are actually present in the data), and the semantics of <code>NOT IN</code> if <code>NULL</code>s <em>are</em> present are unlikely to be the ones you want anyway.</p>
<p>When neither <code>Products.ProductID</code> nor <code>[Order Details].ProductID</code> allows <code>NULL</code>s, the <code>NOT IN</code> will be treated identically to the following query.</p>
<pre><code>SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
</code></pre>
<p>The exact plan may vary, but for my example data I get the following.</p>
<p><img src="https://i.stack.imgur.com/lCTsG.png" alt="Neither NULL"></p>
<p>A reasonably common misconception seems to be that correlated sub queries are always "bad" compared to joins. They certainly can be when they force a nested loops plan (sub query evaluated row by row), but this plan includes an anti semi join logical operator. Anti semi joins are not restricted to nested loops but can use hash or merge (as in this example) joins too.</p>
<pre><code>/*Not valid syntax but better reflects the plan*/
SELECT p.ProductID,
       p.ProductName
FROM   Products p
       LEFT ANTI SEMI JOIN [Order Details] od
         ON p.ProductId = od.ProductId
</code></pre>
<p>If <code>[Order Details].ProductID</code> is <code>NULL</code>-able, the query then becomes</p>
<pre><code>SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL)
</code></pre>
<p>The reason for this is that the correct semantics if <code>[Order Details]</code> contains any <code>NULL</code> <code>ProductId</code>s is to return no results.
See the extra anti semi join and row count spool that are added to the plan to verify this.</p>
<p><img src="https://i.stack.imgur.com/mPYhd.png" alt="One NULL"></p>
<p>If <code>Products.ProductID</code> is also changed to become <code>NULL</code>-able, the query then becomes</p>
<pre><code>SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL)
       AND NOT EXISTS (SELECT *
                       FROM   (SELECT TOP 1 *
                               FROM   [Order Details]) S
                       WHERE  p.ProductID IS NULL)
</code></pre>
<p>The reason for that one is that a <code>NULL</code> <code>Products.ProductId</code> should not be returned in the results <strong>except</strong> if the <code>NOT IN</code> sub query were to return no results at all (i.e. the <code>[Order Details]</code> table is empty), in which case it should. In the plan for my sample data this is implemented by adding another anti semi join as below.</p>
<p><img src="https://i.stack.imgur.com/8XAh1.png" alt="Both NULL"></p>
<p>The effect of this is shown in <a href="http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/" rel="noreferrer">the blog post already linked by Buckley</a>. In the example there, the number of logical reads increases from around 400 to 500,000.</p>
<p>Additionally, the fact that a single <code>NULL</code> can reduce the row count to zero makes cardinality estimation very difficult. If SQL Server assumes that this will happen but in fact there are no <code>NULL</code> rows in the data, then, if this is just part of a larger query, the rest of the execution plan may be catastrophically worse, <a href="https://dba.stackexchange.com/q/117306/3690">with inappropriate nested loops causing repeated execution of an expensive sub tree for example</a>.</p>
<p>This is not the only possible execution plan for a <code>NOT IN</code> on a <code>NULL</code>-able column, however.
<a href="http://bradsruminations.blogspot.co.uk/2011/10/t-sql-tuesday-023-flip-side-of-join.html" rel="noreferrer">This article shows another one</a> for a query against the <code>AdventureWorks2008</code> database.</p>
<p>For the <code>NOT IN</code> on a <code>NOT NULL</code> column, or the <code>NOT EXISTS</code> against either a nullable or non nullable column, it gives the following plan.</p>
<p><img src="https://i.stack.imgur.com/nahUD.png" alt="Not EXists"></p>
<p>When the column changes to <code>NULL</code>-able, the <code>NOT IN</code> plan now looks like</p>
<p><img src="https://i.stack.imgur.com/8o9PQ.png" alt="Not In - Null"></p>
<p>It adds an extra inner join operator to the plan. This apparatus is <a href="https://dba.stackexchange.com/a/14812/3690">explained here</a>. It is all there to convert the previous single correlated index seek on <code>Sales.SalesOrderDetail.ProductID = &lt;correlated_product_id&gt;</code> to two seeks per outer row. The additional one is on <code>WHERE Sales.SalesOrderDetail.ProductID IS NULL</code>.</p>
<p>As this is under an anti semi join, if that one returns any rows the second seek will not occur. However, if <code>Sales.SalesOrderDetail</code> does not contain any <code>NULL</code> <code>ProductID</code>s, it will double the number of seek operations required.</p>
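<p>The result semantics described above (as opposed to the plans) can be checked on a toy data set. The following is only a minimal sketch under stated assumptions: it uses Python's <code>sqlite3</code> module with an in-memory SQLite database as a stand-in for SQL Server, the sample rows are invented for illustration, and <code>unordered_products</code> is a hypothetical helper name. It shows nothing about SQL Server execution plans or logical reads, only which rows each form returns.</p>

```python
import sqlite3


def unordered_products(order_product_ids, product_ids=(1, 2, 3)):
    """Hypothetical helper: run NOT IN and NOT EXISTS over tiny sample tables.

    Returns (not_in_ids, not_exists_ids). SQLite stands in for SQL Server,
    so only the result semantics are illustrated, not the plans.
    """
    conn = sqlite3.connect(":memory:")
    # Both columns are deliberately nullable so that None (NULL) can be inserted.
    conn.execute("CREATE TABLE Products (ProductID INTEGER, ProductName TEXT)")
    conn.execute("CREATE TABLE [Order Details] (ProductID INTEGER)")
    conn.executemany(
        "INSERT INTO Products VALUES (?, ?)",
        [(pid, f"Product {pid}") for pid in product_ids],
    )
    conn.executemany(
        "INSERT INTO [Order Details] VALUES (?)",
        [(pid,) for pid in order_product_ids],
    )
    not_in = conn.execute(
        """SELECT ProductID FROM Products
           WHERE ProductID NOT IN (SELECT ProductID FROM [Order Details])"""
    ).fetchall()
    not_exists = conn.execute(
        """SELECT p.ProductID FROM Products p
           WHERE NOT EXISTS (SELECT * FROM [Order Details] od
                             WHERE p.ProductID = od.ProductID)"""
    ).fetchall()
    conn.close()
    return [row[0] for row in not_in], [row[0] for row in not_exists]


if __name__ == "__main__":
    # No NULLs anywhere: both forms agree.
    print(unordered_products([1]))
    # One NULL in [Order Details]: NOT IN returns nothing at all.
    print(unordered_products([1, None]))
    # NULL product and an empty [Order Details]: NOT IN returns every product.
    print(unordered_products([], product_ids=(1, 2, None)))
    # NULL product and a non-empty [Order Details]: NOT IN drops the NULL product.
    print(unordered_products([1], product_ids=(1, 2, None)))
```

<p>With <code>[1, None]</code> in the order table, the <code>NOT IN</code> form yields no rows while <code>NOT EXISTS</code> still yields products 2 and 3. With a <code>NULL</code> product, <code>NOT IN</code> returns it only when the order table is empty, which is the case the third <code>NOT EXISTS</code> branch in the rewritten query above accounts for.</p>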