Note that there are some explanatory texts on larger screens.

plurals
  1. POSQL Server: Why does comparison null=value return true for NOT IN?
    primarykey
    data
    text
    <p>Why does the comparison of <code>value</code> to <code>null</code> return false, except when using a <code>NOT IN</code>, where it returns true?</p> <hr> <p>Given a query to find all stackoverflow users who <strong>have</strong> a post:</p> <pre><code>SELECT * FROM Users WHERE UserID IN (SELECT UserID FROM Posts) </code></pre> <p>This works as expected; i get a list of all users who have a post.</p> <p>Now query for the inverse; find all stackoverflow users who <strong>don't</strong> have a post:</p> <pre><code>SELECT * FROM Users WHERE UserID NOT IN (SELECT UserID FROM Posts) </code></pre> <p>This returns no records, which is incorrect.</p> <p>Given hypothetical data<sup>1</sup></p> <pre><code>Users Posts ================ =============================== UserID Username PostID UserID Subject ------ -------- ------- ------ ---------------- 1 atkins 1 1 Welcome to stack ov... 2 joels 2 2 Welcome all! ... ... ... ... 399573 gt6989b ... ... ... ... ... ... 10592 null (deleted by nsl&amp;fbi... ... ... </code></pre> <p>And assume the rules of NULLs:</p> <ul> <li><code>NULL = NULL</code> evaluates to unknown</li> <li><code>NULL &lt;&gt; NULL</code> evaluates to unknown</li> <li><code>value = NULL</code> evaluates unknown</li> </ul> <p>If we look at the 2nd query, we're interested in finding all rows where the Users.UserID is <strong>not</strong> found in the Posts.UserID column. i would proceed logically as follows:</p> <p>Check UserID 1</p> <ul> <li><code>1 = 1</code> returns true. So we conclude that this user has some posts, and do not include them in the output list</li> </ul> <p>Now check UserID 2:</p> <ul> <li><code>2 = 1</code> returns false, so we keep looking</li> <li><code>2 = 2</code> returns true, so we conclude that this user has some posts, and do not include them in the output list</li> </ul> <p>Now check UserID 399573 </p> <ul> <li><code>399573 = 1</code> returns false, so we keep looking</li> <li><code>399573 = 2</code> returns false, so we keep looking</li> <li>...</li> <li><code>399573 = null</code> returns unknown, so we keep looking</li> <li>...</li> </ul> <p>We found no posts by UserID 399573, so we would include him in the output list.</p> <p>Except SQL Server doesn't do this. If you have a NULL in your <code>in</code> list, then suddenly it finds a match. <em>It suddenly finds a match.</em> Suddenly <code>399573 = null</code> evaluates to true.</p> <p>Why does the comparison of <code>value</code> to <code>null</code> return unknown, except when it returns true?</p> <p><strong>Edit</strong>: i know that i can workaround this <strike>nonsensical</strike> behavior by specifically excluding the nulls:</p> <pre><code>SELECT * FROM Users WHERE UserID NOT IN ( SELECT UserID FROM Posts WHERE UserID IS NOT NULL) </code></pre> <p>But i shouldn't have to, as far as i can tell the boolean logic should be fine without it - hence my question.</p> <h2>Footnotes</h2> <ul> <li><sup>1</sup> hypothetical data; if you don't like it: make up your down.</li> <li>celko now has his own tag</li> </ul>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload