Note that there are some explanatory texts on larger screens.

plurals
  1. POCOUNT (DISTINCT column_name) Discrepancy vs. COUNT (column_name) in SQL Server 2008?
    text
    copied!<p>I'm running into a problem that's driving me nuts. When running the query below, I get a count of <i>233,769</i></p> <pre><code> SELECT COUNT(distinct Member_List_Link.UserID) FROM Member_List_Link with (nolock) INNER JOIN MasterMembers with (nolock) ON Member_List_Link.UserID = MasterMembers.UserID WHERE MasterMembers.Active = 1 And Member_List_Link.GroupID = 5 AND MasterMembers.ValidUsers = 1 AND Member_List_Link.Status = 1 </code></pre> <p>But if I run the same query <em>without</em> the distinct keyword, I get a count of <i>233,748</i></p> <pre><code> SELECT COUNT(Member_List_Link.UserID) FROM Member_List_Link with (nolock) INNER JOIN MasterMembers with (nolock) ON Member_List_Link.UserID = MasterMembers.UserID WHERE MasterMembers.Active = 1 And Member_List_Link.GroupID = 5 AND MasterMembers.ValidUsers = 1 AND Member_List_Link.Status = 1 </code></pre> <p>To test, I recreated all the tables and place them into temp tables and ran the queries again:</p> <pre><code> SELECT COUNT(distinct #Temp_Member_List_Link.UserID) FROM #Temp_Member_List_Link with (nolock) INNER JOIN #Temp_MasterMembers with (nolock) ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID WHERE #Temp_MasterMembers.Active = 1 And #Temp_Member_List_Link.GroupID = 5 AND #Temp_MasterMembers.ValidUsers = 1 AND #Temp_Member_List_Link.Status = 1 </code></pre> <p>And without the distinct keyword</p> <pre><code> SELECT COUNT(#Temp_Member_List_Link.UserID) FROM #Temp_Member_List_Link with (nolock) INNER JOIN #Temp_MasterMembers with (nolock) ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID WHERE #Temp_MasterMembers.Active = 1 And #Temp_Member_List_Link.GroupID = 5 AND #Temp_MasterMembers.ValidUsers = 1 AND #Temp_Member_List_Link.Status = 1 </code></pre> <p>On a side note, I recreated the temp tables by simply running (<code>select * from Member_List_Link into #temp...</code>)</p> <p>And now when I check to see the difference between COUNT(column) vs. COUNT(distinct column) with these temp tables, I don't see any!</p> <p>So why is there a discrepancy with the original tables?</p> <p>I'm running SQL Server 2008 (Dev Edition).</p> <p>UPDATE - Including statistics profile</p> <p>PhysicalOp column only for the first query (without distinct)</p> <pre><code>NULL Compute Scalar Stream Aggregate Clustered Index Seek </code></pre> <p>PhysicalOp column only for the first query (with distinct)</p> <pre><code>NULL Compute Scalar Stream Aggregate Parallelism Stream Aggregate Hash Match Hash Match Bitmap Parallelism Index Seek Parallelism Clustered Index Scan </code></pre> <p>Rows and Executes for the 1st query (without distinct)</p> <pre><code>1 1 0 0 1 1 1 1 </code></pre> <p>Rows and Executes for the 2nd query (with distinct)</p> <pre><code>Rows Executes 1 1 0 0 1 1 16 1 16 16 233767 16 233767 16 281901 16 281901 16 281901 16 234787 16 234787 16 </code></pre> <p>Adding OPTION(MAXDOP 1) to the 2nd query (with distinct)</p> <pre><code>Rows Executes 1 1 0 0 1 1 233767 1 233767 1 281901 1 548396 1 </code></pre> <p>And the resulting PhysicalOp</p> <pre><code>NULL Compute Scalar Stream Aggregate Hash Match Hash Match Index Seek Clustered Index Scan </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload