StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p><strong>EDIT:</strong> </p> <p>I realized, in the course of attempting to explain the query, that it wouldn't always work correctly. So, I went back and figured out how to test this. I'm still bugged by the schema setup - namely, it implies that new users can't be added to an existing thread, and that a specific set of users will only be able to talk in one thread - but it was good to correct the query.</p> <pre><code>WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--), Threads(id) as (SELECT DISTINCT threadFk FROM ThreadMembers as a JOIN Selected_Users as b ON b.id = a.userFk) SELECT a.id FROM Threads as a WHERE NOT EXISTS (SELECT '1' FROM ThreadMembers as b LEFT JOIN Selected_Users as c ON c.id = b.userFk WHERE c.id IS NULL AND b.threadFk = a.id) AND NOT EXISTS (SELECT '1' FROM Selected_Users as b LEFT JOIN ThreadMembers as c ON c.userFk = b.id AND c.threadFk = a.id WHERE c.userFk IS NULL) </code></pre> <p>The statement will likely have to be dynamic, to build the list of selected users, unless SQL Server has a way to provide a list as a host variable (I know DB2 does, at least from the iSeries). I don't have the perfect dataset to test this against, but against a multi-million row table (with only a many-one relationship), it returns almost instantly - I'm getting index-only access for this (hint hint).</p> <p>Explanations: </p> <pre><code>WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--), </code></pre> <p>This CTE is building the list of users so that it can be referenced as a table. This makes it easiest to deal with, although it would be possible to simply replaces it with an <code>IN</code> statement everywhere (requires multiple references, though).</p> <pre><code> Threads(id) as (SELECT DISTINCT threadFk FROM ThreadMembers as a JOIN Selected_Users as b ON b.id = a.userFk) </code></pre> <p>This CTE gets the list of (distinct) threads that the users are involved in. Mostly, this is just to chop the listing down to single references to <code>threadFk</code>.</p> <pre><code>SELECT a.id FROM Threads as a </code></pre> <p>... Get the selected set of threads ...</p> <pre><code>WHERE NOT EXISTS (SELECT '1' FROM ThreadMembers as b LEFT JOIN Selected_Users as c ON c.id = b.userFk WHERE c.id IS NULL AND b.threadFk = a.id) </code></pre> <p>Where there isn't anybody 'missing' from the selected list of users - that is, it eliminates threads with user-lists that are subsets of a larger one. It also eliminates threads that have some of the users listed from the selection, but also a few that aren't, meaning that the <em>counts</em> of the users would match, but the actual users would not (this is where my first version failed).</p> <p><hr /> <strong>EDIT:</strong> </p> <p>I realized that, while the existing statement takes care of the situation where the provided list of users is a subset of users listed for a given thread, I didn't take care of the situation where the list of selected users contains a subset that is the list of users for the given thread.</p> <pre><code>AND NOT EXISTS (SELECT '1' FROM Selected_Users as b LEFT JOIN ThreadMembers as c ON c.userFk = b.id AND c.threadFk = a.id WHERE c.userFk IS NULL) </code></pre> <p>This clause fixes that. It makes sure that there aren't any leftover users in the selection list, after excluding users for a particular thread.</p> <p>The statement is now bugging me a bit - there may be a slightly better way for me to do this...</p> <p><hr /> <strong>EDIT:</strong></p> <p>Muwahaha, there <em>is</em> a <code>COUNT(*)</code> version, which should also be faster:</p> <pre><code>WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--), SELECT a.threadFk FROM ThreadMembers as a JOIN Selected_Users as b ON b.id = a.userFk GROUP BY a.threadFk HAVING COUNT(*) = (SELECT COUNT(*) FROM Selected_Users) AND COUNT(*) = (SELECT COUNT(*) from ThreadMembers as c WHERE c.threadFk = a.threadFk) </code></pre> <p>Explanations: </p> <pre><code>SELECT a.threadFk FROM ThreadMembers as a JOIN Selected_Users as b ON b.id = a.userFk </code></pre> <p>This is joining to get all threads the listed members are a part of. This is the inside equivalent to the <code>Threads</code> CTE above. Actually, you could remove that CTE in the above query, too.</p> <pre><code>GROUP BY a.threadFk </code></pre> <p>We only want one instance of a given thread after all. Also (in DB2 at least), the rest of the statement isn't valid unless it's present.</p> <pre><code>HAVING COUNT(*) = (SELECT COUNT(*) FROM Selected_Users) </code></pre> <p>Verify that, for the given thread, all of the selected users are present. Or, all of the selected users must be present in the given thread.</p> <pre><code>AND COUNT(*) = (SELECT COUNT(*) from ThreadMembers as c WHERE c.threadFk = a.threadFk) </code></pre> <p>Verify that, for the given thread, there are no non-selected users. Or, there must not be any users 'left out'</p> <p>You <em>should</em> get index-only access for this (I seem to be). The <code>COUNT(*)</code> of the result rows (for the <code>GROUP BY</code>) should only be performed once, and reused. The <code>HAVING</code> clause is evaluated <em>after</em> the <code>GROUP BY</code> takes place (if I recall correctly), so the sub-select for the count from the original table should only take place <em>once</em> per <code>threadFk</code>. </p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload