StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>Since you have two sets of data, ordered by the same value.. have you tried a merge join instead of the nested loop join?</p> <pre><code>SET STATISTICS IO ON SET STATISTICS TIME ON SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a on adr.UniqueId = a.UniqueId OPTION (LOOP JOIN) SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a on adr.UniqueId = a.UniqueId OPTION (MERGE JOIN) SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a on adr.UniqueId = a.UniqueId OPTION (HASH JOIN) </code></pre> <hr> <p>Edit:</p> <p>These explanations are conceptual. SQL Server may be doing more sophisticated operations than my examples show. This conceptual understanding, matched with the measuring of time and logical IO by the SET STATISTICS commands, and examination of query execution plans - form the basis of my query optimizing technique (grown over four years). May it serve you as well as it has me.</p> <p>Setup</p> <ul> <li>Get 5 decks of cards.</li> <li>Take 1 deck and produce a parent data set.</li> <li>Take the other 4 decks and produce the child data set.</li> <li>Order each data set by card value.</li> <li>Let m be the number of cards in the parent data set.</li> <li>Let n be the number of cards in the child data set.</li> </ul> <p>NestedLoop</p> <ul> <li>Take a card off the top of the parent data set.</li> <li>Search (using binary search) within the child data set for the first occurence of a match.</li> <li>Seek forward in the child data set from the first match until a non-match is found. You've now found all the matches.</li> <li>Repeat this for each card in the parent data set.</li> </ul> <p>The nested loop algorithm iterates the parent data set, and then searches the child data set once for each parent, making it cost: m * log(n)</p> <p>Merge</p> <ul> <li>Take a card off the top of the parent data set.</li> <li>Take a card off the top of the child data set.</li> <li>If the cards match, pull cards from the top of each deck until a non-match is found from each. Produce each matching pair between the parent and child matches.</li> <li>If the cards do not match, find the smaller between the parent and child cards, and take a card off the top of that data set.</li> </ul> <p>The merge join algorithm iterates the parent data set once and the child data set once, making it cost: m + n. It relies on the data being ordered. If you ask for a merge join on un-ordered data, you will incur an ordering operation! This brings the cost to (m * log(m)) + (n * log(n)) + m + n. Even that might be better than nested loop in some cases.</p> <p>Hash</p> <ul> <li>Get a card table.</li> <li>Take each card from the parent data set and place it on the card table where you can find it (does not have to be anything to do with card value, just has to be convenient for you).</li> <li>Take each card from the child data set, locate its matching parent on the cardboard table and produce the matching pair.</li> </ul> <p>The hash join algorithm iterates the parent data set once and the child data set once, making it cost: m + n. It relies on having a big enough card table to hold the entire contents of the parent data set.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload