StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>I'm not completely sure, but I get the impression that this question is really about upsert, which is the following atomic operation:</p> <ul> <li>If the row exists in both the source and target, <code>UPDATE</code> the target;</li> <li>If the row only exists in the source, <code>INSERT</code> the row into the target;</li> <li>(Optionally) If the row exists in the target but <em>not</em> the source, <code>DELETE</code> the row from the target.</li> </ul> <p>Developers-turned-DBAs often naïvely write it row-by-row, like this:</p> <pre><code>-- For each row in source IF EXISTS(<target_expression>) IF @delete_flag = 1 DELETE <target_expression> ELSE UPDATE target SET <target_columns> = <source_values> WHERE <target_expression> ELSE INSERT target (<target_columns>) VALUES (<source_values>) </code></pre> <p>This is just about the worst thing you can do, for several reasons:</p> <ul> <li><p>It has a race condition. The row can disappear between <code>IF EXISTS</code> and the subsequent <code>DELETE</code> or <code>UPDATE</code>.</p></li> <li><p>It's wasteful. For every transaction you have an extra operation being performed; maybe it's trivial, but that depends entirely on how well you've indexed.</p></li> <li><p>Worst of all - it's following an iterative model, thinking about these problems at the level of a single row. This will have the largest (worst) impact of all on overall performance.</p></li> </ul> <p>One very minor (and I emphasize minor) optimization is to just attempt the <code>UPDATE</code> anyway; if the row doesn't exist, <code>@@ROWCOUNT</code> will be 0 and you can then "safely" insert:</p> <pre><code>-- For each row in source BEGIN TRAN UPDATE target SET <target_columns> = <source_values> WHERE <target_expression> IF (@@ROWCOUNT = 0) INSERT target (<target_columns>) VALUES (<source_values>) COMMIT </code></pre> <p>Worst-case, this will still perform two operations for every transaction, but at least there's a <em>chance</em> of only performing one, and it also eliminates the race condition (kind of).</p> <p>But the real issue is that this is still being done for each row in the source.</p> <p>Before SQL Server 2008, you had to use an awkward 3-stage model to deal with this at the set level (still better than row-by-row):</p> <pre><code>BEGIN TRAN INSERT target (<target_columns>) SELECT <source_columns> FROM source s WHERE s.id NOT IN (SELECT id FROM target) UPDATE t SET <target_columns> = <source_columns> FROM target t INNER JOIN source s ON t.d = s.id DELETE t FROM target t WHERE t.id NOT IN (SELECT id FROM source) COMMIT </code></pre> <p>As I said, performance was pretty lousy on this, but still a lot better than the one-row-at-a-time approach. SQL Server 2008, however, finally introduced <a href="http://technet.microsoft.com/en-us/library/bb510625.aspx" rel="noreferrer">MERGE</a> syntax, so now all you have to do is this:</p> <pre><code>MERGE target USING source ON target.id = source.id WHEN MATCHED THEN UPDATE <target_columns> = <source_columns> WHEN NOT MATCHED THEN INSERT (<target_columns>) VALUES (<source_columns>) WHEN NOT MATCHED BY SOURCE THEN DELETE; </code></pre> <p>That's it. One statement. If you're using SQL Server 2008 and need to perform any sequence of <code>INSERT</code>, <code>UPDATE</code> and <code>DELETE</code> depending on whether or not the row already exists - <strong>even if it's just one row</strong> - there is <strong>no</strong> excuse not to be using <code>MERGE</code>.</p> <p>You can even <code>OUTPUT</code> the rows affected by a <code>MERGE</code> into a table variable if you need to find out afterward what was done. Simple, fast, and risk-free. Do it.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload