Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>I <strike>can</strike> could also reproduce this 100% of the time on my machine. (see note at end)</p> <p>The gist of the problem is that you are taking out <code>S</code> locks on system table rows in <code>tempdb</code> that can conflict with the locks needed for internal <code>tempdb</code> cleanup transactions.</p> <p>When this clean up work is allocated to the same session that owns the <code>S</code> lock an indefinite hang can occur.</p> <p>To avoid this problem for certain you need to stop referencing the <code>system</code> objects inside <code>tempdb</code>.</p> <p>It is possible to create a numbers table without referencing any external tables at all. The following needs to read no base table rows and thus also takes no locks.</p> <pre><code>WITH Ten(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ) SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number INTO Numbers FROM Ten T10, Ten T100, Ten T1000, Ten T10000, Ten T100000, Ten T1000000 </code></pre> <h3>Steps to Reproduce</h3> <p>First create a procedure </p> <pre><code>CREATE PROC P AS SET NOCOUNT ON; DECLARE @T TABLE (X INT) GO </code></pre> <p>Then restart the SQL Service and in one connection execute</p> <pre><code>WHILE NOT EXISTS(SELECT * FROM sys.dm_os_waiting_tasks WHERE session_id = blocking_session_id) BEGIN /*This will cause the problematic droptemp transactions*/ EXEC sp_recompile 'P' EXEC P END; SELECT * FROM sys.dm_os_waiting_tasks WHERE session_id = blocking_session_id </code></pre> <p>Then in another connection run</p> <pre><code>USE tempdb; SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number INTO #T FROM sys.objects s1 CROSS JOIN sys.objects s2 CROSS JOIN sys.objects s3 CROSS JOIN sys.objects s4; DROP TABLE #T </code></pre> <p>The query populating the Numbers table seems to manage to get into a live lock situation with the internal system transactions that clean up temporary objects such as table variables.</p> <p>I managed to get session id 53 blocked in this manner. It is blocked indefinitely. The output of <code>sp_WhoIsActive</code> shows that this spid spends almost all of the time suspended. In consecutive runs the numbers in the <code>reads</code> column increases but the values in the other columns remain largely the same.</p> <p>The wait duration doesn't show an increasing pattern though indicating that it must get unblocked periodically before getting blocked again.</p> <pre><code>SELECT * FROM sys.dm_os_waiting_tasks WHERE session_id = blocking_session_id </code></pre> <p>Returns</p> <pre><code>+----------------------+------------+-----------------+------------------+-----------+--------------------+-----------------------+---------------------+--------------------------+--------------------------------------------------------------------------------------------------+ | waiting_task_address | session_id | exec_context_id | wait_duration_ms | wait_type | resource_address | blocking_task_address | blocking_session_id | blocking_exec_context_id | resource_description | +----------------------+------------+-----------------+------------------+-----------+--------------------+-----------------------+---------------------+--------------------------+--------------------------------------------------------------------------------------------------+ | 0x00000002F2C170C8 | 53 | 0 | 86 | LCK_M_X | 0x00000002F9B13040 | 0x00000002F2C170C8 | 53 | NULL | keylock hobtid=281474978938880 dbid=2 id=lock2f9ac8880 mode=U associatedObjectId=281474978938880 | +----------------------+------------+-----------------+------------------+-----------+--------------------+-----------------------+---------------------+--------------------------+--------------------------------------------------------------------------------------------------+ </code></pre> <p>Using the id in the resource description </p> <pre><code>SELECT o.name FROM sys.allocation_units au WITH (NOLOCK) INNER JOIN sys.partitions p WITH (NOLOCK) ON au.container_id = p.partition_id INNER JOIN sys.all_objects o WITH (NOLOCK) ON o.object_id = p.object_id WHERE allocation_unit_id = 281474978938880 </code></pre> <p>Returns</p> <pre><code>+------------+ | name | +------------+ | sysschobjs | +------------+ </code></pre> <p>Running</p> <pre><code>SELECT resource_description,request_status FROM sys.dm_tran_locks WHERE request_session_id = 53 AND request_status &lt;&gt; 'GRANT' </code></pre> <p>Returns</p> <pre><code>+----------------------+----------------+ | resource_description | request_status | +----------------------+----------------+ | (246708db8c1f) | CONVERT | +----------------------+----------------+ </code></pre> <p>Connecting via the DAC and running</p> <pre><code>SELECT id,name FROM tempdb.sys.sysschobjs WITH (NOLOCK) WHERE %%LOCKRES%% = '(246708db8c1f)' </code></pre> <p>Returns</p> <pre><code>+-------------+-----------+ | id | name | +-------------+-----------+ | -1578606288 | #A1E86130 | +-------------+-----------+ </code></pre> <p>Curious about what that is</p> <pre><code>SELECT name,user_type_id FROM tempdb.sys.columns WHERE object_id = -1578606288 </code></pre> <p>Returns</p> <pre><code>+------+--------------+ | name | user_type_id | +------+--------------+ | X | 56 | +------+--------------+ </code></pre> <p>This is the column name in the table variable used by the stored proc.</p> <p>Running</p> <pre><code>SELECT request_mode, request_status, request_session_id, request_owner_id, lock_owner_address, t.transaction_id, t.name, t.transaction_begin_time FROM sys.dm_tran_locks l JOIN sys.dm_tran_active_transactions t ON l.request_owner_id = t.transaction_id WHERE resource_description = '(246708db8c1f)' </code></pre> <p>Returns</p> <pre><code>+--------------+----------------+--------------------+------------------+--------------------+----------------+-------------+-------------------------+ | request_mode | request_status | request_session_id | request_owner_id | lock_owner_address | transaction_id | name | transaction_begin_time | +--------------+----------------+--------------------+------------------+--------------------+----------------+-------------+-------------------------+ | U | GRANT | 53 | 227647 | 0x00000002F1EF6800 | 227647 | droptemp | 2013-11-24 18:36:28.267 | | S | GRANT | 53 | 191790 | 0x00000002F9B16380 | 191790 | SELECT INTO | 2013-11-24 18:21:30.083 | | X | CONVERT | 53 | 227647 | 0x00000002F9B12FC0 | 227647 | droptemp | 2013-11-24 18:36:28.267 | +--------------+----------------+--------------------+------------------+--------------------+----------------+-------------+-------------------------+ </code></pre> <p>So the <code>SELECT INTO</code> transaction is holding an <code>S</code> lock on the row in <code>tempdb.sys.sysschobjs</code> pertaining to the table variable <code>#A1E86130</code>. The <code>droptemp</code> transaction cannot get an <code>X</code> lock on this row because of this conflicting <code>S</code> lock.</p> <p>Running this query repeatedly reveals that the <code>transaction_id</code> for the <code>droptemp</code> transaction repeatedly changes.</p> <p>I speculate that SQL Server must allocate these internal transactions on user spids and prioritise them before doing the user work. So the session id 53 is stuck in a constant cycle where it starts up a <code>droptemp</code> transaction, is blocked by the user transaction running on the same spid. Rolls back the internal transaction then repeats the process indefinitely.</p> <p>This is borne out by tracing the various locking and transaction events in SQL Server Profiler after the spid becomes hung.</p> <p><img src="https://i.stack.imgur.com/q58m1.png" alt="Profiler"></p> <p>I also traced the locking events prior to that. </p> <h3>Lock Events Blocking</h3> <p><img src="https://i.stack.imgur.com/V4znI.png" alt="LockAquisitionPatternBlockingTransaction"></p> <p>Most of the shared key locks taken out by the <code>SELECT INTO</code> transaction on keys in <code>sysschobjs</code> get released immediately. The exception is the first lock on <code>(246708db8c1f)</code>.</p> <p>This makes some sense as the plan shows nested loops scans of <code>[sys].[sysschobjs].[clst] [o]</code> and because temporary objects get given negative objectids they will be the first rows encountered in scan order.</p> <p>I also encountered the situation described in the OP where running a three way cross join first seems to allow the four way one to succeed.</p> <p>The first few events in the trace for the <code>SELECT INTO</code> transaction there are an entirely different pattern.</p> <p><img src="https://i.stack.imgur.com/Xs3qQ.png" alt="LockAquisitionPatternNonBlockingTransaction"></p> <p>This was after a service restart so the lock resource values in the text data column are not directly comparable.</p> <p>Instead of retaining the lock on the first key and then a pattern of acquiring and releasing subsequent keys it seems to acquire a lot more locks without releasing them initially.</p> <p>I presume there must be some variance in execution strategy that avoids the issue.</p> <hr> <p>Update </p> <p><a href="https://connect.microsoft.com/SQLServer/feedback/details/809544/internal-droptemp-transaction-can-cause-unkillable-spid">The Connect Item I raised about this</a> has not been marked as fixed but I am now on SQL Server 2012 SP2 and can now only reproduce temporary self blocking rather than permanent. I still get the self blocking but after some amount of failed attempts to execute the <code>droptemp</code> transaction successfully it appears to go back to processing the user transaction. After that commits the system transaction then is executed successfully. Still on the same spid. (eight attempts in one example run. I'm not sure if this will be consistently repeated) </p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload