> My question is, is it a problem to have hundreds of thousands of tables in your SQL Server?

Yes. It is a **huge** problem to have this many tables in your SQL Server. Every object has to be tracked by SQL Server as metadata, and once you include indexes, referential constraints, primary keys, defaults, and so on, you are talking about *millions* of database objects.

While SQL Server may theoretically be able to handle 2^32 objects, rest assured that it will start buckling under the load much sooner than that.

And if the database doesn't collapse, your developers and IT staff almost certainly will. I get nervous when I see more than a thousand tables or so; show me a database with *hundreds of thousands* and I will run away screaming.

Creating hundreds of thousands of tables as a poor man's partitioning strategy will eliminate your ability to do any of the following:

- Write efficient queries (how do you `SELECT` across multiple categories?)
- Maintain unique identities (as you've already discovered)
- Maintain referential integrity (unless you like managing 300,000 foreign keys)
- Perform ranged updates
- Write clean application code
- Maintain any sort of history
- Enforce proper security (it seems evident that users would have to be able to initiate these creates/drops, which is very dangerous)
- Cache properly - 100,000 tables mean 100,000 different execution plans all competing for the same memory, which you likely don't have enough of
- Hire a DBA (because rest assured, they will quit as soon as they see your database)

On the other hand, it's **not** a problem at all to have hundreds of thousands of *rows*, or even *millions* of rows, in a single *table* - that's the way SQL Server and other SQL RDBMSes were designed to be used, and they are very well-optimized for this case.

> The drop in O(1) is extremely desirable to me. Maybe there's a completely different solution I'm not thinking of?

The typical solution to performance problems in databases is, in order of preference:

- Run a profiler to determine the slowest parts of the query;
- Improve the query, if possible (e.g. by eliminating non-sargable predicates - a short example appears a little further down);
- Normalize or add indexes to eliminate those bottlenecks;
- Denormalize when necessary (not generally applicable to deletes);
- If cascade constraints or triggers are involved, disable them for the duration of the transaction and blow out the cascades manually.

But the reality here is that you *don't **need** a "solution."*

"Millions and millions of rows" is not a lot in a SQL Server database. It is **very quick** to delete a few thousand rows from a table of millions by simply indexing the column you wish to delete on - in this case `CategoryID`. SQL Server can do this without breaking a sweat.

In fact, deletions normally have O(M log N) complexity (N = number of rows, M = number of rows to delete). To achieve an O(1) deletion time, you'd be sacrificing almost every benefit SQL Server provides in the first place.

O(M log N) may not be as fast as O(1), but the kind of slowdown you're describing (several minutes per delete) *must* have a secondary cause.
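To make the non-sargable predicate point concrete: a sargable predicate is one that leaves the column bare so an index can seek on it. A minimal sketch (the `DiscoveredOn` column is hypothetical, purely for illustration):

```sql
-- Non-sargable: wrapping the column in a function defeats any index on it,
-- forcing a scan (DiscoveredOn is a hypothetical column)
SELECT StarID FROM Stars WHERE YEAR(DiscoveredOn) = 2008

-- Sargable rewrite: the bare column allows an index seek on DiscoveredOn
SELECT StarID FROM Stars
WHERE DiscoveredOn >= '20080101' AND DiscoveredOn < '20090101'
```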
The numbers do not add up, and to demonstrate this, I've gone ahead and produced a benchmark:

---

## Table Schema:

```sql
CREATE TABLE Stars
(
    StarID int NOT NULL IDENTITY(1, 1)
        CONSTRAINT PK_Stars PRIMARY KEY CLUSTERED,
    CategoryID smallint NOT NULL,
    StarName varchar(200)
)

CREATE INDEX IX_Stars_Category ON Stars (CategoryID)
```

Note that this schema is not even really optimized for `DELETE` operations; it's a fairly run-of-the-mill table schema you might see in SQL Server. If this table has no relationships, then we don't need the surrogate key or clustered index (or we could put the clustered index on the category). I'll come back to that later.

## Sample Data:

This will populate the table with 10 million rows, using 500 categories (i.e. a cardinality of 1:20,000 per category). You can tweak the parameters to change the amount of data and/or cardinality.

```sql
SET NOCOUNT ON

DECLARE
    @BatchSize int,
    @BatchNum int,
    @BatchCount int,
    @StatusMsg nvarchar(100)

SET @BatchSize = 1000
SET @BatchCount = 10000
SET @BatchNum = 1

WHILE (@BatchNum <= @BatchCount)
BEGIN
    SET @StatusMsg = N'Inserting rows - batch #' + CAST(@BatchNum AS nvarchar(5))
    RAISERROR(@StatusMsg, 0, 1) WITH NOWAIT

    INSERT Stars (CategoryID, StarName)
    SELECT
        v.number % 500,
        CAST(RAND() * v.number AS varchar(200))
    FROM master.dbo.spt_values v
    WHERE v.type = 'P'
        AND v.number >= 1
        AND v.number <= @BatchSize

    SET @BatchNum = @BatchNum + 1
END
```

## Profile Script

The simplest of them all...

```sql
DELETE FROM Stars WHERE CategoryID = 50
```

## Results:

This was tested on a **5-year-old workstation** running, IIRC, a 32-bit dual-core AMD Athlon and a cheap 7200 RPM SATA drive.

I ran the test 10 times using different CategoryIDs. The slowest time (cold cache) was about 5 seconds. The fastest time was 1 second.

Perhaps not as fast as simply dropping the table, but nowhere near the multi-minute deletion times you mentioned. And remember, this isn't even on a decent machine!

## But we can do better...

Everything about your question implies that this data *isn't related*. If you don't have relations, you don't need the surrogate key, and can get rid of one of the indexes, moving the clustered index to the `CategoryID` column.

Now, as a rule, clustered indexes on non-unique/non-sequential columns are not a good practice. But we're just benchmarking here, so we'll do it anyway:

```sql
CREATE TABLE Stars
(
    CategoryID smallint NOT NULL,
    StarName varchar(200)
)

CREATE CLUSTERED INDEX IX_Stars_Category ON Stars (CategoryID)
```

Run the same test data generator on this (incurring a mind-boggling number of page splits) and the same deletion took an average of just **62 milliseconds**, and 190 ms from a cold cache (an outlier). For reference, if the index is made nonclustered (with no clustered index at all), the delete time only goes up to an average of 606 ms.
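If you want to reproduce these timings yourself, the simplest tool is `SET STATISTICS TIME`. A minimal sketch, assuming the clustered-on-`CategoryID` version of the table above:

```sql
-- Report CPU and elapsed time for each statement in this session
SET STATISTICS TIME ON

DELETE FROM Stars WHERE CategoryID = 50

SET STATISTICS TIME OFF
-- The elapsed time is printed to the Messages pane in Management Studio
```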
## Conclusion:

If you're seeing delete times of *several minutes* - or even *several seconds* - then something is **very, very wrong**.

Possible factors are:

- Statistics aren't up to date (shouldn't be an issue here, but if it is, just run `sp_updatestats`);

- Lack of indexing (although, curiously, removing the `IX_Stars_Category` index in the first example actually leads to a *faster* overall delete, because the clustered index scan is faster than the nonclustered index delete);

- Improperly-chosen data types. If you only have *millions* of rows, as opposed to *billions*, then you do not need a `bigint` on the `StarID`. You **definitely** don't need it on the `CategoryID` - if you have fewer than 32,768 categories you can even make do with a `smallint`. Every byte of unnecessary data in each row adds an I/O cost.

- Lock contention. Maybe the problem isn't actually delete speed at all; maybe some other script or process is holding locks on `Star` rows and the `DELETE` just sits around waiting for them to let go (a quick check for this is sketched below).

- *Extremely* poor hardware. I was able to run this without any problems on a pretty lousy machine, but if you're running this database on a '90s-era Presario or some similar machine that's preposterously unsuitable for hosting an instance of SQL Server, and it's heavily loaded, then you're obviously going to run into problems.

- Very expensive foreign keys, triggers, constraints, or other database objects which you haven't included in your example, which might be adding a high cost. Your execution plan should clearly show this (in the optimized example above, it's just a single Clustered Index Delete).

I honestly cannot think of any other possibilities. Deletes in SQL Server just *aren't that slow*.

---

If you're able to run these benchmarks and see roughly the same performance I saw (or better), then it means the problem is with your database design and optimization strategy, not with SQL Server or the asymptotic complexity of deletions.
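To test the lock-contention possibility from the list above, look for blocked sessions while the slow `DELETE` is running. A minimal sketch using the `sys.dm_exec_requests` DMV (available in SQL Server 2005 and later), run from a second connection:

```sql
-- Any rows returned are sessions stuck waiting on locks held by another
-- session; blocking_session_id identifies the session holding them
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
```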
I would suggest, as a starting point, reading a little about optimization:

- [SQL Server Optimization Tips](http://www.databasejournal.com/features/mssql/article.php/1576231/SQL-Server-Optimization-Tips-for-Designing-Tables.htm) (Database Journal)
- [SQL Server Optimization](http://msdn.microsoft.com/en-us/library/aa964133%28SQL.90%29.aspx) (MSDN)
- [Improving SQL Server Performance](http://msdn.microsoft.com/en-us/library/ms998577.aspx) (MSDN)
- [SQL Server Query Processing Team Blog](http://blogs.msdn.com/sqlqueryprocessing/default.aspx)
- [SQL Server Performance](http://www.sql-server-performance.com/) (particularly their tips on [indexes](http://www.sql-server-performance.com/tips/optimizing_indexes_general_p1.aspx))

If this *still* doesn't help you, then I can offer the following additional suggestions:

- Upgrade to SQL Server 2008, which gives you a myriad of [compression options](http://msdn.microsoft.com/en-us/library/dd894051.aspx) that can vastly improve I/O performance (a one-statement sketch follows this list);

- Consider pre-compressing the per-category `Star` data into a compact serialized list (using the `BinaryWriter` class in .NET) and storing it in a `varbinary` column. This way you can have one row per category. This violates 1NF rules, but since you don't seem to be doing anything with individual `Star` data from within the database anyway, I doubt you'd be losing much.

- Consider using a non-relational database or storage format, such as [db4o](http://www.db4o.com/) or [Cassandra](http://cassandra.apache.org/). Instead of implementing a known database anti-pattern (the infamous "data dump"), use a tool that is actually designed for that kind of storage and access pattern.
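If you do go the SQL Server 2008 route, enabling compression is a single statement. A sketch, assuming the `Stars` table from the benchmark above:

```sql
-- SQL Server 2008+: rebuild the table with page-level compression,
-- trading some CPU for (often substantially) reduced I/O
ALTER TABLE Stars REBUILD WITH (DATA_COMPRESSION = PAGE)
```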