Note that there are some explanatory texts on larger screens.

plurals
  1. POOne 400GB table, One query - Need Tuning Ideas (SQL2005)
    text
    copied!<p>I have a single large table which I would like to optimize. I'm using MS-SQL 2005 server. I'll try to describe how it is used and if anyone has any suggestions I would appreciate it very much.</p> <p>The table is about 400GB, has 100 million rows and 1 million rows are inserted each day. The table has 8 columns, 1 data col and 7 columns used for lookups/ordering. </p> <pre><code> k1 k2 k3 k4 k5 k6 k7 d1 </code></pre> <p>where</p> <pre><code> k1: varchar(3), primary key - clustered index, 10 possible values k2: bigint, primary key - clustered index, total rows/10 possible values k3: int, 10 possible values k4: money, 100 possible values k5: bool k6: bool k7: DateTime </code></pre> <p>Only one select query is run which looks like this:</p> <pre><code> SELECT TOP(g) d1 FROM table WITH(NOLOCK) WHERE k1 = a AND k3 = c AND k4 = d AND k5 = e AND k6 = f ORDER BY k7 </code></pre> <p>where g = circa 1 million This query us ran about 10 times per day (often while inserts are happening) and takes about 5-30 minutes.</p> <p>So I currently only have a clustered index on the two primary key columns. My question is: what indexes should I add to improve this query's performance?</p> <p>Would separate indexes on every column be a good choice? I think a single index would take up about 5-8GB. The DB server has 8GB RAM total.</p> <p>Please do not say that the best thing is to experiment. This is akin to 'I don't know, work it out your self' :)</p> <p>Any tips much appreciated!</p> <p><hr> EDIT by doofledorfer--</p> <p>You've caused an outbreak of premature optimization here, if not outright suggestions that "the best thing is to experiment". You need to clarify a number of issues if you want useful help.</p> <p>-- doofledorfer <hr> EDIT: Comments on posts to date are now posted below along with query plan - Mr. Flibble</p> <hr> <blockquote> <p>You are probably I/O bound</p> </blockquote> <p>Yes, it is not CPU bound. Disk access is high. All available RAM seems to be used. Whether it is used wisely or not remains to be seen.</p> <blockquote> <p>You say you can't split the data because all the data is used: IMPOSSIBLE</p> </blockquote> <p>I mean that all data is used at some point - not that all data is used by each user in each query. I can certainly split the data but, so far, I don't understand why partitioning the table is any better than using a clustered index.</p> <blockquote> <p>Why did you choose these types VARCHAR probably should have been INT as it can only be a few values. The rest are sensible enough, Money represents a money value in real life and bigint is an ID, and the bools are onny, offy type things :)</p> <p>By any chance we could get have a look the insert statement, or TSQL or the bulkinsert </p> </blockquote> <p>TSQL. Its basically INSERT INTO table VALUES (k1,k2,k3,k4,k5,k6,d1). The only thing that is in any way interesting is that many duplicate inserts are attempted and the k1 &amp; k2 PK constraint is used to prevent duplicate data entering the database. I believed at design time (and now) that this was as quick a way as any to finter out duplicate data. </p> <blockquote> <p>Can you tell how often your insert happens Every 10 minutes or so inserts run (ADO.NET) maybe 10K at a time and take a few minutes. I estimate currently a full day's inserts take 40% of the time in the day. </p> <p>Does the DateTime field contains the date of insert No. There is actually another DateTime column which does but it is not retrieved in any SELECT query so I didn't mention it for the sake of simplicity.</p> <p>How did you came to this More one man day thinking. </p> <p>if you're interested only in the last data, deleting/archiving the useless data could make sense (start from scratch every morning)</p> </blockquote> <p>I am not interested in recent data only. A query may select some of the very first data that was inserted into the table all the way up to data inserted minutes ago. But as the data is filtered this does not mean that all the data in the DB is requested in that query.</p> <blockquote> <p>if there is only one "inserter" and only one "reader", you may want to switch to a specialised type (hashmap/list/deque/stack) or something more elaborated, in a programming language.</p> </blockquote> <p>I will probably stick with MSSQL for the moment. It's not broke yet, just a little slow.</p> <p>liggett78, do you suggest a clustered index on columns k1,k4,k5,k6,k3 or a non-clustered index on those columns?</p> <hr> <p>My main question right now is should I extend the current clustered index to contain k4 also (this is the col with next most possible values) or should I just add a non-clustered index to k4.</p> <p>Would adding all k1-k6 to a clustered index be an option? Then have a separate non-clustered index on the DateTime column for the ORDER BY? Am I correct in thinking that this would not cause any major increase in DB size but will only affect insert times. Can anyone guesstimate the effect this will have on inserts?</p> <p>I think that if adding indexes to all the columns will double the DB size then it is not viable without large (ie. hardware) changes. </p> <hr> <p>The following plan was run with an index (non clustered) on the DATE column.</p> <p>EDIT: Not sure if you can see the XML below so here is a link to it: <a href="http://conormccarthy.com/box/queryplan.sqlplan.txt" rel="nofollow noreferrer">http://conormccarthy.com/box/queryplan.sqlplan.txt</a></p> <pre><code>&lt;?xml version="1.0" encoding="utf-16"?&gt; &lt;ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.0" Build="9.00.1399.06" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"&gt; &lt;BatchSequence&gt; &lt;Batch&gt; &lt;Statements&gt; &lt;StmtSimple StatementCompId="1" StatementEstRows="11111" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="625.754" StatementText="SELECT TOP(11111) d1 FROM hands WITH (NOLOCK) &amp;#xD;&amp;#xA; WHERE k4 = '10' &amp;#xD;&amp;#xA; AND k6 = 1 &amp;#xD;&amp;#xA; AND k5 = 1 &amp;#xD;&amp;#xA; AND k1 = 'IPN' &amp;#xD;&amp;#xA; AND k3 BETWEEN 2 AND 10 &amp;#xD;&amp;#xA; ORDER BY k7 DESC&amp;#xD;&amp;#xA;&amp;#xD;&amp;#xA;" StatementType="SELECT"&gt; &lt;StatementSetOptions ANSI_NULLS="false" ANSI_PADDING="false" ANSI_WARNINGS="false" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="false" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="false" /&gt; &lt;QueryPlan DegreeOfParallelism="1" CachedPlanSize="36"&gt; &lt;MissingIndexes&gt; &lt;MissingIndexGroup Impact="81.7837"&gt; &lt;MissingIndex Database="[MYDB]" Schema="[dbo]" Table="[Hands]"&gt; &lt;ColumnGroup Usage="EQUALITY"&gt; &lt;Column Name="[k1]" ColumnId="1" /&gt; &lt;Column Name="[k4]" ColumnId="7" /&gt; &lt;Column Name="[k5]" ColumnId="9" /&gt; &lt;Column Name="[k6]" ColumnId="10" /&gt; &lt;/ColumnGroup&gt; &lt;ColumnGroup Usage="INEQUALITY"&gt; &lt;Column Name="[k3]" ColumnId="6" /&gt; &lt;/ColumnGroup&gt; &lt;ColumnGroup Usage="INCLUDE"&gt; &lt;Column Name="[d1]" ColumnId="3" /&gt; &lt;Column Name="[k7]" ColumnId="4" /&gt; &lt;/ColumnGroup&gt; &lt;/MissingIndex&gt; &lt;/MissingIndexGroup&gt; &lt;/MissingIndexes&gt; &lt;RelOp AvgRowSize="75" EstimateCPU="0.0011111" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="11111" LogicalOp="Top" NodeId="0" Parallel="false" PhysicalOp="Top" EstimatedTotalSubtreeCost="625.754"&gt; &lt;OutputList&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="d1" /&gt; &lt;/OutputList&gt; &lt;RunTimeInformation&gt; &lt;RunTimeCountersPerThread Thread="0" ActualRows="11111" ActualEndOfScans="1" ActualExecutions="1" /&gt; &lt;/RunTimeInformation&gt; &lt;Top RowCount="false" IsPercent="false" WithTies="false"&gt; &lt;TopExpression&gt; &lt;ScalarOperator ScalarString="(11111)"&gt; &lt;Const ConstValue="(11111)" /&gt; &lt;/ScalarOperator&gt; &lt;/TopExpression&gt; &lt;RelOp AvgRowSize="83" EstimateCPU="135.557" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="11111" LogicalOp="Filter" NodeId="1" Parallel="false" PhysicalOp="Filter" EstimatedTotalSubtreeCost="625.753"&gt; &lt;OutputList&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="d1" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k7" /&gt; &lt;/OutputList&gt; &lt;RunTimeInformation&gt; &lt;RunTimeCountersPerThread Thread="0" ActualRows="11111" ActualEndOfScans="0" ActualExecutions="1" /&gt; &lt;/RunTimeInformation&gt; &lt;Filter StartupExpression="false"&gt; &lt;RelOp AvgRowSize="96" EstimateCPU="318.331" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="195691" LogicalOp="Inner Join" NodeId="2" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="625.404"&gt; &lt;OutputList&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="d1" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k7" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k3" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k4" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k5" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k6" /&gt; &lt;/OutputList&gt; &lt;RunTimeInformation&gt; &lt;RunTimeCountersPerThread Thread="0" ActualRows="341958" ActualEndOfScans="0" ActualExecutions="1" /&gt; &lt;/RunTimeInformation&gt; &lt;NestedLoops Optimized="false" WithOrderedPrefetch="true"&gt; &lt;OuterReferences&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k1" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="HandId" /&gt; &lt;ColumnReference Column="Expr1003" /&gt; &lt;/OuterReferences&gt; &lt;RelOp AvgRowSize="32" EstimateCPU="330.366" EstimateIO="790.88" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="195691" LogicalOp="Index Scan" NodeId="4" Parallel="false" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="2.88444"&gt; &lt;OutputList&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k1" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="HandId" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k7" /&gt; &lt;/OutputList&gt; &lt;RunTimeInformation&gt; &lt;RunTimeCountersPerThread Thread="0" ActualRows="341958" ActualEndOfScans="0" ActualExecutions="1" /&gt; &lt;/RunTimeInformation&gt; &lt;IndexScan Ordered="true" ScanDirection="BACKWARD" ForcedIndex="false" NoExpandHint="false"&gt; &lt;DefinedValues&gt; &lt;DefinedValue&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k1" /&gt; &lt;/DefinedValue&gt; &lt;DefinedValue&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="HandId" /&gt; &lt;/DefinedValue&gt; &lt;DefinedValue&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k7" /&gt; &lt;/DefinedValue&gt; &lt;/DefinedValues&gt; &lt;Object Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Index="[ix_dateplayed]" /&gt; &lt;Predicate&gt; &lt;ScalarOperator ScalarString="[MYDB].[dbo].[Hands].[k1]=N'IPN'"&gt; &lt;Compare CompareOp="EQ"&gt; &lt;ScalarOperator&gt; &lt;Identifier&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k1" /&gt; &lt;/Identifier&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Const ConstValue="N'IPN'" /&gt; &lt;/ScalarOperator&gt; &lt;/Compare&gt; &lt;/ScalarOperator&gt; &lt;/Predicate&gt; &lt;/IndexScan&gt; &lt;/RelOp&gt; &lt;RelOp AvgRowSize="88" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="195691" EstimateRewinds="0" EstimateRows="1" LogicalOp="Clustered Index Seek" NodeId="6" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="621.331"&gt; &lt;OutputList&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="d1" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k3" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k4" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k5" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k6" /&gt; &lt;/OutputList&gt; &lt;RunTimeInformation&gt; &lt;RunTimeCountersPerThread Thread="0" ActualRows="341958" ActualEndOfScans="0" ActualExecutions="341958" /&gt; &lt;/RunTimeInformation&gt; &lt;IndexScan Lookup="true" Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" NoExpandHint="false"&gt; &lt;DefinedValues&gt; &lt;DefinedValue&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="d1" /&gt; &lt;/DefinedValue&gt; &lt;DefinedValue&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k3" /&gt; &lt;/DefinedValue&gt; &lt;DefinedValue&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k4" /&gt; &lt;/DefinedValue&gt; &lt;DefinedValue&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k5" /&gt; &lt;/DefinedValue&gt; &lt;DefinedValue&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k6" /&gt; &lt;/DefinedValue&gt; &lt;/DefinedValues&gt; &lt;Object Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Index="[PK_Hands]" TableReferenceId="-1" /&gt; &lt;SeekPredicates&gt; &lt;SeekPredicate&gt; &lt;Prefix ScanType="EQ"&gt; &lt;RangeColumns&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k1" /&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="HandId" /&gt; &lt;/RangeColumns&gt; &lt;RangeExpressions&gt; &lt;ScalarOperator ScalarString="[MYDB].[dbo].[Hands].[k1]"&gt; &lt;Identifier&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k1" /&gt; &lt;/Identifier&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator ScalarString="[MYDB].[dbo].[Hands].[HandId]"&gt; &lt;Identifier&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="HandId" /&gt; &lt;/Identifier&gt; &lt;/ScalarOperator&gt; &lt;/RangeExpressions&gt; &lt;/Prefix&gt; &lt;/SeekPredicate&gt; &lt;/SeekPredicates&gt; &lt;/IndexScan&gt; &lt;/RelOp&gt; &lt;/NestedLoops&gt; &lt;/RelOp&gt; &lt;Predicate&gt; &lt;ScalarOperator ScalarString="[MYDB].[dbo].[Hands].[k4]=($10.0000) AND [MYDB].[dbo].[Hands].[k6]=(1) AND [MYDB].[dbo].[Hands].[k5]=(1) AND [MYDB].[dbo].[Hands].[k3]&amp;gt;=(2) AND [MYDB].[dbo].[Hands].[k3]&amp;lt;=(10)"&gt; &lt;Logical Operation="AND"&gt; &lt;ScalarOperator&gt; &lt;Compare CompareOp="EQ"&gt; &lt;ScalarOperator&gt; &lt;Identifier&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k4" /&gt; &lt;/Identifier&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Const ConstValue="($10.0000)" /&gt; &lt;/ScalarOperator&gt; &lt;/Compare&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Compare CompareOp="EQ"&gt; &lt;ScalarOperator&gt; &lt;Identifier&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k6" /&gt; &lt;/Identifier&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Const ConstValue="(1)" /&gt; &lt;/ScalarOperator&gt; &lt;/Compare&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Compare CompareOp="EQ"&gt; &lt;ScalarOperator&gt; &lt;Identifier&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k5" /&gt; &lt;/Identifier&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Const ConstValue="(1)" /&gt; &lt;/ScalarOperator&gt; &lt;/Compare&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Compare CompareOp="GE"&gt; &lt;ScalarOperator&gt; &lt;Identifier&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k3" /&gt; &lt;/Identifier&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Const ConstValue="(2)" /&gt; &lt;/ScalarOperator&gt; &lt;/Compare&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Compare CompareOp="LE"&gt; &lt;ScalarOperator&gt; &lt;Identifier&gt; &lt;ColumnReference Database="[MYDB]" Schema="[dbo]" Table="[Hands]" Column="k3" /&gt; &lt;/Identifier&gt; &lt;/ScalarOperator&gt; &lt;ScalarOperator&gt; &lt;Const ConstValue="(10)" /&gt; &lt;/ScalarOperator&gt; &lt;/Compare&gt; &lt;/ScalarOperator&gt; &lt;/Logical&gt; &lt;/ScalarOperator&gt; &lt;/Predicate&gt; &lt;/Filter&gt; &lt;/RelOp&gt; &lt;/Top&gt; &lt;/RelOp&gt; &lt;/QueryPlan&gt; &lt;/StmtSimple&gt; &lt;/Statements&gt; &lt;/Batch&gt; &lt;/BatchSequence&gt; &lt;/ShowPlanXML&gt; </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload