StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POInefficient SQL query with DATETIME calculations. How to optimize?
primarykey
Id
11692368
data
AcceptedAnswerId
11729350
AnswerCount
5
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2012-07-27T17:03:00.273
FavoriteCount
0
LastActivityDate
2012-07-31T16:16:24.990
LastEditDate
2012-07-27T20:35:01.140
LastEditorUserId
1346705
OwnerUserId
1346705
ParentId
0
PostTypeId
1
Score
1
ViewCount
1977
LastEditorDisplayName
text
Body
The problem comes from real environment where the <code>production_plan</code> table captures the order identification and other details in each row. Each row is updated when the product is started to be produced and after its production -- to capture UTC time of the events. There is a separate table <code>temperatures</code> that collects several temperatures at the production line -- in regular intervals, independently on anything, stored with the UTC. The goal is to extract the sequence of measured temperatures for production of each product. (Then the temeratures should be processed, the chart of the values is created and attached to the product item documentation for audit purposes.) Updated after marc_s comment. The original question did not consider any indexes. The updated text considers the following. The original measurement mentioned in comments. The tables and indexes were created the following way: <pre><code>CREATE TABLE production_plan ( order_id nvarchar(50) NOT NULL, production_line uniqueidentifier NULL, prod_start DATETIME NULL, prod_end DATETIME NULL ); -- About 31 000 rows inserted, ordered by order_id. ... -- Clusteded index on ind_order_id. CREATE CLUSTERED INDEX ind_order_id ON production_plan (order_id ASC); -- Non-clustered indices on the other columns. CREATE INDEX ind_times ON production_plan (production_line ASC, prod_start ASC, prod_end ASC); ------------------------------------------------------ -- There is actually more temperatures for one time (i.e. more -- sensors). The UTC is the real time of the row insertion, hence -- the primary key. CREATE TABLE temperatures ( UTC datetime PRIMARY KEY NOT NULL, production_line uniqueidentifier NULL, temperature_1 float NULL ); -- About 91 000 rows inserted ordered by UTC. ... -- Clusteded index on UTC is created automatically -- because of the PRIMARY KEY. Indices on temperature(s) -- do not make sense. -- Non-clustered index for production_line CREATE INDEX ind_pl ON temperatures (production_line ASC); -- The tables were created, records inserted, and the indices -- created for less than 1 second (for the sample on my computer). </code></pre> The idea is to join the tables firstly on <code>production_line</code> identification, and secondly so, that the temperature UTC time fits between the UTC times of start/end of production of the item: <pre><code>-- About 45 000 rows in about 24 seconds when no indices were used. -- The same took less than one second with the indices (for my data -- and my computer). SELECT pp.order_id, -- not related to the problem pp.prod_start, -- UTC of the start of production pp.prod_end, -- UTC of the end of production t.UTC, -- UTC of the temperature measurement t.temperature_1 -- the measured temperature INTO result_table02 FROM production_plan AS pp JOIN temperatures AS t ON pp.production_line = t.production_line AND t.UTC BETWEEN pp.prod_start AND pp.prod_end ORDER BY t.UTC; </code></pre> The time about 24 seconds was not acceptable. It is clear that indexes were neccessary. The same operation took less than 1 second (the time in the yellow line below the result tabs in the Microsoft SQL Management Studio). However... The second problem remains Because the temperature measurement is not too frequent and because the places of measurement is a bit shifted in time from starting the production, the time correction must be done. In other words, two offsets must be added to the time-range boundaries. I have ended with the query like this: <pre><code>-- About 46 000 rows in about 9 minutes without indices. -- It took about the same also with indices -- (8:50 instead of 9:00 or so). DECLARE @offset_start INT; SET @offset_start = -60 -- one minute = one sample before DECLARE @offset_end INT; SET @offset_end = +60 -- one minute = one sample after SELECT pp.order_id, -- not related to the problem pp.prod_start, -- UTC of the start of production pp.prod_end, -- UTC of the end of production t.UTC, -- UTC of the temperature measurement t.temperature_1 -- the measured temperature INTO result_table03 FROM production_plan AS pp JOIN temperatures AS t ON pp.production_line = t.production_line AND t.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start) AND DATEADD(second, @offset_end, pp.prod_end) ORDER BY t.UTC; </code></pre> With the <code>DATEADD()</code> calculation, it takes about 9 minutes -- almost independently on whether indexes were created or not. Thinking more about how to solve the problem, it seems to me that the corrected time boundaries (the UTC's with added offsets) need their own indexes for the efficient processing. Creating a temporary table comes to my mind. Then the index can be created for its corrected columns. Using one more JOIN should help after that. Then the table can be dropped. Is the basic idea with the temporary table correct? Is there any other technique to do that? Thanks for your suggestions. I will update the time results after introduction of the indexes suggested by you. Please, explain the reasons for the expected improvement. I am beginner concerning hands-on experience when writing SQL solutions.
Tags
<sql><sql-server>
Title
Inefficient SQL query with DATETIME calculations. How to optimize?
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USpepr
UserOwnerUserId
1. USpepr
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POInefficient SQL query with DATETIME calculations. How to optimize?
 UserUserId
 USpepr
 VoteTypeVoteTypeId
 VTBountyStart
2. VO
 singulars
 PostPostId
 POInefficient SQL query with DATETIME calculations. How to optimize?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.