Note that there are some explanatory texts on larger screens.

plurals
  1. POInefficient SQL query with DATETIME calculations. How to optimize?
    primarykey
    data
    text
    <p>The problem comes from real environment where the <code>production_plan</code> table captures the order identification and other details in each row. Each row is updated when the product is started to be produced and after its production -- to capture UTC time of the events.</p> <p>There is a separate table <code>temperatures</code> that collects several temperatures at the production line -- in regular intervals, independently on anything, stored with the UTC.</p> <p>The goal is to extract the sequence of measured temperatures for production of each product. (Then the temeratures should be processed, the chart of the values is created and attached to the product item documentation for audit purposes.) </p> <p><strong>Updated</strong> after marc_s comment. The original question did not consider any indexes. The updated text considers the following. The original measurement mentioned in comments.</p> <p>The tables and indexes were created the following way:</p> <pre><code>CREATE TABLE production_plan ( order_id nvarchar(50) NOT NULL, production_line uniqueidentifier NULL, prod_start DATETIME NULL, prod_end DATETIME NULL ); -- About 31 000 rows inserted, ordered by order_id. ... -- Clusteded index on ind_order_id. CREATE CLUSTERED INDEX ind_order_id ON production_plan (order_id ASC); -- Non-clustered indices on the other columns. CREATE INDEX ind_times ON production_plan (production_line ASC, prod_start ASC, prod_end ASC); ------------------------------------------------------ -- There is actually more temperatures for one time (i.e. more -- sensors). The UTC is the real time of the row insertion, hence -- the primary key. CREATE TABLE temperatures ( UTC datetime PRIMARY KEY NOT NULL, production_line uniqueidentifier NULL, temperature_1 float NULL ); -- About 91 000 rows inserted ordered by UTC. ... -- Clusteded index on UTC is created automatically -- because of the PRIMARY KEY. Indices on temperature(s) -- do not make sense. -- Non-clustered index for production_line CREATE INDEX ind_pl ON temperatures (production_line ASC); -- The tables were created, records inserted, and the indices -- created for less than 1 second (for the sample on my computer). </code></pre> <p>The idea is to join the tables firstly on <code>production_line</code> identification, and secondly so, that the temperature UTC time fits between the UTC times of start/end of production of the item:</p> <pre><code>-- About 45 000 rows in about 24 seconds when no indices were used. -- The same took less than one second with the indices (for my data -- and my computer). SELECT pp.order_id, -- not related to the problem pp.prod_start, -- UTC of the start of production pp.prod_end, -- UTC of the end of production t.UTC, -- UTC of the temperature measurement t.temperature_1 -- the measured temperature INTO result_table02 FROM production_plan AS pp JOIN temperatures AS t ON pp.production_line = t.production_line AND t.UTC BETWEEN pp.prod_start AND pp.prod_end ORDER BY t.UTC; </code></pre> <p>The time about 24 seconds was not acceptable. It is clear that indexes were neccessary. The same operation took less than 1 second (the time in the yellow line below the result tabs in the Microsoft SQL Management Studio).</p> <p>However...</p> <p><strong>The second problem remains</strong></p> <p>Because the temperature measurement is not too frequent and because the places of measurement is a bit shifted in time from starting the production, the time correction must be done. In other words, two offsets must be added to the time-range boundaries. I have ended with the query like this:</p> <pre><code>-- About 46 000 rows in about 9 minutes without indices. -- It took about the same also with indices -- (8:50 instead of 9:00 or so). DECLARE @offset_start INT; SET @offset_start = -60 -- one minute = one sample before DECLARE @offset_end INT; SET @offset_end = +60 -- one minute = one sample after SELECT pp.order_id, -- not related to the problem pp.prod_start, -- UTC of the start of production pp.prod_end, -- UTC of the end of production t.UTC, -- UTC of the temperature measurement t.temperature_1 -- the measured temperature INTO result_table03 FROM production_plan AS pp JOIN temperatures AS t ON pp.production_line = t.production_line AND t.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start) AND DATEADD(second, @offset_end, pp.prod_end) ORDER BY t.UTC; </code></pre> <p>With the <code>DATEADD()</code> calculation, it takes about 9 minutes -- almost independently on whether indexes were created or not.</p> <p>Thinking more about how to solve the problem, it seems to me that the corrected time boundaries (the UTC's with added offsets) need their own indexes for the efficient processing. Creating a temporary table comes to my mind. Then the index can be created for its corrected columns. Using one more JOIN should help after that. Then the table can be dropped.</p> <p>Is the basic idea with the temporary table correct? Is there any other technique to do that?</p> <p>Thanks for your suggestions. I will update the time results after introduction of the indexes suggested by you. Please, explain the reasons for the expected improvement. I am beginner concerning hands-on experience when writing SQL solutions.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload