StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<h2>Step 1: Release the handbrakes</h2> <blockquote> <p>... if a bit slow</p> </blockquote> <pre><code>SELECT to_char(MIN(ts)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS min_time ,SUM(CASE WHEN sensor_id = 572 THEN value ELSE 0.0 END) AS nickname1 ,SUM(CASE WHEN sensor_id = 542 THEN value ELSE 0.0 END) AS nickname2 ,SUM(CASE WHEN sensor_id = 571 THEN value ELSE 0.0 END) AS nickname3 FROM sensor_values -- LEFT JOIN sensor_values_cleaned s2 USING (sensor_id, ts) WHERE ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp AND ts < '2013-10-18T00:00:00+00:00'::timestamptz::timestamp AND sensor_id IN (572, 542, 571, 540, 541, 573) GROUP BY ts::date AS day ORDER BY 1; </code></pre> <h3>Major points</h3> <ul> <li><p>Replace <a href="http://www.postgresql.org/docs/current/interactive/sql-keywords-appendix.html" rel="nofollow noreferrer"><strong>reserved words</strong></a> (in standard SQL) in your identifiers.<br> <code>timestamp</code> -> <code>ts</code><br> <code>time</code> -> <code>min_time</code></p></li> <li><p>Since the join is on identical column names you can use the simpler <a href="http://www.postgresql.org/docs/current/interactive/queries-table-expressions.html#QUERIES-FROM" rel="nofollow noreferrer"><strong><code>USING</code> clause</strong></a> in the join condition: <code>USING (sensor_id, ts)</code><br> However, since the second table <code>sensor_values_cleaned</code> is 100% irrelevant to this query, I removed it entirely.</p></li> <li><p>As @joop already advised, switch <code>min()</code> and <code>to_char()</code> in your first out put column. This way, Postgres can determine the minimum from the <strong>original column value</strong>, which is generally faster and may be able to utilize an index. In this specific case, <strong>ordering by <code>date</code></strong> is also cheaper than ordering by a <code>text</code>, which would also have to consider collation rules.</p></li> <li><p>A similar consideration applies to your <code>WHERE</code> condition:<br> <strike>WHERE ts::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz</strike></p> <pre><code>WHERE ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp </code></pre> <p>The second one is <a href="http://en.wikipedia.org/wiki/Sargable" rel="nofollow noreferrer"><strong>sargable</strong></a> and can utilize a plain index on <code>ts</code> - to great effect on performance in big tables!</p></li> <li><p>Using <code>ts::date</code> instead of <code>date_trunc('day', ts)</code>. Simpler, faster, same result.</p></li> <li><p>Most probably your second WHERE condition is slightly incorrect. Generally, you would <strong>exclude the upper border</strong>:<br> <strike><pre><code>AND ts <b><=</b> '2013-10-18T00:00:00+00:00' ...</code></pre></strike></p> <pre><code>AND ts <b><</b> '2013-10-18T00:00:00+00:00' ...</code></pre></li> <li><p>When mixing <code>timestamp</code> and <code>timestamptz</code> one needs to be aware of the effects. For instance, your <code>WHERE</code> condition doesn't cut at 00:00 local time (except if local time coincides with UTC). Details here:<br> <a href="https://stackoverflow.com/questions/9571392/ignoring-timezones-altogether-in-rails-and-postgresql/9576170#9576170">Ignoring timezones altogether in Rails and PostgreSQL</a></p></li> </ul> <h2>Step 2: Your request</h2> <blockquote> <p>...the difference between the latest and earliest timestamps in each grouping</p> </blockquote> <p>And by that I suppose you mean:<br> ...the difference between <strong><em>the value of</em></strong> the latest and earliest timestamps ...<br> Otherwise it would be much simpler.</p> <p>Use <a href="http://www.postgresql.org/docs/current/interactive/functions-window.html" rel="nofollow noreferrer"><strong>window functions</strong></a> for that, in particular <code>first_value()</code> and <code>last_value()</code>. Careful with the combination, you want a <a href="http://www.postgresql.org/docs/current/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS" rel="nofollow noreferrer">non-standard window frame</a> for last_value() in this case. Compare:<br> <a href="https://stackoverflow.com/questions/8320569/postgresql-aggregate-or-window-function-to-return-just-the-last-value/8320588#8320588">PostgreSQL aggregate or window function to return just the last value</a></p> <p>I combine this with <a href="https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group/7630564#7630564"><strong><code>DISTINCT ON</code></strong></a>, which is more convenient in this case than <code>GROUP BY</code> (which would need another subquery level):</p> <pre><code>SELECT DISTINCT ON (ts::date, sensor_id) ts::date AS day ,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz ,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time ,sensor_id ,last_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts) AS val_range FROM sensor_values WHERE ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp AND ts < '2013-10-18T00:00:00+0'::timestamptz::timestamp AND sensor_id IN (540, 541, 542, 571, 572, 573) ORDER BY ts::date, sensor_id; </code></pre> <p><a href="http://sqlfiddle.com/#!15/07461/1" rel="nofollow noreferrer"><strong>-> SQLfiddle demo.</strong></a></p> <h2>Step 3: Pivot table</h2> <p>Building on the query above I use <a href="https://stackoverflow.com/questions/3002499/postgresql-crosstab-query/11751905#11751905"><strong><code>crosstab()</code></strong></a> from the additional module <code>tablefunc</code>:</p> <pre><code>SELECT * FROM crosstab( $$SELECT DISTINCT ON (1,3) ts::date AS day ,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time ,sensor_id ,last_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts) AS val_range FROM sensor_values WHERE ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp AND ts < '2013-10-18T00:00:00+0'::timestamptz::timestamp AND sensor_id IN (540, 541, 542, 571, 572, 573) ORDER BY 1, 3$$ ,$$VALUES (540), (541), (542), (571), (572), (573)$$ ) AS ct (day date, min_time text, s540 numeric, s541 numeric, s542 numeric, s571 numeric, s572 numeric, s573 numeric); </code></pre> <p>Returns (and <em>much</em> faster than before):</p> <pre><code> day | min_time | s540 | s541 | s542 | s571 | s572 | s573 ------------+--------------------------+-------+-------+-------+-------+-------+------- 2013-10-14 | 2013-10-14 03:00:00 CEST | 18.82 | 18.98 | 19.97 | 19.47 | 17.56 | 21.27 2013-10-15 | 2013-10-15 00:15:00 CEST | 22.59 | 24.20 | 22.90 | 21.27 | 22.75 | 22.23 2013-10-16 | 2013-10-16 00:16:00 CEST | 23.74 | 22.52 | 22.23 | 23.22 | 23.03 | 22.98 2013-10-17 | 2013-10-17 00:17:00 CEST | 21.68 | 24.54 | 21.15 | 23.58 | 23.04 | 21.94 </code></pre>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload