
  1. Yet another date gap-fill SQL puzzle
    <p>I'm using Vertica, which unfortunately precludes me from using CROSS APPLY. And apparently there's no such thing as CTEs in Vertica.</p>
    <p>Here's what I've got:</p>
    <pre><code>t:
    day    | id | metric | d_metric
-----------+----+--------+----------
2011-12-01 |  1 |     10 |       10
2011-12-03 |  1 |     12 |        2
2011-12-04 |  1 |     15 |        3
</code></pre>
    <p>Note that on the first day, the delta is equal to the metric value. I'd like to fill in the gaps, like this:</p>
    <pre><code>t_fill:
    day    | id | metric | d_metric
-----------+----+--------+----------
2011-12-01 |  1 |     10 |       10
2011-12-02 |  1 |     10 |        0  -- a delta of 0
2011-12-03 |  1 |     12 |        2
2011-12-04 |  1 |     15 |        3
</code></pre>
    <p>I've thought of a way to do this day by day, but what I'd really like is a solution that works in one go.</p>
    <p>I think I could get something working with LAST_VALUE, but I can't come up with the right JOIN statements that will let me properly partition and order on each id's day-by-day history.</p>
    <p>edit: assume I have a table like this:</p>
    <pre><code>calendar:
    day
------------
2011-01-01
2011-01-02
...
</code></pre>
    <p>that can be used in joins. My intent would be to maintain the date range in <strong>calendar</strong> to match the date range in <strong>t</strong>.</p>
    <p>edit: A few more notes on what I'm looking for, just to be specific:</p>
    <p>In generating <strong>t_fill</strong>, I'd like to cover exactly the date range in <strong>t</strong>, including any dates that are missing in between. So a correct <strong>t_fill</strong> will start on the same date and end on the same date as <strong>t</strong>. <strong>t_fill</strong> has two properties:</p>
    <p>1) Once an id appears on some date, it will always have a row for each later date. This is the gap-filling implied in the original question.</p>
    <p>2) Should no row for an id ever appear again after some date, the <strong>t_fill</strong> solution should merrily generate rows with the same metric value (and 0 delta) from the date of that last data point up to the end date of <strong>t</strong>.</p>
    <p>A solution might backfill earlier dates up to the start of the date range in <strong>t</strong>. That is, for any id that first appears after the first date in <strong>t</strong>, the dates between the first date in <strong>t</strong> and the first date for that id will be filled with rows having metric=0 and d_metric=0. I don't prefer this kind of solution, since it has a higher growth factor for each id that enters the system. But I could easily deal with it by selecting into a new table only rows where metric != 0 or d_metric != 0.</p>
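
To make the target behaviour concrete, here is a minimal sketch of a query with properties 1) and 2) and no backfill. It runs against an in-memory SQLite database through Python's `sqlite3` module, purely as a stand-in: it has not been tried on Vertica, and the correlated subquery it uses in place of LAST_VALUE may need to be rewritten there. The function name `gap_fill`, the constant `GAP_FILL_SQL`, the aliases, and the choice to store days as ISO `YYYY-MM-DD` text are all assumptions made for this illustration.

```python
import sqlite3

# Assumption: days are stored as ISO 'YYYY-MM-DD' text, so string comparison
# orders them chronologically. Table and column names follow the question.
GAP_FILL_SQL = """
SELECT
    c.day,
    ids.id,
    -- carry forward the most recent metric known on or before this day
    (SELECT prev.metric
       FROM t AS prev
      WHERE prev.id = ids.id AND prev.day <= c.day
      ORDER BY prev.day DESC
      LIMIT 1) AS metric,
    -- a day with no row in t is a filled gap, so its delta is 0
    COALESCE(t.d_metric, 0) AS d_metric
FROM (SELECT id, MIN(day) AS first_day FROM t GROUP BY id) AS ids
JOIN calendar AS c
  ON c.day >= ids.first_day                 -- no backfill before an id's first day
 AND c.day <= (SELECT MAX(day) FROM t)      -- stop at the last date present in t
LEFT JOIN t
  ON t.id = ids.id AND t.day = c.day
ORDER BY ids.id, c.day
"""


def gap_fill(rows, calendar_days):
    """Load rows of t and the calendar days into SQLite and return t_fill rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (day TEXT, id INTEGER, metric INTEGER, d_metric INTEGER)"
        )
        conn.execute("CREATE TABLE calendar (day TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        conn.executemany(
            "INSERT INTO calendar VALUES (?)", [(d,) for d in calendar_days]
        )
        return conn.execute(GAP_FILL_SQL).fetchall()
    finally:
        conn.close()
```

Called with the three rows of **t** shown above and a calendar that covers at least 2011-12-01 through 2011-12-04, `gap_fill` returns the four rows of **t_fill**, including the filled 2011-12-02 row with metric 10 and delta 0. The join to **calendar** is bounded per id by that id's first day, which is what avoids the backfilled metric=0 rows.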
 
