Note that there are some explanatory texts on larger screens.

plurals
  1. POVertica INTERPOLATE feature
    primarykey
    data
    text
    <h1>Vertica's shiny feature:</h1> <p>I have two tables that I would like to join with <strong>INTERPOLATE</strong> and expect data from the second table to be interpolated with the latest available. But unfortunately I am not able to get my desired result. I have checked out the Vertica documentation related to the <a href="https://my.vertica.com/docs/6.1.x/HTML/index.htm#15903.htm" rel="nofollow">INTERPOLATE</a> feature and tried an example which worked fine.</p> <pre><code>CREATE TABLE a ( family int, date DATE, id int ); CREATE TABLE b ( Id int, date DATE, datapoint float ); INSERT INTO a VALUES (1, '20130603', 1); INSERT INTO a VALUES (1, '20130604', 1); INSERT INTO a VALUES (1, '20130605', 1); INSERT INTO a VALUES (1, '20130606', 1); INSERT INTO a VALUES (1, '20130607', 1); INSERT INTO b VALUES (1, '20130603', 3.00); SELECT a.family, a.date, a.id, a.date, b.datapoint FROM a LEFT JOIN b ON a.id = b.id AND a.date INTERPOLATE PREVIOUS VALUE b.date; vertdeva01:20130612-095628 &gt; \g family | date | id | date | data --------+------------+----+------------+------ 1 | 2013-06-03 | 1 | 2013-06-03 | 3 1 | 2013-06-04 | 1 | 2013-06-04 | 3 1 | 2013-06-05 | 1 | 2013-06-05 | 3 1 | 2013-06-06 | 1 | 2013-06-06 | 3 1 | 2013-06-07 | 1 | 2013-06-07 | 3 </code></pre> <p>There I get the results as expected. The values in table <strong>b</strong> are interpolated according to the dates in table <strong>a</strong>.</p> <p>But when I try something similar to a slightly more complex scenario I don't really get what I want. </p> <h2>What I intend to achieve:</h2> <p><em>What I intend to achieve is select latest available data from <strong>b</strong> for every id in <strong>a</strong> that matches the corresponding date in <strong>a</strong>. So if <strong>a</strong> has an (id,date) combination then I would like to fetch data from <strong>b</strong> for that id and date. But if there is no data for that id in b on that date, then fetch what is available <strong>AS OF</strong> that date. Fetch data that would be valid as of the date in <strong>a</strong>. In other words, a feel-back behavior. A data point for an id is valid in <strong>b</strong> as long as there is no other data point for that id after that date. I hope that makes sense. I know a way to do this using <code>MAX()</code> and <code>GROUP BY</code>. I would like to know if the same is possible using <code>INTERPOLATE</code></em> </p> <h2>An Example:</h2> <p>Just to give you an idea of what it is, I played with an example. This time I just modified the previously created tables to have more fields. </p> <pre><code>CREATE TABLE a ( family int, family_name varchar(50), industry varchar(15), style_flag varchar(1), id int, id_name varchar(50), id1 int, id2 int, id3 int, date DATE, id4 int ); CREATE TABLE b ( id4 int, flag int, period int, date DATE, datapoint float ); INSERT INTO a VALUES (1, '1family', 'comp', 'A', 1, '1 id', 101, 201, 301, '20130603', 401); INSERT INTO a VALUES (1, '1family', 'comp', 'A', 2, '2 id', 102, 202, 302, '20130603', 402); INSERT INTO a VALUES (1, '1family', 'comp', 'A', 3, '3 id', 103, 203, 303, '20130603', 403); INSERT INTO a VALUES (2, '2family', 'bio', 'A', 5, '5 id', 105, 205, 305, '20130603', 405); INSERT INTO a VALUES (2, '2family', 'bio', 'A', 7, '7 id', 107, 207, 307, '20130603', 407); INSERT INTO a VALUES (2, '2family', 'bio', 'A', 9, '9 id', 109, 209, 309, '20130603', 409); INSERT INTO b VALUES (401, 1, 10, '20130501', 2.00); INSERT INTO b VALUES (401, 1, 20, '20130501', 1.50); INSERT INTO b VALUES (401, 2, 10, '20130409', 12.34); INSERT INTO b VALUES (401, 2, 20, '20130401', 10.56); INSERT INTO b VALUES (402, 1, 10, '20130501', 2.00); INSERT INTO b VALUES (402, 2, 20, '20130409', 12.34); INSERT INTO b VALUES (402, 2, 20, '20130401', 10.56); INSERT INTO b VALUES (402, 2, 20, '20130515', 20.55); </code></pre> <p>when I run the following query:</p> <pre><code>SELECT a.family, a.family_name, a.industry, a.style_flag, a.id, a.id_name, a.id1, a.id2, a.id3, a.date, b.id4, b.flag, b.period, b.datapoint FROM a LEFT JOIN b ON a.id4 = b.id4 AND a.date INTERPOLATE PREVIOUS VALUE b.date; </code></pre> <p>I get the following:</p> <pre><code>family | family_name | industry | style_flag | id | id_name | id1 | id2 | id3 | date | id4 | flag | period | datapoint --------+-------------+----------+------------+----+---------+-----+-----+-----+------------+-----+------+--------+----------- 2 | 2family | bio | A | 5 | 5 id | 105 | 205 | 305 | 2013-06-03 | | | | 1 | 1family | comp | A | 1 | 1 id | 101 | 201 | 301 | 2013-06-03 | 401 | 1 | 10 | 2 1 | 1family | comp | A | 3 | 3 id | 103 | 203 | 303 | 2013-06-03 | | | | 2 | 2family | bio | A | 9 | 9 id | 109 | 209 | 309 | 2013-06-03 | | | | 2 | 2family | bio | A | 7 | 7 id | 107 | 207 | 307 | 2013-06-03 | | | | 1 | 1family | comp | A | 2 | 2 id | 102 | 202 | 302 | 2013-06-03 | 402 | 2 | 20 | 20.55 </code></pre> <p>But I need to select the latest value available for an id from <strong>b</strong> for a kind of group by of (id4,flag, period), instead of what it is currently giving me as a result. is there a way that I can make use of the <strong>INTERPOLATE</strong> feature for this? Or should I take a completely different approach. The problem is data in table <strong>b</strong> is sparse. We may not have a datapoint every day, where as in <strong>a</strong> we have a data point every day. </p> <p>I also tried filling in the gaps between the data points in <strong>b</strong> using <strong>TIMESERIES</strong> clause and <strong>TS_FIRST_VALUE(datapoint, 'const')</strong>. But there again, the latest date available for a combination of id4, flag, period in <strong>b</strong> could be way back in time when compared to the date for an id in <strong>a</strong>. And I end up with the same problem as demonstrated above. </p> <p>Any guidance would be highly appreciated. </p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload