Query two related tables (Joins)
<p>This is the first table in Hive. It contains information about the items we are purchasing.</p>
<pre><code>-- This is the MAIN table against which comparisons need to be made
CREATE EXTERNAL TABLE IF NOT EXISTS Table1
(
    ITEM_ID BIGINT,
    CREATED_TIME STRING,
    BUYER_ID BIGINT
)
</code></pre>
<p>And this is the data in the first table:</p>
<pre><code>ITEM_ID        CREATED_TIME   BUYER_ID
220003038067   2012-06-21     1015826235
300003861266   2012-06-21     1015826235
140002997245   2012-06-14     1015826235
200002448035   2012-06-08     1015826235
260003553381   2012-06-07     1015826235
</code></pre>
<p>This is the second table in Hive. It also contains information about the items we are purchasing.</p>
<pre><code>CREATE EXTERNAL TABLE IF NOT EXISTS Table2
(
    USER_ID BIGINT,
    PURCHASED_ITEM ARRAY&lt;STRUCT&lt;PRODUCT_ID: BIGINT,TIMESTAMPS:STRING&gt;&gt;
)
</code></pre>
<p>And this is the data in the second table:</p>
<pre><code>USER_ID      PURCHASED_ITEM
1015826235   [{"product_id":220003038067,"timestamps":"1340321132000"},
              {"product_id":300003861266,"timestamps":"1340271857000"},
              {"product_id":140002997245,"timestamps":"1339694926000"},
              {"product_id":200002448035,"timestamps":"1339172659000"},
              {"product_id":260003553381,"timestamps":"1339072514000"}]
</code></pre>
<p>I have reduced the data to a single BUYER_ID (USER_ID) to make the problem simple to understand.</p>
<p><strong>Problem Statement</strong></p>
<p>I need to compare <code>Table2</code> with <code>Table1</code>. When <code>USER_ID</code> from <code>Table2</code> matches <code>BUYER_ID</code> from <code>Table1</code> (they are the same thing), <code>PURCHASED_ITEM</code> in <code>Table2</code>, which is an array of PRODUCT_ID (same as ITEM_ID) and TIMESTAMPS (same as CREATED_TIME), should be the same as <code>ITEM_ID</code> and <code>CREATED_TIME</code> in <code>Table1</code> for that particular USER_ID (BUYER_ID). Sometimes, however, <code>PURCHASED_ITEM</code> and <code>ITEM_ID</code>, <code>CREATED_TIME</code> do not match, or some PRODUCT_ID and TIMESTAMPS entries are missing from <code>Table2</code> when compared against <code>Table1</code>.</p>
<p>By this I mean that the count of <code>PRODUCT_ID</code> and <code>TIMESTAMPS</code> in <code>Table2</code> should be the same as the count of <code>ITEM_ID</code> and <code>CREATED_TIME</code> in <code>Table1</code> for that particular BUYER_ID (USER_ID), and the content should be the same. If they are not the same, or an entry is missing from <code>Table2</code>, I need to print a result saying either that this particular <code>ITEM_ID</code> and <code>CREATED_TIME</code> is missing from <code>Table2</code>, or that the <code>PRODUCT_ID</code> and <code>TIMESTAMPS</code> do not match <code>Table1</code>.</p>
<p>For example, <code>Table1</code> currently has <code>5 ITEM_ID</code> and <code>5 CREATED_TIME</code> values for <code>BUYER_ID 1015826235</code>, so <code>Table2</code> should have <code>5 PRODUCT_ID</code> and <code>5 TIMESTAMPS</code> values, exactly the same as in <code>Table1</code>, in one row for the same <code>USER_ID (BUYER_ID)</code>. If they are not the same or an entry is missing, I need to print a result showing what is missing or which data is wrong.</p>
<p>To make it clearer:</p>
<p><code>PURCHASED_ITEM</code> is an array of structs in <code>Table2</code>, and each struct contains two fields, <code>PRODUCT_ID</code> and <code>TIMESTAMPS</code>.</p>
<p>If <code>USER_ID</code> and <code>BUYER_ID</code> match, then <code>PRODUCT_ID</code> in <code>Table2</code> should match <code>ITEM_ID</code> in <code>Table1</code>, and <code>TIMESTAMPS</code> in <code>Table2</code> should match <code>CREATED_TIME</code> in <code>Table1</code>.</p>
<p><strong>UPDATED</strong></p>
<p><strong>HiveQL SQL Query Question</strong></p>
<pre><code>Q 1) Find all USER_ID from Table2 whose PRODUCT_ID or TIMESTAMPS are not the same as
     ITEM_ID or CREATED_TIME after comparing with Table1 on BUYER_ID.
</code></pre>
<p><strong>This is the query I wrote for the first question. Is the query right?</strong></p>
<pre><code>A 1) select Table2.user_id
     from Table2
     where Table1.user_id = Table2.buyer_id
       and (Table1.item_id &lt;&gt; Table2.product_id
            or UNIX_TIMESTAMP(Table1.created_time) &lt;&gt; Table2.timestamps)

Q 2) Find the BUYER_ID (USER_ID) as well as those ITEM_ID and CREATED_TIME values which are
     missing from Table2 after comparing with Table1 on BUYER_ID.

A 2) Not sure.
</code></pre>
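<p>For illustration only, here is a rough sketch of how both questions might be approached by first flattening <code>PURCHASED_ITEM</code> with <code>LATERAL VIEW explode</code>. It rests on assumptions not stated above: that <code>TIMESTAMPS</code> holds epoch milliseconds as a string, that <code>CREATED_TIME</code> is a <code>yyyy-MM-dd</code> string, and that comparing at day granularity is acceptable. The aliases <code>x</code>, <code>items</code>, <code>item</code> and <code>purchased_date</code> are made up for the example, and <code>from_unixtime</code> formats in the session time zone, so results near midnight may differ between clusters.</p>

```sql
-- Q 1 (sketch): users in Table2 with an item that is absent from Table1
-- or whose purchase date differs from Table1.created_time.
-- Assumption: timestamps is epoch milliseconds stored as a string.
SELECT DISTINCT x.user_id
FROM (
    SELECT t2.user_id,
           item.product_id AS product_id,
           from_unixtime(CAST(CAST(item.timestamps AS BIGINT) / 1000 AS BIGINT),
                         'yyyy-MM-dd') AS purchased_date
    FROM Table2 t2
    LATERAL VIEW explode(t2.purchased_item) items AS item  -- one row per struct
) x
LEFT OUTER JOIN Table1 t1
  ON  x.user_id    = t1.buyer_id
  AND x.product_id = t1.item_id
WHERE t1.item_id IS NULL                   -- no such item in Table1
   OR t1.created_time <> x.purchased_date; -- item found, dates differ

-- Q 2 (sketch): rows of Table1 that have no counterpart in Table2.
SELECT t1.buyer_id, t1.item_id, t1.created_time
FROM Table1 t1
LEFT OUTER JOIN (
    SELECT t2.user_id,
           item.product_id AS product_id
    FROM Table2 t2
    LATERAL VIEW explode(t2.purchased_item) items AS item
) x
  ON  t1.buyer_id = x.user_id
  AND t1.item_id  = x.product_id
WHERE x.product_id IS NULL;                -- nothing matched in Table2
```

<p>Both joins use only equality conditions in the <code>ON</code> clause and push the inequality checks into <code>WHERE</code>, since older Hive releases accept only equi-joins.</p>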