Note that there are some explanatory texts on larger screens.

plurals
  1. POJoining two Tables in Hive using HiveQL(Hadoop)
    primarykey
    data
    text
    <blockquote> <p><strong>Possible Duplicate:</strong><br> <a href="https://stackoverflow.com/questions/11386368/sql-query-join-with-table">SQL Query JOIN with Table</a> </p> </blockquote> <pre><code>CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable1 (This is the MAIN table through which comparisons need to be made) ( BUYER_ID BIGINT, ITEM_ID BIGINT, CREATED_TIME STRING ) </code></pre> <p>And this is the data in the above first table</p> <pre><code>**BUYER_ID** | **ITEM_ID** | **CREATED_TIME** --------------+------------------+------------------------- 1015826235 220003038067 *2001-11-03 19:40:21* 1015826235 300003861266 2001-11-08 18:19:59 1015826235 140002997245 2003-08-22 09:23:17 1015826235 *210002448035* 2001-11-11 22:21:11 </code></pre> <p>This is Second table in Hive- It also contains information about the items we are purchasing.</p> <pre><code>CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable2 ( USER_ID BIGINT, PURCHASED_ITEM ARRAY&lt;STRUCT&lt;PRODUCT_ID: BIGINT,TIMESTAMPS:STRING&gt;&gt; ) </code></pre> <p>And this is the data in the above second table (<code>TestingTable2</code>)-</p> <pre><code>**USER_ID** **PURCHASED_ITEM** 1015826235 [{"product_id":220003038067,"timestamps":"1004941621"}, {"product_id":300003861266,"timestamps":"1005268799"}, {"product_id":140002997245,"timestamps":"1061569397"},{"product_id":200002448035,"timestamps":"1005542471"}] </code></pre> <p><strong>Compare <code>TestingTable2</code> with <code>TestingTable1</code> so that below scenario is fulfilled.</strong> </p> <p>Find the <code>PRODUCT_ID</code> AND <code>TIMESTAMPS</code> from <code>TestingTable2</code> WHICH IS NOT MATCHING WITH <code>ITEM_ID</code> AND <code>CREATED_TIME</code> from TestingTable1 CORRESPONDING TO <code>BUYER_ID(USER_ID)</code> after comparing from <code>TestingTable1</code>.</p> <p>So If you look <code>TestingTable2</code> data this(last) <code>ITEM_ID 210002448035</code> from <code>TestingTable1</code> is not matching with <code>TestingTable2</code> <code>PRODUCT_ID- 200002448035</code> data and similarly with timestamps. So I want to show the below result using the HiveQL query. </p> <pre><code>**BUYER_ID** | **ITEM_ID** | **CREATED_TIME** | **PRODUCT_ID** | **TIMESTAMPS** --------------+------------------+--------------------------------+------------------------+---------------------- 1015826235 *210002448035* 2001-11-11 22:21:11 200002448035 1005542471 1015826235 220003038067 *2001-11-03 19:40:21* 220003038067 1004941621 </code></pre> <p>Can anyone help me with this. As I am new to HiveQL so having lot of Problem.</p> <p><strong>Updated:-</strong></p> <p>I have written this query, but it doesn't working the way I wanted to.</p> <pre><code>select * from (select * from (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts LEFT OUTER JOIN testingtable1 ON ( prod_and_ts.user_id = testingtable1.buyer_id AND testingtable1.item_id = prod_and_ts.product_id AND prod_and_ts.timestamps = UNIX_TIMESTAMP (testingtable1.created_time) ) where testingtable1.buyer_id IS NULL) set_a LEFT OUTER JOIN testingtable1 ON (set_a.user_id = testingtable1.buyer_id AND ( set_a.product_id = testingtable1.item_id OR set_a.timestamps = UNIX_TIMESTAMP(testingtable1.created_time) ) ); </code></pre> <p><strong>One More UPDATE</strong></p> <p>As per <code>user1166147</code> comments. I wrote my query as per his query. In hive I guess <code>INNER JOIN</code> are written by simply <code>JOIN</code>.</p> <p>This is my below query.</p> <pre><code>select * from (select t2.buyer_id, t2.item_id, t2.created_time as created_time, subq.user_id, subq.product_id, subq.timestamps as timestamps from (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id AND subq.timestamps = unix_timestamp(t2.created_time) WHERE (subq.product_id &lt;&gt; t2.item_id) union all select t2.buyer_id, t2.item_id as item_id, t2.created_time, subq.user_id, subq.product_id as product_id, subq.timestamps from (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id and subq.product_id = t2.item_id WHERE (subq.timestamps &lt;&gt; unix_timestamp(t2.created_time))) unionall; </code></pre> <p>And after running the above query, I am getting zero result back.</p> <p><strong>One Final UPDATE:-</strong></p> <p>My Bad, I didn't have the accurate data in the tables so that is the reason I was not getting the result back. Yes it is working the actual above query.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload