Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p><strong>EDIT - PART 1</strong> Okay - For some reason I am going to explain myself - so to start with I stumbled upon this question because of the SQL tag, and saw Hive, and started to not look and just skip it. BUT then I noticed it had been over a day and you had gotten no answers. I looked - I saw a SQL logic correction in the original query posted that I knew would be needed and would help, so I posted ONLY because no one had answered. I will try to address this last question - but after that I am keeping my advice to myself, as I may be giving bad advice. Good luck! I tried! And you seem to be getting answers now, so...</p> <p><strong>In TSQL, I could solve this entire problem with the below single query</strong>:</p> <pre><code>SELECT * FROM SO_Table1HIVE A FULL OUTER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID) </code></pre> <p>It would return everything, including your match buyer_id/user_id only. It won't match a buyer_id/user_id row with no matches in either time or product in the other table, but it will return it as a separate row with NULLS in the other table's fields. I would not match these any way - there is no accurate information provided to do it with as explained below.</p> <p><strong>END EDIT PART 1</strong></p> <p>If you can't do <code>FULL OUTER JOIN</code> with <code>OR</code> in Hive, the simplest way to meet the original criteria is to <code>UNION ALL</code> 2 <code>INNER JOIN</code>s. On one of the queries, in addition to joining the matching user_ids, join on the PRODUCT_ID AND in your <code>WHERE</code> look for TIMESTAMPS that don't match CREATED_TIME. On the second query, in addition to joining the matching user_ids, join on the times AND in your <code>WHERE</code> look for products that don't match. </p> <p><strong>EDIT PART 2 - UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA</strong></p> <p>If I understand the last criteria it is any record in either table that has a matching user_id = buyer_id, but nothing else matches. The <code>FULL OUTER JOIN</code> with <code>OR</code> condition will return them, but there isn't enough provided info for a way to relate the records to each other. We can easily identify them, but have no way to tie them back to each other. If you do so and you have more than one record without a match in either OR both tables, there are going to be multiple entries for each. </p> <p><em>Any query I wrote to try to tie them without more info (and probably with) would be a guess and inaccurate.</em> </p> <p>For example, in the first table if there were these 2 (sample fake) records with nothing matching in the second except user_id:</p> <pre><code>1015826235 420003038067 2011-11-03 19:40:21.000 1015826235 720003038067 2004-11-03 19:40:21.000 </code></pre> <p>AND in table2 - these non matching:</p> <pre><code>1015826235 {"product_id":520003038067,"timestamps":"10... 1015826235 {"product_id":620003038067,"timestamps":"10... </code></pre> <p>You can identify them, but if you match them without more criteria you get 4 instead of 2:</p> <pre><code>1015826235 420003038067 2011-11-03 19:40:21.000 1015826235 520003038067 1015826235 420003038067 2011-11-03 19:40:21.000 1015826235 620003038067 1015826235 720003038067 2004-11-03 19:40:21.000 1015826235 520003038067 1015826235 720003038067 2004-11-03 19:40:21.000 1015826235 620003038067 </code></pre> <p>My suggestion would be simply to identify them and show them, as below.</p> <pre><code>BUYER_ID ITEM_ID CREATED_TIME USER_ID PRODUCTID timestamps ---------------------------------------------------------------------- NULL NULL NULL 1015826235 520003038067 2009-11-11 22:21:11.000 NULL NULL NULL 1015826235 620003038067 2008-11-11 22:21:11.000 1015826235 420003038067 2011-11-03 19:40:21.000 NULL NULL NULL 1015826235 720003038067 2004-11-03 19:40:21.000 NULL NULL NULL </code></pre> <p><strong>END EDIT PART 2 - UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA - PART 1</strong></p> <p>I am working with TSQL, so I can't test for you an exact query with your syntax, but the concepts of the joins are the same, and this will return what you want. I did take your query and attempt your syntax, modify as needed. I tested in TSQL. You may be able to take this and improve upon it with functionality in HiveQL. There are other ways to do this - but this is the most straightforward and this will translate to HiveQL.</p> <p><strong>REMOVED, YOU GOT THIS PART AND IT IS INCLUDED LATER</strong></p> <p>(<em>Again modify syntax as needed</em>)**</p> <pre><code>SELECT * FROM ( SELECT BUYER_ID,ITEM_ID,CREATED_TIME,PRODUCT_ID,TIMESTAMPS FROM testingtable2 LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts INNER JOIN table2 A ON A.BUYER_ID = prod_and_ts.[USER_ID] AND prod_and_ts.timestamps = UNIX_TIMESTAMP (table2.created_time) WHERE prod_and_ts.product_id &lt;&gt; A.ITEM_ID UNION ALL SELECT BUYER_ID,ITEM_ID,CREATED_TIME,PRODUCT_ID,TIMESTAMPS FROM testingtable2 LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts INNER JOIN table2 A ON A.BUYER_ID = prod_and_ts.[USER_ID] AND prod_and_ts.product_id = A.ITEM_ID WHERE prod_and_ts.timestamps &lt;&gt; UNIX_TIMESTAMP (table2.created_time) ) X </code></pre> <p>And here is my <strong>tested TSQL version with my table names for reference:</strong></p> <pre><code>SELECT * FROM( SELECT * FROM SO_Table1HIVE A INNER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND B.t1time = A.Created_TIME WHERE B.PRODUCTID &lt;&gt; A.ITEM_ID UNION ALL SELECT * FROM SO_Table1HIVE A INNER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND B.PRODUCTID = A.ITEM_ID WHERE B.t1time &lt;&gt; A.Created_TIME ) X </code></pre> <p>*<strong>EDIT PART 3 - UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA -PART 2</strong></p> <p>In TSQL the entire query (no unions) can be run using a <code>FULL OUTER JOIN</code> with an <code>OR</code> condition on the join</p> <pre><code>SELECT * FROM SO_Table1HIVE A FULL OUTER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID) </code></pre> <p>If you can't simply do the above, For the SQL logic for the new criteria - to grab those that don't match from both tables and display them as <code>NULL</code> in the other table use <code>RIGHT JOIN</code> and <code>LEFT JOIN</code>. <code>RIGHT JOIN</code> will grab anything in the first table the matches the second and everything in the second, and <code>LEFT</code> does the opposite. Add the new queries to your <code>UNION</code>. </p> <p>TSQL EXAMPLE - MODIFY FOR HIVE</p> <pre><code>SELECT * FROM SO_Table1HIVE A RIGHT JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID) WHERE A.BUYER_ID IS NULL UNION ALL SELECT * FROM SO_Table1HIVE A LEFT JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID) WHERE B.[USER_ID] IS NULL </code></pre> <p>Or, If you wanted to grab them and match them as duplicates add to <code>UNION</code>:</p> <p>TSQL</p> <pre><code>SELECT * FROM SO_Table1HIVE A JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] WHERE B.t1time NOT IN(SELECT Created_TIME FROM SO_Table1HIVE) AND A.Created_TIME NOT IN(SELECT t1time FROM SO_Table2HIVE) AND B.PRODUCTID NOT IN(SELECT ITEM_ID FROM SO_Table1HIVE) AND A.ITEM_ID NOT IN(SELECT PRODUCTID FROM SO_Table2HIVE) </code></pre> <p><em><strong>Again, Good luck!</em></strong> </p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload