<p>You probably need to use <a href="https://cwiki.apache.org/Hive/languagemanual-transform.html" rel="nofollow noreferrer">Hive transform functionality</a> with a custom reducer that does the matching between the records from the two tables, t1 and t2, where t1 is simply TestingTable1 and t2 is the exploded form of TestingTable2:</p>
<pre><code>SELECT user_id,
       prod_and_ts.product_id as product_id,
       prod_and_ts.timestamps as timestamps
FROM TestingTable2
LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts
</code></pre>
<p><a href="https://stackoverflow.com/questions/11373543/explode-the-array-of-struct-in-hive#comment15007330_11373543">as explained by me in another question of yours</a>.</p>
<pre><code>FROM (
  FROM (
    SELECT buyer_id, item_id, created_time, id
    FROM (
      SELECT buyer_id, item_id, created_time, 't1' as id
      FROM TestingTable1 t1
      UNION ALL
      SELECT user_id as buyer_id,
             prod_and_ts.product_id as item_id,
             prod_and_ts.timestamps as created_time,
             't2' as id
      FROM TestingTable2
      LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts
    ) t
  ) x
  MAP buyer_id, item_id, created_time, id
  USING '/bin/cat'
  AS buyer_id, item_id, create_time, id
  CLUSTER BY buyer_id
) map_output
REDUCE buyer_id, item_id, create_time, id
USING 'my_custom_reducer'
AS buyer_id, item_id, create_time, product_id, timestamps;
</code></pre>
<p>The query above has two distinct parts: the first is "MAP" and the second is "REDUCE". Between these two parts is a phase called <em>shuffle</em> (represented by <code>CLUSTER BY buyer_id</code>) that Hive takes care of automatically. The Map part of the query reads from the tables and also passes along an identifier (called <em>id</em>) that indicates which table each record comes from. The Shuffle phase groups all the records by <em>buyer_id</em>. The Reduce phase takes in all the records for a given <em>buyer_id</em> and emits only the records that satisfy the matching criteria.
You will have to write the reducer yourself based on your matching criteria, in any language of your choice. It is guaranteed that all records with the same buyer_id go to the same reducer script.</p>
<p>There might be an easier way to do this, but this is the method I can think of right now. Good luck! To see why I chose this method, <a href="https://stackoverflow.com/questions/11387543/performance-tuning-a-hive-query/11405841#11405841">see my recent answer here</a>.</p>
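<p>As a rough illustration of what <code>my_custom_reducer</code> could look like, here is a minimal Python sketch. It assumes Hive hands rows to the script as tab-separated lines on stdin and that, because of <code>CLUSTER BY buyer_id</code>, rows for the same buyer arrive next to each other. The function names and the matching rule in <code>is_match</code> (same item and same timestamp) are purely hypothetical placeholders; substitute your own criteria.</p>

```python
import sys
from itertools import groupby


def parse(lines):
    """Yield (buyer_id, item_id, create_time, id) tuples from tab-separated rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 4:  # silently skip malformed rows
            yield tuple(fields)


def is_match(t1_row, t2_row):
    # HYPOTHETICAL criterion (not from the original answer):
    # same item and same timestamp. Replace with your own rule.
    return t1_row[1] == t2_row[1] and t1_row[2] == t2_row[2]


def reduce_rows(lines):
    """Yield output rows: buyer_id, item_id, create_time, product_id, timestamps."""
    # groupby only merges consecutive rows, which is why the sketch relies on
    # the input being clustered (distributed and sorted) by buyer_id.
    for buyer_id, group in groupby(parse(lines), key=lambda row: row[0]):
        rows = list(group)  # all rows of one buyer are held in memory
        t1_rows = [row for row in rows if row[3] == "t1"]
        t2_rows = [row for row in rows if row[3] == "t2"]
        for a in t1_rows:
            for b in t2_rows:
                if is_match(a, b):
                    yield "\t".join((buyer_id, a[1], a[2], b[1], b[2]))


def main():
    """Entry point when used as the reducer script: stdin in, stdout out."""
    for out in reduce_rows(sys.stdin):
        print(out)
```

<p>To use this as the reducer, end the file with a call to <code>main()</code> and make it available to the job (for example with <code>ADD FILE</code>). Keeping the logic in <code>reduce_rows</code> makes it easy to test the matching on a handful of sample lines before running it inside Hive.</p>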