
plurals
  1. How to incrementally update a table
    primarykey
    data
    text
    <p>We're using Hive and have a data flow that looks like:</p>
    <pre><code>SOURCE -&gt; Flume -&gt; S3 Buckets -&gt; Script -&gt; Hive Table</code></pre>
    <p>We have a table that looks something like this, truncated for brevity:</p>
    <pre><code>CREATE TABLE core_table (
  unique_id string,
  update bigint,
  other_data string
)</code></pre>
    <p>We also have an update table, <code>core_update</code>, with the same structure. This table may contain duplicated data (e.g. a duplicated unique_id with an increasing bigint; the newer row is also ordered later in the file).</p>
    <p>Is there a good way to apply the updates that are in <code>core_update</code> to <code>core_table</code>, both adding new unique_id's to the table and updating the base data?</p>
    <p>-- Note: I am trying to avoid something that looks like MERGE -&gt; DEDUP, since that process takes about 3 hours on the smaller datasets, and we've got one dataset that is whomping huge. So doing something that's akin to an insertion sort would be great.</p>
    <p>Update: I found the following blog post by IBM, <a href="http://ibm.co/15bMSxk" rel="nofollow">http://ibm.co/15bMSxk</a>, which says:</p>
    <p>Algorithm-2: Update into un-partitioned table</p>
    <ul>
    <li><p>Step-1: Run the merge join query. Input: mainTable, staging table name to hold merged records (call as stagingTable3), un-partitioned staging table name (call as stagingTable2), table primary key, table fields. Build the merge join query:</p>
    <p>insert overwrite table stagingTable3 select each column in "List tableFields" Add field name with the alias A from mainTable with alias A Apply the left outer join with stagingTable2 with alias B Check for where A.primaryKey = B.primaryKey and where B.primaryKey is null</p>
    <p>Then union with the data selected from stagingTable2</p></li>
    <li><p>Step-2: Load the data by overwriting from stagingTable3 to mainTable by using the below given load query:</p>
    <pre><code>load data inpath stagingTable3 overwrite into mainTable</code></pre></li>
    </ul>
    <p>However, this still doesn't quite make sense to me (or work, as I interpret it).</p>
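    <p>For illustration, here is one possible reading of the quoted merge-join step, written as a minimal runnable sketch. It is an assumption-laden emulation, not the blog's actual query: SQLite (via Python's <code>sqlite3</code>) stands in for Hive, the helper names <code>merged_rows</code> and <code>demo</code> and the sample rows are hypothetical, and the <code>update</code> column is quoted because it is a reserved word. Base rows with no match in <code>core_update</code> are kept, then unioned with the row carrying the highest <code>update</code> value per <code>unique_id</code> from <code>core_update</code>.</p>

```python
import sqlite3

# SQLite stands in for Hive only so the sketch can be run locally;
# the query has not been tested on Hive itself.
MERGE_SQL = """
SELECT a.unique_id, a."update", a.other_data
FROM core_table a
LEFT OUTER JOIN core_update b ON a.unique_id = b.unique_id
WHERE b.unique_id IS NULL            -- base rows that have no update
UNION ALL
SELECT u.unique_id, u."update", u.other_data
FROM core_update u
JOIN (SELECT unique_id, MAX("update") AS latest
      FROM core_update
      GROUP BY unique_id) m          -- newest version of each key
  ON u.unique_id = m.unique_id AND u."update" = m.latest
ORDER BY 1
"""


def merged_rows(conn):
    """Return the merged view of core_table with core_update applied."""
    return conn.execute(MERGE_SQL).fetchall()


def demo():
    """Build two tiny hypothetical tables and merge them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE core_table  (unique_id TEXT, "update" INTEGER, other_data TEXT);
        CREATE TABLE core_update (unique_id TEXT, "update" INTEGER, other_data TEXT);
        INSERT INTO core_table  VALUES ('a', 1, 'old-a'), ('b', 1, 'old-b');
        INSERT INTO core_update VALUES ('b', 2, 'mid-b'), ('b', 3, 'new-b'),
                                       ('c', 1, 'new-c');
    """)
    return merged_rows(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

    <p>In this sketch the duplicated key <code>b</code> collapses to its newest row and the new key <code>c</code> is added, while <code>a</code> is carried over unchanged. It assumes every row in <code>core_update</code> is newer than the matching base row and that no two rows share the same key and <code>update</code> value; it also still rewrites the whole table, so it does not by itself avoid a full pass over the data.</p>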
 
