Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>We use a closed process for this type of work.</p> <p>Get data from Remote system into Staging tables, Import into Product tables.</p> <p>If anything can insert into Staging Tables at-will, and there are Child Tables, then there is risk that you might Import into the Production DB before all the Child Records are created in the staging tables.</p> <p>If you are free to add columns to the Staging Tables (which the remote end will ignore), or if the staging tables have an IDENTITY or GUID that is unique / non-repeating, then you can create a parallel table.</p> <p>Ideally the routine creating rows in the Staging Table will use a Batch number, and then create a "Batch Number Done" record when successful. So you have a semaphore to stop you importing until all associated records are in.</p> <p>(They could be inserted in a Transaction Block, but you would have to be confident that all processes inserting in Staging Table honoured that).</p> <p>Given the IDENTITY / GUID I would create a 1:1 "Error table" to store any messages describing import failure.</p> <p>You may choose to move, or copy, failed rows to a separate failures staging table, so that the main staging table doesn't get clogged up and it is easier to make sure that the failures are being sorted out (by a human I presume).</p> <p>Having said that here is a more detailed description of our specific processes:</p> <p>To minimise bandwidth and updates (i.e. reduce blocking and minimise unnecessary transaction log entries etc.) we do the following:</p> <p>On Source machine hold a copy of the table being transferred. This has additional columns for Updated-on and Action (Update or Delete flag - Update includes Insert, and Insert may have been Updated again before destination ever gets that row ...)</p> <p>Inserts to this table, of <strong>new</strong>, rows are OK</p> <p>Updates to this table only happen if there is a difference in at least one column - so its OK (if rather CPU intensive) to compare the whole source table against the staging table to find out what has changed. Updates set the Updated-on column. (Beware of when the clocks go back)</p> <p>Periodically we flag the stagin table rows Action=Deleted if they cannot be found in the Source table.</p> <p>The data is copied from Source to identical tables on Destination where the Updated-on is after the last transfer.</p> <p>On the Destination server the routine that checks the data, and imports it to Production, works solely on the Updated-on date (process everything since last Updated-on)</p> <p>All staging tables are in a separate database which has minimal logging - the database will automatically refresh from "last updated date" if it is restored, so our aim is to minimise transaction log. We don;t store the staging tables in the Production database (which will have full transaction logging) in order to avoid bloating the TLogs</p> <p>Again, it is helpful if these processes are sequential so that they cannot happen concurrently, otherwise some sort of Batch No is needed to prevent transferring partially completed batches where Parent/Child tables exist.</p> <p>We transfer the data in the style of the Source database, and only make data manipulations at the Destination end. The reason for this is that if the data manipulation is wrong, and we fix it, we only have to rerun it at the Destination end - the Source data is all present in the staging tables at the Destination end. If we did the manipulation at the Source end we would have to retransmit all the data again, as well.</p> <p>For debugging having the Updated-on date in the Staging table is helpful. We often times get "Why is this different" and seeing the Updated-on tells us if the manipulation on the Destination end is at fault (i.e. we have recent data showing the expected Source data) or the Source end (no recent data found!)</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload