One of the most common things we need to do is take a delta copy of a file or a database table into the staging area of the analytical database.
This always looks very similar, so it would be useful to make it a wizard-based process so that teams can build these loads quickly rather than hand-coding each one. The wizard would need to cover:
- Check which primary keys already exist in the target, and insert the rows whose keys are missing.
- Do any rows update over time, or is the source insert-only? If rows update, which column is the "updated date" column, so that we can spot updates? If there is no updated date, we need a column-by-column comparison of some kind (such as a hash or a checksum).
- Do you want to sync deletes?
- Do you want to keep a history of updates?
- The target table in the staging area is brought up to date with the source.
- Logging (similar to what Kimball recommends in the ETL Handbook) of the run date/time, summary stats, and any errors.
- An errors table recording any errors that arose, with row numbers.
- Target tables are created (with a history table if requested).
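
To make the steps above concrete, here is a minimal sketch in Python of what the generated load might do. It is only an illustration under stated assumptions: the staging table is modelled as an in-memory dict keyed by primary key, there is no "updated date" column so updates are detected with a hash of the non-key columns, and the names `row_hash`, `delta_copy`, `sync_deletes`, and `history` are all hypothetical rather than part of any existing tool.

```python
import hashlib
from datetime import datetime, timezone


def row_hash(row, key):
    """Hash all non-key columns; stands in for an 'updated date' column."""
    payload = "|".join(f"{col}={row[col]!r}" for col in sorted(row) if col != key)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def delta_copy(source_rows, staging, key, sync_deletes=False, history=None):
    """Bring `staging` (dict keyed by primary key) up to date with `source_rows`.

    `history` is an optional list; when given, replaced or deleted rows are
    appended to it (the hypothetical "history table").
    Returns (log_entry, errors).
    """
    stats = {"inserted": 0, "updated": 0, "deleted": 0, "errors": 0}
    errors = []
    seen = set()

    for row_number, row in enumerate(source_rows, start=1):
        if key not in row:
            # Record the failing row number instead of aborting the whole run.
            errors.append({"row_number": row_number, "error": f"missing key column {key!r}"})
            stats["errors"] += 1
            continue
        pk = row[key]
        seen.add(pk)
        current = staging.get(pk)
        if current is None:
            # Key not in the target yet: fill the gap.
            staging[pk] = dict(row)
            stats["inserted"] += 1
        elif row_hash(current, key) != row_hash(row, key):
            # Same key, different content: treat as an update.
            if history is not None:
                history.append(dict(current))
            staging[pk] = dict(row)
            stats["updated"] += 1

    if sync_deletes:
        # Rows in the target that no longer exist in the source.
        for pk in [pk for pk in staging if pk not in seen]:
            if history is not None:
                history.append(dict(staging[pk]))
            del staging[pk]
            stats["deleted"] += 1

    log_entry = {"run_at": datetime.now(timezone.utc).isoformat(), **stats}
    return log_entry, errors


# Example run with made-up data.
staging = {1: {"id": 1, "name": "Ann"}, 2: {"id": 2, "name": "Bob"}}
source = [{"id": 1, "name": "Anne"}, {"id": 3, "name": "Cy"}, {"name": "no key"}]
history = []
log, errors = delta_copy(source, staging, key="id", sync_deletes=True, history=history)
```

In a real implementation the same logic would more likely be expressed as set-based SQL against the staging database; the dict version is used here only to keep the example self-contained.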