
Alteryx Designer Desktop Discussions


Alternative of Update; Insert if New

swapsingh2712
8 - Asteroid

Hello everyone,

I have a requirement to append new rows to a Snowflake table, but I need to ensure that duplicate rows are not inserted. Unfortunately, I cannot use an "Update; Insert if New" approach because the table does not have a primary key.

The table includes columns such as T_Date, Total_customers, and Total_Income, and I am generating rows for the past 3 days (T_Date).

Note: I cannot delete and recreate the table with a primary key due to the large amount of production data it contains.

Could you please suggest any alternative methods, such as pre or post SQL queries, to achieve this?

3 Replies
abacon
12 - Quasar

@swapsingh2712 Is removing duplicates in a Post SQL statement possible? You could write a script that checks if any duplicates are present and have it keep the most recent record.
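For instance, a Post SQL statement along these lines could rewrite the table with exact duplicates removed. This is only a sketch assuming the table and column names from the original post (my_table is a placeholder); Snowflake's QUALIFY clause filters on the row number so only one copy of each identical row survives:

```sql
-- Hypothetical Post SQL sketch: rewrite the table, keeping a single copy
-- of each identical (T_Date, Total_customers, Total_Income) row.
-- "my_table" is a placeholder for the actual target table name.
INSERT OVERWRITE INTO my_table
SELECT T_Date, Total_customers, Total_Income
FROM my_table
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY T_Date, Total_customers, Total_Income
    ORDER BY T_Date  -- order is arbitrary when all partition columns match
) = 1;
```

If "duplicate" means "same T_Date but possibly different totals," you would instead partition by T_Date alone and order by whichever column identifies the most recent record.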

 

Bacon

dataguyW
11 - Bolide

We have had the same issue and adopted a method using joins. We take the data we are about to load and build a NEW_KEY (date, id, whatever fits), then build the same key from the final table as OLD_KEY. An in-DB full outer join on the two keys tells you everything you need: where OLD_KEY is NULL, the record is new and needs to be inserted; where both keys match, the record is essentially updating what is already there; and where OLD_KEY is populated but NEW_KEY is empty, those are your historical records that you simply want to retain. Split out the INSERT/UPDATE records and join the key back to your dataset; take the RETAIN keys, join them back to your existing data, and union the two. We use this method to set created dates, updated dates, etc., and to run a "MERGE/APPEND/KEEP" process as we build the final output table. We write it to a temp table and then do one final step to overwrite the final table, as a safety precaution in case any values error on insertion. Depending on how much data you are talking about, it is sometimes faster to reload a table than to run deletes.
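The key-comparison step described above could be sketched in SQL like this. It is only an illustration under assumed names: STG is the incoming data, FINAL is the existing table, and T_Date stands in for whatever key you build:

```sql
-- Hypothetical sketch of the full-outer-join classification step.
-- STG = rows about to be loaded, FINAL = existing table; T_Date is the key.
SELECT
    stg.T_Date AS new_key,
    fin.T_Date AS old_key,
    CASE
        WHEN fin.T_Date IS NULL THEN 'INSERT'   -- new key only: insert
        WHEN stg.T_Date IS NULL THEN 'RETAIN'   -- old key only: keep history
        ELSE 'UPDATE'                           -- both sides: update in place
    END AS action
FROM stg
FULL OUTER JOIN fin
    ON stg.T_Date = fin.T_Date;
```

Each branch of the CASE corresponds to one of the three streams (INSERT/UPDATE/RETAIN) that get split out and recombined with a union.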

apathetichell
20 - Arcturus

So for tables which are not huge --- you should be able to use some logic like:

MERGE INTO target_table AS tgt
USING source_table AS src
ON tgt.id = src.id -- Define your join condition(s) to determine if a record exists
WHEN NOT MATCHED THEN
INSERT (id, column1, column2)
VALUES (src.id, src.column1, src.column2);

 

or so SQL says... For larger tables --- this is a bigger problem. You should not be doing incremental loads from Alteryx into Snowflake like this. You should be doing some form of staging/raw table plus a cleaned table. You don't need a full-on Unity Catalog like Databricks --- but you should have a raw table ---> which in this case is what you'd be writing to ---> and some kind of CDC in Snowflake which writes to your clean table/view. If your system isn't set up this way --- that's the first issue you should be addressing.
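That raw-to-clean pattern could look something like the following in Snowflake, using a stream plus a scheduled task to do the CDC. All object names (raw_sales, clean_sales, my_wh) are illustrative, and T_Date is assumed to be a usable matching key:

```sql
-- Hypothetical sketch: Alteryx appends to RAW_SALES; a stream captures the
-- new rows, and a scheduled task merges them into the clean table.
CREATE STREAM raw_sales_stream ON TABLE raw_sales;

CREATE TASK merge_raw_to_clean
    WAREHOUSE = my_wh
    SCHEDULE = '15 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('RAW_SALES_STREAM')
AS
MERGE INTO clean_sales AS tgt
USING raw_sales_stream AS src
    ON tgt.T_Date = src.T_Date
WHEN MATCHED THEN UPDATE SET
    tgt.Total_customers = src.Total_customers,
    tgt.Total_Income    = src.Total_Income
WHEN NOT MATCHED THEN INSERT (T_Date, Total_customers, Total_Income)
    VALUES (src.T_Date, src.Total_customers, src.Total_Income);
```

The stream means the task only ever sees rows appended since the last merge, so Alteryx can keep doing simple appends to the raw table.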
