
Write/Append Changes to Snowflake table without rerunning entire query

Aj4151987
6 - Meteoroid

Hi all, I’ve been trying to search and rack my brain on this one and am hoping to get some help here. I created a custom Snowflake table from a query that pulls from a number of Snowflake databases. The amount of data was so large that I had to break it up into about six or seven chunks of writing and appending data to the table. Everything is done using In-Database tools.

What I’d like to do now is automate the workflow to run daily and append only the new data or changes, based on the criteria of the query. There shouldn’t be many changes each day, so instead of rerunning the entire query every day, I’m looking for the most efficient way possible to append changes and/or new rows of data. My thought is that now that I have the table created, I could use it as my input and somehow union it with the query? Or maybe use a Dynamic Input tool in some way.

The issue is that there isn’t just one unique key. I’ll attach sample data so it’s hopefully easier to visualize. Basically, the records already in place can change (new contract dates, changing the vendor they bill with, etc.), so the table can’t simply be static while just adding new records. Maybe some sort of custom key is needed? I’m currently validating whether a certain ID is a true unique key, which would probably really help. So any suggestions based on the assumption that there IS a unique key (although I know some providers don’t have an ID) are greatly appreciated!

7 REPLIES
apathetichell
18 - Pollux

I always recommend keeping your query in a text field, editing it via a Formula tool, and using Dynamic Input In-DB to get your incremental data, which you'd then union. I'm confused about why you're having issues with primary keys, though - Snowflake does not enforce primary keys. If you have the potential for duplicated data in your final table, you need to handle that yourself (i.e., create a check in your incremental query, use a Summarize tool to dedupe, etc.).
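For the dedupe, Snowflake's QUALIFY clause is one option - a minimal sketch, assuming hypothetical staging, provider_id, and load_ts names (none of these come from the thread):

```sql
-- Keep one row per candidate key; table and column names are placeholders.
SELECT *
FROM staging
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY provider_id   -- candidate unique key
    ORDER BY load_ts DESC      -- keep the most recent version of each row
) = 1;
```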

 

Aj4151987
6 - Meteoroid

OK, so are you saying to use the Text Input tool and feed that to the Filter tool to determine which new records come in? I didn’t know I could do that; maybe I’m misunderstanding. Then do I feed that query into the Dynamic Input In-DB tool, or feed the query first and then use the Filter In-DB tool? After which I would union In-DB with the main table as my other input. Then I’d have to stream out to get unique records… how do I ensure only new records are appended, and not every row from the input again as well? And I’m not having primary key issues - that was more me typing out loud, wondering, when a provider’s record changes, whether it should be a “new” record or a “change.” Hopefully I’m following… thanks for the reply.

apathetichell
18 - Pollux

Dynamic Input In-DB allows you to send queries into In-DB via fields (the full query in one field, the connection in the other) - this lets you dynamically change your queries. I'm not sure here, though - are you adding records incrementally in Snowflake, or extracting records? If your end-to-end is:

 

1) create a new table based upon data available.

2) union this table with existing table.

3) capture distinct records.

4) extract new records

 

you may need to use a dummy field prior to the union to capture changes (for joined records you'd use an outer join and then look for entries with a null value in a field via a Filter tool to capture only the changed records) - and then a Summarize tool for unique records. Sorry - I'm not fully clear on all of your process.
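A minimal Snowflake SQL sketch of that outer-join-and-null-filter step (new_data, master, and the join keys are placeholder names, not from this thread):

```sql
-- Rows present only in new_data; the NULL on the right side plays the
-- role of the dummy field described above.
SELECT n.*
FROM new_data n
LEFT OUTER JOIN master m
  ON  n.provider_id = m.provider_id
  AND n.contract_id = m.contract_id
WHERE m.provider_id IS NULL;
```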

Aj4151987
6 - Meteoroid

Ah, that makes sense. And that’s a great question I probably wasn’t clear on. Now that I have my base table built, along with the query that was used to create it, all I want to do is append uniquely new records from the query output to the existing table. So let’s say my table is 5 million rows and tomorrow 5 new unique records would be generated by the same query used to create the table; all I want to do is append those 5 new records in the most efficient way possible. The only bottleneck in my original workflow was writing all the rows to the Snowflake table. I should also mention that the tables used in the query are all in Snowflake as well, and I’m just writing to my team’s own schema. So I think you’re saying the dummy field, when null, would act as an indicator that those are new unique records?
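(In plain Snowflake SQL, the append described here would conceptually come down to something like the following sketch; my_schema.master, new_data, and provider_id are assumed placeholder names.)

```sql
-- Append only rows not already in the master table, so the 5-million-row
-- table is never rewritten, just added to.
INSERT INTO my_schema.master
SELECT n.*
FROM new_data n
LEFT JOIN my_schema.master m
  ON n.provider_id = m.provider_id
WHERE m.provider_id IS NULL;
```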

apathetichell
18 - Pollux

sorry still not quite there - but maybe getting there:

 

let's say:

 

new query (in my vision, brought in via Dynamic Input In-DB) is the left datastream.

the right will be the master table.

two Formula In-DB tools can create row IDs (I don't have the Snowflake syntax at my fingertips, but it's not terrible - see the sketch at the end of this post).

the left datastream gets a Formula In-DB for a flag.

left outer join on your various fields.

Filter In-DB on your dummy field. These are the records only on your left side.

a) send those to a Data Stream Out for analysis?

b) drop the dummy field - the main datastream goes to a Write Data In-DB to update the table that you'll use as the right datastream on the next run.

maybe? that's how I'm kind of seeing this.
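In Snowflake SQL terms, the row ID, flag, join, and filter steps might look roughly like this (incremental, master, and provider_id are placeholder names):

```sql
-- Formula In-DB equivalents: a row id via ROW_NUMBER() and a constant
-- flag, then the left outer join and null filter to isolate new records.
SELECT l.*,
       ROW_NUMBER() OVER (ORDER BY l.provider_id) AS row_id,
       1 AS left_flag
FROM incremental l
LEFT OUTER JOIN master r
  ON l.provider_id = r.provider_id
WHERE r.provider_id IS NULL;   -- present on the left side only
```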

 

 

Aj4151987
6 - Meteoroid

Apologies for the slow response. Yes, this is exactly what I’m envisioning. I spent yesterday attempting to build what you laid out but am having some pain points.

I’ve decided that to find new records it’s best to left outer join on two specific fields and then build a true/false filter to test that condition. For records where that test is false, I created another filter with a CASE statement (there is probably an easier way to do this) comparing most of the other fields for changes. Where there are no changes, those records are not appended; where there are changes, those will be considered record updates. For now, the only deletion I’d want is for any contracts where the current date doesn’t fall between the contract start and end dates. This will include existing records.

I’m having trouble even getting my SQL fed from a Text Input tool to the Dynamic Input In-DB tool. Do you have any suggestions for that? I’ve tried Googling and searching the forums to figure it out but can’t seem to find an answer. I’m attaching what I have thus far, hoping I can get some assistance on that first step. Thanks.
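In Snowflake SQL terms, the change test and date-range rule described here might look roughly like this hedged sketch (new_data, master, provider_id, vendor, contract_start, and contract_end are placeholders, not names from the attached sample):

```sql
-- Matched rows: compare the remaining fields to classify updates, and
-- keep only contracts where today falls inside the contract window.
SELECT n.*,
       CASE
         WHEN n.vendor         <> m.vendor
           OR n.contract_start <> m.contract_start
           OR n.contract_end   <> m.contract_end
         THEN 'UPDATE'
         ELSE 'NO CHANGE'
       END AS change_type
FROM new_data n
JOIN master m
  ON n.provider_id = m.provider_id
WHERE CURRENT_DATE BETWEEN n.contract_start AND n.contract_end;
```

If any of the compared columns can be NULL, Snowflake's IS DISTINCT FROM is the NULL-safe alternative to <>.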

apathetichell
18 - Pollux

Interesting! I've never used Dynamic Input In-DB with a file connection - but if this works like the standard connection, your file://.... connection should go in one text field and your query should go in a query field - one row, one field each. Then they should be mapped via Dynamic Input In-DB. I'd put the larger query in the Text Input field (if it maps to multiple rows, use a Summarize tool in concatenate mode with a delimiter of \n, and take the Max of your connection field so you don't lose that data). This would allow you to change your query outside of the In-DB tools.

 

If you get an issue mapping a file connection in the connection dropdown in Dynamic Input In-DB, you can use a macro to sub it out - but that would be a bug. If it works normally, it should work there.
