Hello,
I have several questions about connecting to Snowflake from Alteryx.
1. Bulk loader without primary key field
Is it possible to upload data into Snowflake using the bulk loader when the data does not include the primary key field?
Background: The table has a primary key field defined with AUTOINCREMENT in Snowflake.
2. Insert new data into Snowflake using bulk load
It seems that "Update: Insert if New" is not available for bulk load. In that case, "Append Existing" is probably the best option. If there are any other suggestions, please let me know!
3. Modify the database by filling in values
What is the best way to modify data in Snowflake?
Background:
Some rows are missing values, and I would like to fill them in. Would it be better to delete the specific rows and append the modified data, to modify the data using a SQL statement in the Alteryx Output tool, or to clone the original table --> create a modified table --> replace the table?
4. Normal Input/Output tools vs. In-Database connection tools
What are the advantages/disadvantages of using the In-Database connection tools?
Sincerely,
knozawa
Hopefully these answers will help 🙂
1) Yes, I'm pretty sure you can do this. In the file format you can specify that the PK field is not in the file and Snowflake should create the value. I can dig out some details from an old project if you need more info 🙂
2) I don't think Snowflake supports UPSERT during loading. Usually I've uploaded to an empty staging table and used SQL (e.g. a MERGE) to achieve the same thing; a sketch of that pattern follows below.
3) Depends on your situation. Snowflake isn't designed for single-record maintenance; you can do it, but don't try it for a large number of records and expect good performance. It can take seconds to bulk load millions of rows, but hours to edit a thousand records individually. If you understand how Snowflake works, then you can design a really fast workflow.
4) The same as with every other database. In-DB means that the data stays in the database. It's VERY fast and efficient, and if you're just transforming data (not loading or unloading), then In-DB will make things really fast.
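For reference, here is a minimal sketch of the staging-table-plus-MERGE pattern mentioned in 2), which also covers the fill-in-missing-values case from question 3. The table, column, and key names are made up for illustration; the exact DDL and matching condition will depend on your schema:

```sql
-- Hypothetical example: bulk load into an empty staging table, then MERGE into the target.
-- Assumes TARGET_TBL(id AUTOINCREMENT, name, amount) and a staging table without the id column.
CREATE OR REPLACE TEMPORARY TABLE STAGING_TBL (name STRING, amount NUMBER);

-- The Alteryx bulk loader (or a COPY INTO) writes the new/changed rows into STAGING_TBL here.

MERGE INTO TARGET_TBL t
USING STAGING_TBL s
  ON t.name = s.name                 -- whatever business key identifies a matching row
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount     -- fill in / overwrite values on existing rows
WHEN NOT MATCHED THEN
  INSERT (name, amount) VALUES (s.name, s.amount);  -- id is populated by AUTOINCREMENT
```

This keeps the row-level logic inside Snowflake as a single bulk statement (for example in the Output tool's post-SQL), rather than editing records one at a time.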
Thank you very much for your detailed reply!
About the #1 question (auto-increment PK using the bulk loader), when I tried to upload data into Snowflake without the primary key field using bulk loading, I received the following error message:
The COPY failed with error: Number of columns in file (2) does not match that of the corresponding table (3), use file format option error_on_column_count_mismatch=false to ignore this error
I've asked the same question on Stack Overflow as well: https://stackoverflow.com/questions/60326474/autoincrement-primary-key-for-snowflake-bulk-loading/60...
It seems like there is a way to work around the issue, as you said. Do you know if we can achieve it using an Output tool with modified pre/post SQL, or do we need to use SnowSQL and a command-line tool?
Sincerely,
knozawa
Yes, I agree with the SO reply, leave the column out of the COPY statement and it should work. The autoincrement field is implied because it has to be there 🙂
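In case it helps anyone following along, here is a rough sketch of what that COPY looks like when the autoincrement column is left out. The stage, table, and column names are made up; the point is the explicit column list on the target table combined with a transformation SELECT from the stage:

```sql
-- Hypothetical example: MY_TABLE(id AUTOINCREMENT, col_a, col_b), and the staged file has only two columns.
-- Listing only the non-autoincrement columns lets Snowflake populate id from its default sequence.
COPY INTO MY_TABLE (col_a, col_b)
FROM (SELECT t.$1, t.$2 FROM @MY_STAGE t)
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```

This avoids the "number of columns in file does not match" error without having to set error_on_column_count_mismatch=false.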
In @cmcclellan's point 3: "It can take seconds to bulk load millions of rows, but hours to edit a thousand records individually. If you understand how Snowflake works then you can design a really fast workflow"
This is a very interesting insight! Snowflake is optimised for bulk operations and analytics, not for single-record operations.
Hi knozawa,
Have you found out how to configure the output tool in order to overcome the column count mismatch? I'm struggling with the same problem. And I don't see how to get Alteryx to pass on the 'copy into' command either. Any insight would be appreciated.
Thanks,
8088.
@8088 ,
I don't remember if this solved the issue, but I added a Multi-Field Formula tool to remove some Unicode characters, a Data Cleansing tool to remove trailing whitespace, tabs, and line breaks, and a Summarize tool to remove extra redundant rows before the Snowflake bulk loading output.
I still run into the following error with the output tool sometimes, though: The COPY failed with error: Error from AWS: We encountered an internal error. Please try again. Error from AWS: One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity tag may not match the part's entity tag.
I get the above error randomly; when I run the workflow again, it sometimes runs successfully, but not always.
knozawa
Hi knozawa,
From what I see in your screenshot, you are describing how you shape the data before the output tool. So should I assume you are using a regular Output tool (or In-DB output tool) configured with a connection that uses the Snowflake bulk loader driver? In other words, you are NOT using your workflow to form a PUT statement that you then pass on to the API via an API output tool.
Yes, I shaped the data before the regular Output tool configured with Snowflake bulk loading.
Sincerely,
knozawa
Thanks.
BTW - did you use a regular Output tool or an In-DB output tool? And I assume you configured the connection to write using the SF bulk loader?
Not sure if this is of interest to you at this point, but over the weekend I found that if I write to the table using the ODBC connection rather than the bulk loader connection, the autoincrement field will populate and increment (and there is no error due to field count mismatch). But if I use the bulk loader connection, the record gets added (again, no field count mismatch error) but the autoincrement field remains null.
A couple of inferences: since the field is an autoincrement, SF is not expecting it to be provided during the insert and will let the autoincrement sequence do its thing, which is what happened using ODBC. But when I use the bulk loader, the value is null, implying that between the staging and the 'copy into' that go on in the background, the autoincrement field was identified as being part of the table, and when no value was provided during the upload to staging, it defaulted to null.
So, as such, if you're writing a small number of rows to a table with an autoincrement, use ODBC and write directly to the table (the performance penalty will be small if the row count is low). But if you're writing a large number of rows to a table with an autoincrement and you want to use the bulk loader, then you had better get Alteryx to determine what the values should be, generate them, and include them in the output - SF will allow you to populate explicit values into an autoincrement field. I'm hoping for a better solution - perhaps someone on the Alteryx side with insight into what happens between the Alteryx output tool and the insert into the table when using different staging options could shed some more informed light on this. I expect Snowflake's popularity to grow quite quickly, and this will require getting craftier in controlling the bulk loading process.
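One alternative to generating the values in Alteryx, in case it's useful: bulk load into a staging table that has no autoincrement column, then run a plain INSERT ... SELECT into the real table so the sequence default fires. This is only a sketch with made-up names and adds a post-SQL step (or a separate SQL statement) to the workflow:

```sql
-- Hypothetical example: the bulk loader writes into STAGING_TBL (no id column),
-- then TARGET_TBL's AUTOINCREMENT populates id during the insert.
INSERT INTO TARGET_TBL (col_a, col_b)
SELECT col_a, col_b
FROM STAGING_TBL;

-- Optionally clear the staging table for the next run.
TRUNCATE TABLE STAGING_TBL;
```

You keep the bulk loader's speed for the heavy lifting, and the autoincrement behaves the same as it does over ODBC.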