While this process did significantly cut our data load times, I will warn that it takes some effort and trial and error to get working. Also, I can't guarantee that it will work with your IT setup and/or security policies. Please review the process and discuss it with your IT team before embarking on this path.
For those that do not know, Snowflake is an analytic data warehouse built for the cloud. Alteryx is flexible enough to integrate with Snowflake, which allows you to leverage Alteryx's flexibility with Snowflake's speed and concurrency... A powerful combo!!
Beyond this initial ODBC connection, if one happens to have access to an AWS bucket, then one can try using the Snowflake Bulk Loader. I have not personally tried this approach, but this article has more.
Alteryx SnowSQL Macro
While all of this can be very useful, sometimes writing data to Snowflake this way can still be a little slow. Instead, our team created an Alteryx SnowSQL macro that mimics Snowflake's bulk load process by using SnowSQL in a Run command tool in an Alteryx macro to load data into Snowflake. This Alteryx macro also can be transformed to run other SnowSQL commands as well, potentially fulfilling other needs. Below are the steps.
Step 1 - Snowflake Bulk Load Process:
First, one has to make sure one can load data into Snowflake using SnowSQL. Here are the general steps one would take just to load a file into Snowflake using their Bulk Load Process:
In Snowflake, create a Snowflake-managed Stage (see guide)
In Snowflake, setup a FileFormat for your inbound file (like CSV)
In Snowflake, create a Table that mimics the column names/types of the inbound file
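As a sketch, the three Snowflake setup steps above might look like the following in a SnowSQL session. MY_STAGE, MY_CSV_FORMAT, MY_TABLE, and the column list are all placeholders for your own objects:

```sql
-- 1. A Snowflake-managed (internal) stage to receive uploaded files
CREATE STAGE IF NOT EXISTS MY_STAGE;

-- 2. A file format describing the inbound CSV
CREATE FILE FORMAT IF NOT EXISTS MY_CSV_FORMAT
  TYPE = 'CSV'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- 3. A table whose columns mirror the inbound file
CREATE TABLE IF NOT EXISTS MY_TABLE (
  ID        NUMBER,
  NAME      VARCHAR,
  LOAD_DATE DATE
);
```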
Next time, when you open SnowSQL in your Command Prompt, it should automatically log you in to the account with the database and everything you specified. Once you get this process working once or twice, you are ready for the next step.
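For reference, SnowSQL reads named connections from its config file (typically '%USERPROFILE%\.snowsql\config' on Windows). A minimal sketch, assuming a connection named 'example' to match the '-c example' flag used in the macro's Command Arguments; every value below is a placeholder for your own instance:

```
[connections.example]
accountname = your_account
username = your_user
password = your_password
rolename = your_role
warehousename = your_warehouse
dbname = your_database
schemaname = your_schema
```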
Step 3 - Setup Alteryx SnowSQL Macro:
Next is to build the Alteryx SnowSQL Macro. Below is an image of the macro:
Here are the key components:
Text Input - the value doesn't matter; it will be deselected later in the Select tool
Formula - this is where your SQL statement lives (see SQL example below)
Control Parameters - there is a control parameter for every component of the SQL statement that you may want to change in the workflow
Select - deselect the original text input field you created
Run Command - this is configured to run snowsql.exe as the external program and pass through the previous inputs as command arguments
SETUP - You will have to create a "bulkload.sql" file to be used as a temporary output. Just open up a new blank Text file, and save it as "bulkload.sql" in a folder that no one will delete. The contents will be overwritten each time the macro runs.
NOTE - if you want to load multiple files to Snowflake at the same time, you may have to create multiple macros that each use a different "bulkload.sql" file.
Command - the location of the 'snowsql.exe', likely in your program files
Command Arguments - Enter the following, where the [folder location] is the folder location of where you saved the 'bulkload.sql' file:
-c example -f "[folder location]\bulkload.sql"
Write Source - this outputs the results to a file
Run External Program
PLEASE NOTE: This is one example of SQL code. Please make sure your SQL does what you want it to do!
In this example, the SnowSQL is telling Snowflake...
Use a specified role, warehouse, database, and schema
Then put a file from a specified file location into the Snowflake Stage
Then recreate the specified table so that it is empty, but keeps the same metadata of the column names/types
Then copy into the specified table, from the specified file in the Snowflake stage, using a specified file format, and purge the data afterwards
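Put together, the generated 'bulkload.sql' might look like the sketch below. Every name is a placeholder that the macro fills in from its Control Parameters. Note that 'CREATE OR REPLACE TABLE ... LIKE' itself is one way to empty a table while keeping its column names/types (TRUNCATE TABLE is an alternative), and that PUT auto-compresses the uploaded file to .gz by default:

```sql
USE ROLE MY_ROLE;
USE WAREHOUSE MY_WAREHOUSE;
USE DATABASE MY_DATABASE;
USE SCHEMA MY_SCHEMA;

-- Upload the local CSV into the Snowflake stage
PUT 'file://C:\Data\MY_TABLE.csv' @MY_STAGE;

-- Recreate the table empty, keeping the same column names/types
CREATE OR REPLACE TABLE MY_TABLE LIKE MY_TABLE;

-- Load the staged (auto-compressed) file and purge it afterwards
COPY INTO MY_TABLE
  FROM @MY_STAGE/MY_TABLE.csv.gz
  FILE_FORMAT = (FORMAT_NAME = 'MY_CSV_FORMAT')
  PURGE = TRUE;
```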
Once this is built, give it a try. You should see a Command Prompt open to SnowSQL, but the commands will fail, since they don't have values yet. If so, you are ready for the next step.
Step 4 - Setup Alteryx Workflow:
Now that you have a working macro, let's put it into a workflow.
In this example, we input the file we want to load to Snowflake.
First, we have to clean the data. There are 3 main issues I typically see:
Date & DateTime need to be in Snowflake format: 2019-12-31
String length needs to be at or below the Snowflake maximum: 16,777,216
UTF-8 issues, which usually means there are foreign (non-ASCII) characters in the data. There are multiple ways to deal with this issue. Here are 2 ways:
Replace each character:
Remove all foreign characters using RegEx and replace with nothing (I found this solution from another community post):
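As a sketch, the first and third cleanups above can be handled with standard Alteryx formula functions (the field names here are placeholders):

```
// Format a date field as Snowflake expects (YYYY-MM-DD)
DateTimeFormat([MyDateField], '%Y-%m-%d')

// Strip all non-ASCII ("foreign") characters by replacing them with nothing
REGEX_Replace([MyStringField], '[^\x20-\x7e]', '')
```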
Second, we create the Snowflake table. We only have to run this once to setup the table in Snowflake. If we limit the incoming records to just 1, then it runs faster. With the Data Stream In tool, you have the option to “Create New Table” and can provide a name. When it runs the first time, your table will be created with all of the field names and field types you specified in the select column earlier. After your table is created, you can disable this container. I keep it around, in case I ever need to recreate the table again (for example, if more columns are added).
Third, we create a CSV out of our original file that has now been cleaned up. We found it easiest to name the file the same name as the table you are creating in Snowflake. Keep track of this CSV file name and location, as we will use it in our Formula tool later.
Fourth, we limit to just 1 record, to run it faster, then input our Snowflake Details in the Formula tool. We need fields for all of the SnowSQL macro Control Parameters. This includes:
File Location: Use the location of the CSV file you just created.
NOTE: If you have spaces in your folder structure, you may have to put an * in place of each space. This wildcard allows the space to be captured. We used the following formula to replace the spaces with * in our [folder location]:
ReplaceChar("[Folder Location]\", " ", "*")
File Name: Use the file name of the CSV file you just created. Again, you may have to replace spaces with an *.
The rest point to variables for your Snowflake instance and should simply be string values:
Role, Warehouse, Database, Schema, Stage, Table, File Format
We then only select the fields we just created and add our SnowSQL Macro. The inputs to the macro should be the fields that correspond to each control parameter in the macro. If you name them roughly the same, then that should help match them up.
Now, your workflow should be ready to load your file into Snowflake. You can re-purpose this workflow for loading other files in, by just changing the input, output, create table, and the formula tools. The rest can be re-used over again as needed.
Step 5 - Setup Alteryx Server:
Last is getting the workflow to run on Alteryx Server. As in my disclaimer above, this may or may not be possible, depending on your IT setup and security. Also, it's possible I missed a step in this section, as we tried a few different approaches with our IT team before it finally worked... It's probably best to work with your IT team to get it going.
[if needed] Redo step 1.4 of installing CLI Client (SnowSQL). This time, make sure it is on a folder that Alteryx Server can access. For us, it was our Shared Drive that Alteryx Server could previously access.
[if needed] Redo step 2 of updating the 'config' file. This time, make sure the '.snowsql' is on the folder that Alteryx Server can access.
[if needed] Redo step 3 of creating the macro. This time, make sure your 'bulkload.sql' file is also in a folder that Alteryx Server can access. Point the Run Command tool's 'Write Source' output to this location, and update the [folder location] in the 'Command Arguments' to match.
This one I'm a little fuzzy on the details. I believe, since we had SSO, we had to have our dummy user's credentials added to both Alteryx Prod Server & Dev Server.
Note: If you are using SSO with Snowflake, then you may need IT to provide one user or a dummy user to not be in SSO for this process. There may need to be at least one account in Snowflake that has a defined username and password. Otherwise, the connection may be unable to validate.
At the end of all of this, you can hopefully re-use the workflow & macro over and over again to more quickly load all of your data. For us, what previously took hours per load is down to minutes, even on a fairly small warehouse setup in Snowflake.
Anyways, hope this helps!! Please reach out with questions or suggestions. Thanks!
Thanks a lot for sharing your solution to quickly load data into Snowflake. I have replicated your workflow in our environment and wanted to share a few learnings / notes below to add to the instructions. It was my first time using the Run Command tool and macros, so I struggled to map the instructions to what I was seeing on screen.
Step 3 - Setup Alteryx SnowSQL Macro:
Run Command config
1. Write Source [Optional] - the Output should point to the bulkload.sql file, where:
File Format = Set to Comma-Separated Value (*.csv)
First Row Contains field names = UNCHECK
Quote Output Fields = NEVER
Write BOM = UNCHECK
This output will contain the SQL command to be passed during execution.
2. Run External Program - Command should point to where your snowsql.exe is.
3. Command Arguments - I only added -f, pointing to the bulkload.sql file.
Run Command Output
Step 4 - Setup Alteryx Workflow:
1. FileLocation - I needed to use double backslashes in the directory path: C:\Users\User1\Documents should be C:\\Users\\User1\\Documents\\
2. On my first try, I forgot to limit the output to 1 record, so snowsql was invoked once for each record the CSV file returned. Don't forget to add the Sample tool and limit to the first row (1 row). This is what the final workflow looked like.
Thanks a lot for the post, it is very informative. Do you know if there is a way to save the log from SnowSQL to a text file, and use that text file as an input for the Run Command? I would like to use the log as a trigger for the second part of my workflow.