community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Alteryx SnowSQL macro to quickly load data into Snowflake

Highlighted

Hello,

I recently did a presentation on using both Alteryx and Snowflake together to scale up our processes. This post is a follow-up for those interested in learning more technical details about it.

 

DISCLAIMER

While this process did significantly cut our data load times, I will warn that it does take some effort, trial and error to get it working. Also, I can't guarantee that it will work with your IT setup and/or security policies. Please review the process and discuss it with your IT team first, prior to embarking on this path.

 

About

For those that do not know, Snowflake is an analytic data warehouse built for the cloudAlteryx is flexible enough to integrate with Snowflake, which allows you to leverage Alteryx's flexibility with Snowflake's speed and concurrency... A powerful combo!!

 

ODBC Connection

The main way to connect Alteryx and Snowflake is the ODBC connection. This connection allows one to easily load data into Snowflake. See this article for details on setting up your Alteryx-Snowflake ODBC connection

 

In-DB tools

Once this ODBC connection is in place, it allows one to use In-DB tools. See this article for details.

 

Snowflake Bulk Loader

Beyond this initial ODBC connection, if one happens to have access to an AWS bucket, then one can try using the Snowflake Bulk Loader. I have not personally tried this approach, but this article has more.

 

Alteryx SnowSQL Macro

While all of this can be very useful, sometimes writing data to Snowflake this way can still be a little slow. Instead, our team created an Alteryx SnowSQL macro that mimics Snowflake's bulk load process by using SnowSQL in a Run command tool in an Alteryx macro to load data into Snowflake. This Alteryx macro also can be transformed to run other SnowSQL commands as well, potentially fulfilling other needs. Below are the steps.

 

Step 1 - Snowflake Bulk Load Process:

First, one has to make sure one can load data into Snowflake using SnowSQL. Here are the general steps one would take just to load a file into Snowflake using there Bulk Load Process:

  1. In Snowflake, create a Snowflake-managed Stage (see guide)
  2. In Snowflake, setup a FileFormat for your inbound file (like CSV)
  3. In Snowflake, create a Table that mimics the column names/types of the inbound file
  4. Download & install CLI Client (SnowSQL) from Snowflake following these steps
  5. For Windows users, open Command Prompt window, enter 'snowsql' to download newest version. See previous step documentation for other Operating Systems
  6. Enter your Snowflake Username and Password to login to the Command Prompt window
  7. At this point, you could use your Command Prompt window to load data... The basic SQL commands are:
    1. USE Role, Database, Warehouse, and Schema;
    2. PUT FILE @Stage;
    3. COPY INTO Table FROM @Stage FileFormat;

Here is another article that outlines this part of the process more thoroughly. Once you get this process working once or twice, you are ready for the next step.

 

Step 2 - Automate the SnowSQL Startup:

If you want to automate the process and load values on startup, then:

  1. Locate the ".snowsql" folder that was installed in your directory folders
  2. Open "config" file and update the default startup values. Remember to delete the "#", or the value will remain commented out. Insert your values for the groups below:
    1. accountname, username, password
    2. (optional) dbname, schemaname, warehousename,rolename

 

Next time, when you open SnowSQL in your Command Prompt, it should automatically log you in to the account with the database and everything you specified. Once you get this process working once or twice, you are ready for the next step.

 

Step 3 - Setup Alteryx SnowSQL Macro:

Next is to build the Alteryx SnowSQL Macro. Below is an image of the macro:

 

SnowSQL Macro - overview.png

 

Here are the key components:

  • Text Input - Doesn't matter what the value is, it will be selected out in the select tool
  • Formula - this is where your SQL statement lives (see SQL example below) 
  • Control Parameters - there is a control parameter for every component of the SQL statement that you may want to change in the workflow
  • Select - deselect the original text input field you created
  • Run Command - this is configured to run snowsql.exe as the external program and pass through the previous inputs as command arguments 
    • SETUP - You will have to create a "bulkload.sql" file to be used as a temporary output. Just open up a new blank Text file, and save it as "bulkload.sql" in a folder that no one will delete. The contents will be overwritten each time the macro runs.
      • NOTE - if you want to load multiple files to Snowflake at the same time, you may have to create multiple macros that each use a different "bulkload.sql" file.
    • Command - the location of the 'snowsql.exe', likely in your program files
    • Command Arguments - Enter the following, where the [folder location] is the folder location of where you saved the 'bulkload.sql' file: 
      • -c example -f "[folder location]\bulkload.sql"
    • Write Source - this outputs the results to a file
    • Run External Program

SnowSQL Macro - SQL.png

PLEASE NOTE: This is one example of SQL code. Please make sure your SQL does what you want it to do! 

 

In this example, the SnowSQL is telling Snowflake...

  • Use a specified role, warehouse, database, and schema
  • Then put a file from a specified file location into the Snowflake Stage
  • Then recreate the specified table so that it is empty, but keeps the same metadata of the column names/types
  • Then copy into the specified table, from the specified file in the Snowflake stage, using a specified file format, and purge the data afterwards

Once this is built, give it a try. You should see a Command Prompt open to SnowSQL, but the commands will fail, since they don't have values yet. If so, you are ready for the next step.

 

Step 4 - Setup Alteryx Workflow:

Now that you have a working macro, let's put it in a workflow that can utilize it. 

 

SnowSQL Macro - workflow w explanations v1.PNG

 

In this example, we input the file we want to load to Snowflake.

 

First, we have to clean the data. There are 3 main issues I typically see:

  • Date & DateTime need to be in Snowflake format:  2019-12-31
  • String length needs to be at or below the Snowflake maximum:  16,777,216
  • UTF-8 issues, which means there are foreign characters in the data. There are multiple ways to deal with this issue. Here are 2 ways:
    • Replace each character:
      • ReplaceChar([Field Name],"ß","b")
    • Remove all foreign characters using RegEx and replace with nothing (I found this solution from another community post):
      • TitleCase(DecomposeUnicodeForMatch(regex_replace([Field Name],'[^\x00-\x7f]', '')))

Second, we create the Snowflake table. We only have to run this once to setup the table in Snowflake. If we limit the incoming records to just 1, then it runs faster. With the Data Stream In tool, you have the option to “Create New Table” and can provide a name. When it runs the first time, your table will be created with all of the field names and field types you specified in the select column earlier. After your table is created, you can disable this container. I keep it around, in case I ever need to recreate the table again (for example, if more columns are added).

 

Third, we create a CSV out of our original file that has now been cleaned up. We found it easiest to name the file the same name as the table you are creating in Snowflake. Keep track of this CSV file name and location, as we will use it in our Formula tool later.

 

Fourth, we limit to just 1 record, to run it faster, then input our Snowflake Details in the Formula tool. We need fields for all of the SnowSQL macro Control Parameters. This includes:

  • File Location: Use the location of the CSV file you just created.
    • NOTE: If you have spaces in your folder structure, you may have to put an * in place of the space. This wild card allows for the space to be captured. We used the following formula to replace the spaces with * in our [folder location]:
      • ReplaceChar(“[Folder Location]\"," ","*")
    • File Name: Use the file name of the CSV file you just created. Again, you may have to replace spaces with an *.
    • The rest point to variables for your Snowflake instance and should just be replica string values:
      • Role, Warehouse, Database, Schema, Stage, Table, File Format

We then only select the fields we just created and add our SnowSQL Macro. The inputs to the macro should be the fields that correspond to each control parameter in the macro. If you name them roughly the same, then that should help match them up.

 

Now, your workflow should be ready to load your file into Snowflake. You can re-purpose this workflow for loading other files in, by just changing the input, output, create table, and the formula tools. The rest can be re-used over again as needed.

 

Step 5 - Setup Alteryx Server:

Last is getting the workflow to run in Alteryx Server. As my disclaimer from before, it may or may not be possible, depending on your IT setup and security. Also, it's possible I missed a step on this section, as we tried a few different approaches with our IT team before it finally worked... It's probably best to work with your IT team to get it going.

 

  1. [if needed] Redo step 1.4 of installing CLI Client (SnowSQL). This time, make sure it is on a folder that Alteryx Server can access. For us, it was our Shared Drive that Alteryx Server could previously access.
  2. [if needed] Redo step 2 of updating the 'config' file. This time, make sure the '.snowsql' is on the folder that Alteryx Server can access.
  3. [if needed] Redo step 3 of creating the macro. This time, make sure your 'bulkload.sql' file is also on a folder that Alteryx Server can access. Change the "Run Command" tools 'Write Source' Output is pointed to this location, along with the 'Command Arguemnts' [file location].
  4. This one I'm a little fuzzy on the details. I believe, since we had SSO, we had to have our dummy user's credentials added to both Alteryx Prod Server & Dev Server. 

Note:  If you are using SSO with Snowflake, then you may need IT to provide one user or a dummy user to not be in SSO for this process. There may need to be at least one account in Snowflake that has a defined username and password. Otherwise, the connection may be unable to validate.

 

At the end of all of this, you can hopefully re-use the workflow & macro over and over again to more quickly load all of your data. For us, what previously took hours per load is down to minutes, even on a fairly small warehouse setup in Snowflake. 

 

Anyways, hope this helps!! Please reach out with questions or suggestions. Thanks!

Best,
Ben

Meteoroid

Hi Ben,

 

Thanks a lot for sharing your solution to quickly load data into Snowflake. I have replicated your workflow in our environment but just wanted to share a few learnings / notes to add on to the instructions below. It was my first time using the Run Command and Macro so I was struggling to determine how to map the instructions to what I was seeing on screen.

 

Step 3 - Setup Alteryx SnowSQL Macro:

 

Run Command config

1. Write Source [Optional] - Output  this should point to the bulkload.sql file where

File Format = Set to Comma-Separated Value (*.csv)

First Row Contains field names = UNCHECK

Quote Output Fields = NEVER

Write BOM = UNCHECK

This output will contain the SQL command to be passed during execution.

 

2.Run External Program - Command should point to where your snowsql.exe is.

 

3 Command Arguments, I only added -f which is pointing to the bulkload.sql file.

 

RunCommand.png

 

 

Run Command Output.pngRun Command Output

 

Step 4 - Setup Alteryx Workflow:

1. FileLocation - I needed to use double backslash for the directory C:\Users\User1\Documents should be  C:\\Users\\User1\\Documents\\

 

2. On my first try, I missed to limit the output to 1 and snowsql was invoked based on the number of records the csv file returned, so don't forget to add the Sample tool and limit to First Row (1 row).  This is how the final workflow looked like.

 

 

Workflow.png

 

 

where the SnowSQL Macro had the following:

Macro.png

 

 

 

Meteoroid

Ben,


Thanks a lot for the post, it is very informative. Do you know if there is a way to save the log from SnowSQL to a text file, and use that text file as an input for the Run Command?  I would like to use the log as a trigger for the second part of my workflow.

Labels