community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Knowledge Base

Definitive answers from Designer experts.
New Data Science Blog

Check out the latest post: All Models Are Wrong

READ MORE

Snowflake Bulk Loader

Alteryx
Alteryx

Here is a solution to bulk load data into snowflake using the snowflake internal stage.

Note: An AWS S3 bucket can also be used in place of the internal stage. 

 

Prerequisites

Create the table in Snowflake prior to running the workflow.

An easy method is to create a workflow with a Filter Tool.

Use a Customer Filter and set 1=1 in the expression.

Attach an Output Data Tool to the False output.

Configure the output tool with a Snowflake ODBC driver.

 

Download and install snowsql

 

The bulk loader workflow

The attached v11.7 workflow creates a temporary output file.

The Run Command Tool, creates sql script that is passed as a parameter to the snowsql command.

The script first creates an internal stage in snowflake.

It then copies the temporary output file to the internal storage.

The internal storage is then copied into the table.

Comments
Alteryx Partner

Wow, that's pretty exciting and does not require a S3 account, unlike the beta bulk Loader...

Thanks for sharing!

 

Cheers,

Frederic

Thanks for creating this!  It reduced my upload time to Snowflake from around 30 hours to about 6 minutes.  Just a few pointers for others who might be using this.

 

1) The configuration file for SnowSQL is found C:\Users\XXXXXX\.snowsql where XXXXXX is your user name for login into your computer.  I had to put the connection string in there manually.  Note that you can put multiple for different connections.connection.png

 

2) The %temp% call in Alteryx didn't work for me as it would truncate the pathway with a ~.  Instead I had to hard code it to a location on my desktop.   If you need to do this the following changes are needed

  • output command the temporary file
  • Put the full pathway in the text input for Temporary Output Filename
  • Inside the Macro in the Formula command remove the call to the temporary engine file in the put file command
  • Inside the Run snowsql command
    • hard code the output and the command arguments to the full pathway
    • update the connection to the one created in the config file

 

Could this script be adapted for bulk loading to Azure based Snowflake Accounts? It appear's Alteryx's new bulk loading tool for Snowflake only works with AWS-based snowflake accounts and not the new Azure based accounts.
Alteryx Partner

Technically, you don't need wither AWS or Azure to Bulk load into Snowflake, since the account comes with a Staging Area.

I developed my own Bulk method using SnowSQL. It is not as streamlined as having that logic built into the Alteryx connector, but it works well:

http://insightsthroughdata.com/how-to-load-data-in-bulk-to-snowflake-with-alteryx/

 

Someday, perhaps Alteryx will realize that using the Staging Area should be the only way to implement Bulk loading...

 

Alteryx Partner

I can't find the bulk_load.sql script - should it be in this package or available somewhere else ?

 

EDIT : Don't worry, I realised the script is created in the last step Smiley Happy   I just have to get it working 100% now Smiley Happy 

 

 

Alteryx Partner

 

@cmcclellan - The SQL is in the formula tool inside the macro

 

@DurgaS - thanks for this. I was planning building something similar so you have saved me a job! One of the features I was planning to build into it was also file splitting (gzip etc) as one of the key features Snowflake offers is the ability to parallel load compressed archives into Snowflake. It would be great if we could have an additional input of "Number of files" or "File Size" and then generate a gzip prior to loading. I'll take a look when I get some time and post it here if I get it working.

Alteryx Partner

Has anyone done this ?  - used the logic above to load the data into stage, then create the table with the proper datatypes for each field (taken from Alteryx?) and then load the data from stage.

Alteryx Partner
@cmcclellan Yes, it is painful, but the only way to go if you don't have a S3 instance available with a DBA attached...
Contributors