Hi All,
Need your help. I just completed my basic Alteryx training and have now been left on my own to load 1 billion records from Snowflake into SQL Server. I used the Input Data tool with Snowflake as the source, and for the target I selected Microsoft SQL Server Bulk Load. The job has been running for 2 hours and has only loaded 104 million records so far.
Is there a better way to do this? I was browsing through this community and found the Connect In-DB option. Do you think I can make use of it for my scenario?
What steps should I follow to load the data using Connect In-DB?
Yes, I think it would be better to use the In-DB tools.
1. First, create an ODBC DSN that points to the Snowflake database.
2. Then drag in the Connect In-DB tool, select Manage Connections, choose Snowflake, and fill in the other connection details.
3. Select the ODBC source you just created on both the Read and Write tabs.
4. You have now created the connection to Snowflake.
The second part is to drag in a Data Stream Out tool and connect it to the bulk load output. I expect it could improve performance a little. You can also try using the Write Data In-DB tool to load into SQL Server.
Other things you can also check/try:
- Enable the AMP Engine option in the workflow properties
- Stage the billion records in a Calgary database and then use Write Data In-DB to load them into SQL Server
Just some ideas
I hope it helps!
Hey @rajaganapathi,
Using the In-DB tools would not normally speed up this process. In fact, they may well slow it down, because the In-DB tools rely on temporary table structures such as common table expressions or temp tables.
The SQL Server bulk loader is the fastest way to load data into SQL with Alteryx.
Some notes:
- It may be that Snowflake is the slow part of this journey. There may be a way to bulk-extract out of Snowflake instead of reading through ODBC; ODBC reads from Snowflake can be slow for bulk operations (which is why a bulk loader exists for moving data into Snowflake). See the unload sketch after this list.
- Make sure that Alteryx has selected reasonable data types. If it's pushing everything into V_WString fields of length 1073741823, the load will be terribly slow no matter what you do. Be very economical with your data types, and make them as small as possible without causing truncation.
- If the slow part is loading the data INTO MSSQL, you may have overly complex table structures. For the fastest load into a SQL database, you'd want to suspend all constraints and indexes on the target table (a sketch follows below this list).
- Look at the commit size on your Output Data tool. If you commit in batches of 50,000 records, you probably have a good mix of commit frequency and bulk write throughput (if the process stops halfway, at least you have some committed data).
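For the bulk-extract idea, Snowflake's usual unload path is COPY INTO a named stage, after which the unloaded files can be downloaded (for example with SnowSQL's GET command) and bulk-inserted into SQL Server. This is only a sketch; BIG_TABLE, my_unload_stage and the local path are made-up placeholders, not anything from your workflow:

-- BIG_TABLE and my_unload_stage are hypothetical names; swap in your own.
-- Unload the table to compressed CSV files in an internal stage.
COPY INTO @my_unload_stage/big_table/
FROM BIG_TABLE
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' COMPRESSION = GZIP)
MAX_FILE_SIZE = 268435456; -- roughly 256 MB per output file

-- Then, from SnowSQL on the machine doing the load (placeholder path):
-- GET @my_unload_stage/big_table/ file://C:\staging\big_table\;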
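And on the SQL Server side, suspending constraints and indexes would look roughly like the following. dbo.TargetTable and IX_TargetTable_Col are hypothetical names for your target table and one of its nonclustered indexes; repeat the ALTER INDEX statement for each index you disable:

-- Disable nonclustered indexes and skip constraint checking before the load.
ALTER INDEX IX_TargetTable_Col ON dbo.TargetTable DISABLE;
ALTER TABLE dbo.TargetTable NOCHECK CONSTRAINT ALL;

-- ... run the bulk load here ...

-- Rebuild the indexes and re-validate the constraints after the load finishes.
ALTER INDEX IX_TargetTable_Col ON dbo.TargetTable REBUILD;
ALTER TABLE dbo.TargetTable WITH CHECK CHECK CONSTRAINT ALL;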