Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

General Discussions has some can't miss conversations going on right now! From conversations about automation to sharing your favorite Alteryx memes, there's something for everyone. Make it part of your community routine!

Loading to SQL Server Database (3.4 Billion Records, 240 GB)

dshaw
8 - Asteroid

I need some suggestions or guidance regarding loading very large data sets into SQL Server.  I have been using the In-Database tools.  My issue is that recently I loaded 3.4 billion records (about 240GB) into one table in my SQL Server Database that took 30 hours and 15 minutes to load.  Before I started my load I had put the data into the Alteryx DB format (YXDB) so my process only included the loading into the table.

 

Now, is there a faster process or approach?  Staging?  Any help or suggestions.

 

Thanks!

paul_houghton
11 - Bolide

Are you using the SQL Server Bulk Loader option in your InDB connection? If so, that would help.

The biggest delay then I would think is the network transfer time from your machine (where the yxdb currently) to where the SQL is located. are you able to check with the performance recorder if that is the case?

 

(performance recorder option is in the workflow configuration > Runtime tab) 

 

2018-02-16_13-31-42.jpg

dshaw
8 - Asteroid

Thanks for pointing these out; however, where is the bulk loading option?

paul_houghton
11 - Bolide

When you are setting up the InDB connection, jump over to the "Write" option choose from the dropdown list.

 

2018-02-16_14-49-23.jpg

Claje
14 - Magnetar

It's also worth noting that this transaction could take awhile depending on how fast your connection is, even without any SQL Server load issues.

 

For instance, loading 240 GB of data at 10 Mbps will take about 24.5 hours all on its own, without applying any logic.  Have you performed this type of transaction using another tool, and how long did it take?

dshaw
8 - Asteroid

Do you know what the Transaction Size under Option is?  Mine has defaulted to 10,000, but I am beginning to wonder if this affects my write speed.  Recently, I ran two tests writing data to SQL Server.  The first test, I used the In-DB tool to write 204 million records which completed in about 1:58 minutes.  The second test I used the regular input tool to load the data and the process completed in less than 1 hour.  

 

So, I am puzzled by this as I could clearly see faster transfer speed with the non-In-DB option.  Any thoughts?  Will a change in the transaction size matter? If so, what should it be?

 

IN_DB Write Connection Options.png

dshaw
8 - Asteroid

Hi Paul et al:  I may have finally been able to optimize my settings.  First, after spending some time with an Alteryx Solutions Engineer, it appears that my driver settings were incorrectly configured although I selected SQL Bulk Loader.  Actually, I think my connection string was the culprit.  So, once I was able to change that I was actually able to take advantage of the SQL Bulk settings.  So, now my connection string now shows ssvb DSN=HC****** Bulk Loader.

 

Furthermore, I changed the Transaction size from the default of 10,000 shown below to 30,000,000. I arrived at the 30,000,000 by taking a suggested estimated formula of [2 GB / (the total bytes size of my schema)].  Now, there is still some debate on if the numerator should 2 GB or 100 GB.  In any regard, after making all these suggested changes I was able to complete a load that previously took 30+ hours down to 1 hour and 36 minutes for 240 GB data or about 3.4 billion records.

 

So, I think I am heading in the right direction.  I hope this helps others who may have similar questions about the Write driver for SQL Server Bulk Loader and the Transaction Size.

@paul_houghton

@Claje

Labels