Hi Alteryx Community,
When I write wide columns of data to SQL Server via ODBC, Alteryx converts anything greater than 1020 characters to varchar(max) instead of varchar(1021). Column sizes are important in my database, and the write process becomes SO slow when writing varchar(max). One workflow writing <100 mb of data took 7 hours to run, but when I forced the columns to be <1000 characters, it ran in 40 minutes. I usually don't have the flexibility to just ignore columns >1000 characters. I read on the community that the cutoff is supposed to be around 8000 characters.
Does anyone know why this might be happening and/or how I can prevent it?
My workaround for the resulting column size is to use PostSQL to change the column type back to the exact size, but my workflows are still writing data so. slowly.
This is my ODBC connection: odbc:DRIVER={ODBC Driver 17 for SQL Server}
I generated test columns to see where the cutoff was. It's around 1020-1025 characters, as seen in these screenshots. I attached my workflow to generate the sample columns.
Thank you! I really appreciate your time and insight.
Have you thought about using Bulk load instead of ODBC? This is significantly faster than a traditional ODBC connection. It doesn't address your column sizes question directly, but you may want to give it a try to see if it makes a big difference.
Hi @BrandonB,
Thanks for getting back to me! I'm pretty new to database connections in Alteryx. I just did some quick research and still am unclear what the difference is between ODBC, OleDB, and Bulk load. Would you mind explaining or sharing resources where I can learn the differences and use cases?
I see the option to output as SQL Server bulk in the output data tool. However, I don't think I have it configured correctly because 'none available' is the only option.
My organization is comfortable with SQL Server databases but brand new to Alteryx, and I'm Alteryx advanced certified but only a few weeks into Alteryx database connections. They're gradually willing to make changes if I can explain in detail exactly what changes should be made and why said change is worth risking (I don't have admin permission). I'm very interested to learn about the performance differences in the database connections! That being said, a more immediate fix would ideally involve our current database settings or ODBC connection.
Thanks again!
Of course! So a Bulk Connection will use an ODBC connection for the database information, but the actual write process is a bit different. For most databases it is similar to a COPY command. If you are seeing "No ODBC Connections Found" you will want to click the "ODBC Admin" button on the bottom left to set up a DSN for your SQL Server. Once you set this up you should see that the ODBC connection is selectable and you can choose it from this menu. That's basically it! Then you will be writing up to tables like you were before, but doing so significantly faster.
As for ODBC vs OLEDB vs OCI vs Bulk, these are basically just different ways that your computer can communicate with a database. You will see ODBC most commonly in organizations, but OCI is used for connections to Oracle and sometimes OLEDB is used for various reasons. Bulk connections are useful for most applications where the driver supports it. You can find more information on whether or not bulk output is supported by a specific source at https://help.alteryx.com/20212/designer/data-sources
Very helpful explanation-- thank you! I got the bulk connection working, so I'm interested to see if it's any faster on my next large data load. I tried it with my test workflow, and unfortunately I got the same results with columns >1020 characters being converted to varchar(max).
Yep I wasn't sure if it would be helpful with the column sizes, but it will almost certainly improve your load times! Please let me know if this makes things any better for you.
User | Count |
---|---|
19 | |
15 | |
13 | |
9 | |
8 |