I was wondering if anybody could give any insight on how the IN-DB tool differs (on the back-end) to the INPUT TOOL in how it communicates with, say, a SQL Server.
The reason being, I have a query that returns 78,000 lines (very small relatively speaking), with very few joins, and very limited filtering. Basically the query should not be taking a long time to run.
When I run the query on my standard DB client app, it takes ~20 seconds
When I run it in Alteryx using the IN-DB tool, it takes ~20 seconds
When I run it Alteryx using the input tool, it's taking > 1 hour.
The problem is, without going into the weeds, I can't use the IN-DB tool within my Gallery, which is why I have to use an input tool if I want to schedule this process. Does anybody have any idea why this might be happening? In my anecdotal experience, I feel like this happens with other queries as well, whenever I'm hitting this one SQL Server DB (2008, don't ask why our DB is nearly 2 decades old) - so, this slowness is not an issue with just the query, I think it has to do with how Alteryx is communicating with the DB.
Any idea why this might be happening?
Shot in the dark, but I'm thinking that your In-DB connection is using a different ODBC Driver than what you have configured in your standard Input tool.
I'd suggest using the "SQL Server native Client 11.0"
In-DB is executing the query within the database while the Input Data tool is actually transferring the data over the network onto your computer. The In-DB will only move the data onto your computer when the Data Stream Out tool is used.
One recommendation that I do have to try is leveraging an OLEDB connection for your Microsoft SQL Server for the Input Data tool. I have found that for an Input Data tool, an OLEDB connection with the SQL Server Native Client provider selected tends to be significantly faster for Microsoft SQL Server (just in my experience so yours may vary). Although it doesn't help you with your Server In-DB connections, it should do a decent job speeding up your workflows!
Are there any calculated fields in the tables your are using?
I've had an issue in the past where were a field I was pulling was calculated on the fly which caused the process to be very slow with a normal input.
The quick way to test was just remove that field from the query.
Regards,
Ben
Update:
For this particular test workflow that I've built, there are no calculated fields. In fact, the workflow is simply the the input, and then an output - with the difference being one is the input tool, and the other is an in-db input tool + an in-db output tool.
What I'm narrowing down is that I think the slowness is actually coming from the output. That is, when I just extract the data and try to view it, the process runs relatively quickly for either tool. But I've noticed it is when the process is trying to write to a file that takes forever, and I tried writing to a .csv, or an .xlsx and for both processes it is SUPER slow.
I'll add that the slowness comes from either input types - it is really the output tool that is slowing things down, and it's not even the output tool itself. If I have a workflow that uses the In-DB Data Stream Out tool, it will basically hang at that tool even after the input tool has extracted all the data. And this is only happening when I connect to a one particular SQL Server DB.
JRGO, this did it!
For my In-DB input tool, I've been using the "ODBC Driver 17 for SQL Server" which works fine for some of my SQL Server DBs. But for others, it takes forever. And weirder still, is that for some, the data extraction component is fast with that driver, but the data input, into either an external file, or even back to the DB takes a long time.
When I switched to "SQL Server Native Client 11.0" the IN-DB tool ran much quicker!! Thank you.
JRGO, do you have any idea why this is?