SOLVED

In-Database slowness due to table design...? Can someone confirm/deny my hypothesis?

rnick821
5 - Atom

Hello, great community!

 

I'm still relatively new on my Alteryx journey, but I find myself using it almost daily now to assist my team in managing data projects.  It's a fantastic tool and I hope to become more familiar with it as time goes on. 

 

As you will quickly learn as you read the rest of my post, I am not a trained SQL type...I am a former musician who is completely self-taught through the auspices of websites such as Stack Overflow and Coursera. 

 

But...I have a question that I hope the community can help me with. 

 

I am working on a workflow that pulls data from various tables located in two separate databases co-located on the same server (this part really isn't important, but I want to include it for context).  At first, using the knowledge from my Core certification, I was just using the standard "Input Data" (green) tool to create the data connection, but because some of the database tables have millions of records, that wasn't working too well.  Our company's resident Alteryx expert (who will likely see this post) showed me the In-Database tools (which I was unfamiliar with), so I started using those.

 

I built out the workflow, but I noticed that it was taking like 15 minutes to run...I used custom SQL to limit the number of records coming across, and it was still very, very slow.  So, I began to experiment by grouping the various steps of the workflow into containers and turning them on and off to try to isolate where the slowness was coming from, and narrowed it down to one query that was taking 13 minutes on average while the rest of the queries were taking 15-20 seconds. It was just a "Connect In-DB" tool followed by a "Data Stream Out" tool.

 

So...I began to look at that data and ran the query through SQL Server Management Studio (SSMS), where it rendered the 276,000 records in about 4 seconds...so I could not figure out why it was taking so long in Alteryx. 

 

Poking...prodding...poking...prodding...finally, I ran some SELECT MAX(LEN(...)) type queries in SSMS against the columns I was querying to figure out the actual maximum length of each column, and then adjusted my custom SQL in Alteryx to do SELECT LEFT(columnname, xx) for each column in my query so it only pulls over the approximate number of characters each column actually holds. 
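
For anyone curious, here's roughly what that looked like (dbo.MyTable, Col1, and Col2 are placeholder names rather than my real ones, and 250 is just the rounded-up max I found):

    -- Find the longest actual value stored in each suspect column:
    SELECT MAX(LEN(Col1)) AS MaxLenCol1,
           MAX(LEN(Col2)) AS MaxLenCol2
    FROM dbo.MyTable;

    -- Then trim each column in the Alteryx custom SQL to roughly that length:
    SELECT LEFT(Col1, 250) AS Col1,
           LEFT(Col2, 250) AS Col2
    FROM dbo.MyTable;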

 

This fixed it!  It went from 13 minutes to about 15 seconds. 

 

So, I poked and prodded more and began to look at the table itself on the DB server...turns out that when the DB owner created the table, he made pretty much every non-date, non-numeric column an NVARCHAR(4000).  Not sure why...maybe he was in a hurry...but regardless, I began to suspect the culprit. 
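
If anyone wants to check their own tables for this, a query along these lines will show the declared type and width of every column ('MyTable' is again a placeholder name):

    SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'MyTable'  -- placeholder table name
    ORDER BY ORDINAL_POSITION;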

 

So...I have two questions:

 

 

#1 - Would the data type of 5 of the 7 columns in my query being NVARCHAR(4000) slow things down, even though the max number of characters in ANY of those columns was like 250?  From my prodding, it seems like the "Connect In-DB" step didn't take much time, but the "Data Stream Out" step took like 99% of the 13 minutes to complete. 

 

#2 - If #1 above is true, is there any setting in Alteryx Desktop itself that would let me compensate for that without having to write a custom query with something like SELECT LEFT(columnname, xx) for each column? 

 

 

Thanks for any thoughts. 

 

Best,

 

Robert

 

 

  

4 REPLIES
Hsandness
8 - Asteroid

@rnick821 from what I've read on Stack Overflow, ChatGPT, and the Microsoft documentation (char and varchar (Transact-SQL) - SQL Server | Microsoft Learn), a column with data type NVARCHAR(4000) won't slow your query down, because SQL Server only stores the characters that are actually present and ignores the unused space. It's apparently more costly to explicitly limit the width with a fixed-length column like CHAR(50), since SQL Server has to allocate the full 50 bytes for that column in every row.

 

I don't think there's an issue with the query itself, since it runs fast in SQL Server and in the Connect In-DB tool. I'm guessing the slow processing time has something to do with the hardware in your computer. I run queries through the Data Stream Out tool with 15+ columns and 80,000+ rows that finish in under 20 seconds. I have 32GB of RAM and don't experience much latency in my workflows.

 

You could try using a virtual machine that has more RAM than your computer and see if there's a change in performance.

 

 

FYI, I'm not an expert on any of the above topics, but I'm extremely bored at work right now and like to troubleshoot issues on the community page during these times. So I wouldn't spend too much time investigating whether it's the RAM; it's more of a hunch I have.

 

 

rnick821
5 - Atom

Thanks.  Ironically enough, my company was kind enough to supply me with a brand-new laptop today with a faster processor and more RAM.  I haven't tried it on the new computer yet, but I will take it for a test drive tomorrow.  

 

Even though you say you're not an expert, I'm guessing you're more of an expert than I am... :)

 

 

Raj
16 - Nebula

@rnick821 

It's impressive how you've taken such a hands-on approach to troubleshooting your workflow performance! Here’s some insight into your questions:

1. Impact of NVARCHAR(4000) on Speed:
Yes, having columns set to NVARCHAR(4000) can indeed affect performance, especially when using the Data Stream Out tool. When Alteryx streams data out, it has to allocate enough memory for the maximum possible data size. For NVARCHAR(4000), that means allocating space for each column to hold up to 4000 characters, even if the actual data is much smaller. This overhead is likely why your workflow sped up so dramatically after you limited the column widths.

2. Compensating for Large Column Widths Without Custom SQL:
In Alteryx, there’s no native setting in the Data Stream Out tool to automatically trim excessive column widths based on actual content length, so your approach of using SELECT LEFT(columnname, xx) in custom SQL is effective and one of the best ways to manage this within Alteryx. Alternatively, if you're open to some preliminary work in SQL Server, you could create a view with optimized column lengths (see the sketch below) to streamline data pulls without editing SQL each time.
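
As a rough sketch of that idea (the table, column, and view names here are made up, and the 250-character cap should come from your MAX(LEN(...)) checks):

    CREATE VIEW dbo.MyTable_Trimmed AS
    SELECT
        CAST(Col1 AS NVARCHAR(250)) AS Col1,   -- placeholder names; CAST narrows
        CAST(Col2 AS NVARCHAR(250)) AS Col2,   -- the declared width Alteryx sees
        DateCol,                               -- leave non-text columns as-is
        NumericCol
    FROM dbo.MyTable;

Pointing the Connect In-DB tool at the view instead of the base table should then give you the narrower column metadata without needing custom SQL in the workflow.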

This kind of experimentation is crucial in Alteryx, especially when working with large datasets!

Hope this helps.

rnick821
5 - Atom

Thank you for the insight.  

 

Unfortunately, I don't have enough influence with the DBA to get a new view created for me, but it's good to know I'm on the right track to solving this. :) 

 

Best,

 

Rob

 
