(I apologize if this question has already been asked, but if it was, I was not able to find it. I am still pretty new to Alteryx.)
My supervisor and I have some workflows that query our transaction lines table, which numbers in the tens of millions of records, so we are trying to optimize the performance of those queries.
If we wanted to query all transactions in 2018, it seems like we could do it in one of three ways:
(Typically, we would also join the transaction lines table with other tables, but for this question, I am really just interested in the input process.)
Of the three options, it seems like #3 might be the fastest from a computation perspective, but it is also the slowest and most tedious from a user perspective, even one who is relatively skilled with SELECT only SQL queries. #1 is clearly the quickest and easiest from a user's standpoint.
I guess my questions are...
Thanks in advance for any help you can provide!
Solved! Go to Solution.
First off, there is performance profiling available. In the workflow's Configuration window, there's a tab for "Runtime". At the bottom of that section is a checkbox to select Performance Profiling. Here's an article with more information.
https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Performance-Profiling-in-v10/ta-p/3800
Now to the original question, #3 should be the fastest option. I try to imagine data moving through the workflow: each time it passed through a tool, it has to be processed and moved in/out of memory, and so we should reduce that. In option #1, then data has to be moved from the table, to the select tool, then to the filter tool where it is reduced. Option #3 is the least amount of data movement that has to be processed/held in memory to achieve the same goal.
Other thoughts:
- It's my understanding that the green borders/"activation" can be a pretty rough estimate of what's going on in the engine processing. It's there for user feedback, but probably shouldn't be relied on. EDIT: If you ever wonder why some non-connected tools start before others, it has to do with the ToolID of those tools.
- When considering the "set up" vs "letting it all process" decision, I think it's important to consider if the field you need to filter on is indexed on the database. If it is, then use a WHERE clause in the query and take advantage of that index. Otherwise the filter tool might not be a big performance increase unless there are machine hardware differences limiting performance (Server host vs Alteryx host).