(I apologize if this question has already been asked, but if it was, I was not able to find it. I am still pretty new to Alteryx.)
My supervisor and I have some workflows that query our transaction lines table, which numbers in the tens of millions of records, so we are trying to optimize the performance of those queries.
If we wanted to query all transactions in 2018, it seems like we could do it in one of three ways:
- Use (a) the "Input Data" tool to select the table, (b) the "Select" tool to select the fields we need, and (c) the "Filter" tool to filter by transaction date.
- Use (a) the "Input Data" tool to select both the table and fields we need by using either the "Visual Query Builder" or "SQL Editor" and (b) the "Filter" tool to filter by transaction date.
- Use just the "SQL Editor" in the "Input Data" tool to do all selecting or filtering using FROM or HAVING clauses.
(Typically, we would also join the transaction lines table with other tables, but for this question, I am really just interested in the input process.)
Of the three options, it seems like #3 might be the fastest from a computation perspective, but it is also the slowest and most tedious from a user perspective, even one who is relatively skilled with SELECT only SQL queries. #1 is clearly the quickest and easiest from a user's standpoint.
I guess my questions are...
- Is #3 really the fastest? The only thing that makes me think it might not be is that when I run a workflow, I often see several tools "activate" at once, which makes me think that upstream and downstream tools interact and do not have a strict finish-to-start relationship.
- If so, how much faster might #3 be on a data set that includes around 20M records and 100 or so fields? If it is just a small time difference, it is probably not worth the time it takes to set up.
- Does Alteryx have a way of profiling the performance of different options (e.g., Tableau's performance recorder)?
Thanks in advance for any help you can provide!