Hello,
I currently struggling how to design the Alteryx workflow.
I'm thinking to use a complicated SQL to extract data, summarize and unpivot the data. But Alteryx also has same tools to do that.
Does anyone try to compare the performance of them?
Let's say >100K rows with >30cols, sum the amount only, 5-6 conditions and unpivot the value column by two group.
Hi Mike,
When working with high-volume, multi-condition datasets, choosing between SQL and Alteryx becomes more than a technical preference—it’s a strategic decision. The analysis below was conducted to evaluate the suitability of SQL versus Alteryx for high-volume data workflows, with the intent of informing stakeholders and aligning with data governance considerations. Hope this helps.
Performance & Efficiency
Aspect | SQL | Alteryx |
Speed | Fast for in-database operations; ideal for large datasets | Slower for wide tables unless optimized (e.g., using OLE DB) |
Unpivoting | Requires complex UNION ALL or PIVOT logic | Simple with Transpose and Cross Tab tools |
Summarization | Efficient with GROUP BY, CASE, and aggregate functions | Intuitive with Summarize tool; performance varies with data size |
Conditions | Precise control with WHERE, CASE, etc. | Visual filters and formula tools; easier for non-coders |
Automation | Needs stored procedures or external schedulers | Built-in scheduling and Gallery deployment options |
Debugging | Harder to trace logic in nested queries | Easier to audit and visualize with workflow canvas |
Strategic Considerations
Tips
When to Use What
Use Case | Best Tool |
Large-scale joins and filtering | SQL |
Quick prototyping and visual workflows | Alteryx |
Complex reshaping (e.g., unpivoting) | Alteryx |
Scheduled ETL pipelines | Alteryx or SQL with stored procedures |
Collaboration with non-coders | Alteryx |
@Mike_Yau A lot of it comes down to preference and scalability. I prefer to use Alteryx as that is what I am more comfortable with but I have a coworker that would prefer to use SQL. I will say, Alteryx is much easier to document, to automate, and to scale as that's what it's designed for. Additionally it allows for way more functionality.
If you do go the route of converting the SQL to an Alteryx workflow, Chatgpt or your AI Chat Bot of choice would help in going through the code and giving suggestions for how to build in Alteryx.
Hope this helps,
Bacon
You could also use both.... and see which suits you better. @abacon is right.
@Mike_Yau this isn't a comparison. What's your database? What's your use case? What's the full infra stack? Are you just running a query to get information your database --- and you have a sql notebook or like dbvisualizer? use sql. are you doing some enrichment with local data?
Alteryx uses sql. It integrates with your database. It allows push down querying via ODBC/In-DB connections. This isn't an either/or situation. Do you see substantial peformance differences querying in Alteryx vs in-notebook? 1) Alteryx on canvas retrieves the entire result in memory. 2) SQL is showing you result and query execution on the db/warehouse vs local retrieval. There can be other reasons.
Many into some, the database gets it done! This is still a basic guide. Anytime you can let the database reduce your data set returned the better off you generally are. There are times when it is convenient to do the work in Alteryx based on tools or just during initial data exploration. Anything you learn in that phase can (filters, select columns, cleansing) can generally then be taken back and converted to a cleaner in-db query that allows more of that to be pushed onto the db.
Each in-db tool is adding a CTE/nested subquery to the SQL that will be fired off to the database. It seems to work best if you minimize your tools. Each tool you add also slows down your interaction with the Designer interface to the point it will become frustrating and nearly unusable. If I had to join tables, selects, filters, etc. I would suggest you write that into your original query on the data pull vs. using all the tools.
Sorry for my not clear question and scenario. I would like to use Alteryx for the whole project. I would like to know, using complicated SQL in "Input Data" tool to perform pivoting/unpivoting and sub-query, or just writing a simple query in the "Input Data" tool when extracting data from DB.
@Mike_Yau --- assumptions:
1) you want to use your data in memory/locally.
2) your local machine is less powerful than your database server.
It is more efficient to offload your query (complicated or otherwise) in SQL in input data or connect in-db or dynamic input-in-db --- and then process only the results in memory vs bringing an entire result into memory and then processing the xml in Alteryx.
This becomes more of an issue as your data scales larger and larger. At 1000-100,000 rows you may not see much an impact. at 10,000,000+ rows this may not be possible to process locally without pushing off some of the filter/aggregation logic to your backend db.
User | Count |
---|---|
107 | |
85 | |
76 | |
54 | |
40 |