I often give new Alteryx users the same pieces of advice. So I thought: why not write an article with those tips? That way, even more people can improve and simplify their Alteryx development process.
So here it is: 7 Alteryx best practices to make your life easier.
(A big thank you to my manager, Shalini Polimetla, for teaching me many of these best practices.)
As your Alteryx workflows get more complex, it gets harder to follow the train of logic. Luckily, several tools can help you keep your workflows straightforward, such as containers, comments, and wireless tool connections.
Image by Author. Example use of containers, comments, and wireless tool connections for workflow organization.
Packaging a workflow is very simple, and it can save the recipient of the workflow a headache too. In Alteryx, go to Options → Export Workflow. This creates an Alteryx Package file (.yxzp), which behaves like a zipped file. The great thing about a packaged workflow is that it includes all of the input files needed to run it, so users do not have to search for files or redo all of the input paths. The package also includes any supporting macros used in the workflow.
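Because the package behaves like a zipped file, you can check what actually made it into a .yxzp before sending it. The sketch below assumes the .yxzp is a standard ZIP archive readable by Python's built-in `zipfile` module and that macros use the .yxmc extension; the function name `summarize_package` is hypothetical.

```python
import sys
import zipfile


def summarize_package(path):
    """List the files in a packaged workflow and pick out the macros.

    Assumption: the .yxzp file is a standard ZIP archive, and bundled
    macros carry the .yxmc extension.
    """
    with zipfile.ZipFile(path) as pkg:
        names = sorted(pkg.namelist())
    macros = [name for name in names if name.lower().endswith(".yxmc")]
    return names, macros


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_package.py MyWorkflow.yxzp
    all_files, macro_files = summarize_package(sys.argv[1])
    for name in all_files:
        print(name)
    print(f"{len(macro_files)} macro(s) bundled")
```

If an input file you expected is missing from the listing, the recipient will likely hit a broken path when they open the workflow.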
This may be the most important tip I have learned over the past year of using Alteryx. Essentially, data joins can get messy. If you are pulling together a variety of data sources, it is easy to get lost in the joins, and then all of a sudden you have an issue with duplicate records. Or, if you are working with a dataset for the first time, this technique can help you understand the primary keys of your tables and figure out how you should join them together.
Image by Author.
The technique is simple: place a RecordID tool on the left input of the join, and then place a Unique tool, with the RecordID field selected, right after the inner join output (J). This lets you verify that none of the records in the original data stream (the left side) have been duplicated by the join. If you do see records coming out of the duplicate (D) output of the Unique tool, it's important to understand why this is happening. It could be that you have not structured the join between your two tables correctly.
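To make the logic of this check concrete outside of Alteryx, here is a minimal Python sketch of the same pattern: number the left-side records, inner join on a key, then split the joined rows into first occurrences of each RecordID (like the Unique tool's U output) and repeats (like its D output). The function name and the `orders`/`customers` sample data are hypothetical.

```python
def check_join_duplicates(left, right, key):
    """Mimic the RecordID + Unique pattern on lists of dicts."""
    # RecordID tool: number each left-side record, starting at 1.
    tagged = [dict(row, RecordID=i) for i, row in enumerate(left, start=1)]

    # Inner join (the J output): one output row per matching pair.
    joined = [{**l, **r} for l in tagged for r in right if l[key] == r[key]]

    # Unique tool on RecordID: first occurrence goes to U, repeats go to D.
    seen = set()
    unique_out, dup_out = [], []
    for row in joined:
        if row["RecordID"] in seen:
            dup_out.append(row)
        else:
            seen.add(row["RecordID"])
            unique_out.append(row)
    return unique_out, dup_out


# Hypothetical data: customer 1 appears twice in the lookup table.
orders = [{"customer_id": 1, "amount": 50}, {"customer_id": 2, "amount": 75}]
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 1, "name": "Ana B."},
    {"customer_id": 2, "name": "Raj"},
]

unique_out, dup_out = check_join_duplicates(orders, customers, "customer_id")
print([row["RecordID"] for row in dup_out])  # [1]
```

Any RecordID that lands in the duplicate output points straight at the left-side record that the join fanned out, which is usually the fastest route to the non-unique key on the right side.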
If you find yourself duplicating a process in a workflow by copying tools over and over, you likely have an opportunity to simplify your workflow by turning these processes into a batch macro. To learn more about batch macros, take a look at Alteryx’s “Getting Started with Batch Macros” post.
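Roughly speaking, a batch macro runs the same set of tools once for each value passed to its control parameter and unions the results. The Python sketch below is only an analogy for that idea, not Alteryx's implementation: the logic you would otherwise copy and paste is written once as a parameterized function, and a driver loop runs it per batch value. All names and data here are hypothetical.

```python
def process_region(rows, region):
    """The repeated logic, written once and parameterized by region."""
    # Filter to one batch, then apply the same transformation each time.
    subset = [r for r in rows if r["region"] == region]
    total = sum(r["sales"] for r in subset)
    return [{"region": region, "total_sales": total}]


def run_batch(rows, control_values, process):
    """Run `process` once per control value and union the outputs."""
    output = []
    for value in control_values:
        output.extend(process(rows, value))
    return output


rows = [
    {"region": "East", "sales": 10},
    {"region": "West", "sales": 5},
    {"region": "East", "sales": 20},
]
result = run_batch(rows, ["East", "West"], process_region)
print(result)
```

Adding a new region then means adding one control value rather than another copy of the tools.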
This was an insight I learned from my manager: if you run a workflow without saving it first (i.e., it still shows as NewWorkflow1), it runs from the temp drive. If you save and then run, it runs from your C drive instead. I have seen this take a 10-minute run time down to 7 minutes or less.
Depending on the size of your dataset, browse tools can significantly slow down your workflow's processing time. Once your workflow is in a steady state, go back through it, remove the browse tools, and re-evaluate whether every tool is still necessary.
When I revisit workflows that I built under a time crunch, I often realize that I did not design them in the most efficient way. Or the requirements may have changed over the course of the project, so I can now delete sections or tools that no longer serve a purpose.
Image from Alteryx.
In-Database tools allow you to perform data cleaning/data blending activities without moving the data out of the database. This can make your workflow run significantly faster when you are working with large queries.
For a comprehensive overview of In-Database tools, you can refer to this Alteryx documentation page.
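The speedup comes from where the work happens. As a rough analogy (using Python's built-in `sqlite3` as a stand-in database, not Alteryx's In-Database tools themselves), compare pulling every row out and aggregating locally with letting the database run the `GROUP BY` and return only the summary rows. The table and data below are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory table standing in for a large database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 10.0), ("East", 20.0), ("West", 5.0)],
    )
    return conn


def totals_in_python(conn):
    # Out-of-database: every row is pulled out, then aggregated locally.
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn):
    # In-database: the database aggregates; only one row per region comes back.
    return dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))


conn = build_demo_db()
print(totals_in_python(conn))
print(totals_in_database(conn))
```

Both return the same totals; the difference is that the second version moves only the summary rows, which matters far more on a table with millions of rows than on this three-row example.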
Additionally, when I am developing a workflow, I frequently query databases and store the results in a .yxdb file (Alteryx's database file format). I then use the .yxdb files as inputs to my workflow so that: 1. the workflow runs faster, and 2. I am not hitting the database with tons of queries as I rework and test my process.
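The same "query once, then reuse a local copy" habit can be sketched outside Alteryx. In the sketch below, a pickle file stands in for the .yxdb and `sqlite3` stands in for the real database; `cached_query` and the cache file name are hypothetical. The first call hits the database and writes the cache; later calls read from disk unless you pass `refresh=True`.

```python
import os
import pickle
import sqlite3
import tempfile


def cached_query(conn, sql, cache_path, refresh=False):
    """Return query rows, serving them from a local cache file when possible."""
    if not refresh and os.path.exists(cache_path):
        # Development runs read the local copy and never touch the database.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    rows = conn.execute(sql).fetchall()
    with open(cache_path, "wb") as f:
        pickle.dump(rows, f)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

cache = os.path.join(tempfile.gettempdir(), "dev_cache_t.pkl")
sql = "SELECT id, name FROM t ORDER BY id"
first = cached_query(conn, sql, cache, refresh=True)  # queries the database

conn.close()  # the database is no longer reachable...
second = cached_query(conn, sql, cache)  # ...but the cached rows still load
print(second)
```

Remember to refresh the cache before the final run so the finished workflow reflects current data.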
That’s all I have for you today on Alteryx best practices, although I am sure there are more that I did not cover. If you have a strategy you have learned from experience, feel free to leave a comment with your expertise!
If you have never used Alteryx before, here are some reasons why I think you should try it.
This article originally appeared on Towards Data Science.