I love that I can run huge amounts of data through Alteryx. But, as you know, Alteryx is never the final destination. Alteryx generated data gets pulled into BI applications like Spotfire or PowerBI. Smaller data sets load faster than larger data sets, and when it comes to data loading, faster is better. Now, you may think you can’t do anything to reduce the size of your data set without cutting columns or rows, but that’s not true. This post will share two tips to reduce the size of your Alteryx output and increase the speed of data loading in BI applications.
For this post, we’ll work with a small set of tools, and I want to draw attention to the size of the output. If you don’t see the size when running your own workflows, click on the canvas, go to the Canvas tab and choose “Show” or “Show Only When Running” from Connection Progress.
Just after my input tool, I use a Select tool to rename and change data types. As you can see, 1.3 GB of data exit the tool. Then, I perform a calculation that increases the size of the data set from 1.3 GB to 1.5 GB. Next, let’s see what I do to cut that down.
Tip Number 1 – Trimming
First, there are several tools you can use with the Trim function to remove whitespace from Text fields, which will reduce the size of the output. The Formula tool, the Data Cleansing tool, and the Multi-Field formula tool will all work. My strong preference is to use the Multi-Field Formula tool. The Multi-Field formula tool allows you to apply the Trim function to all Text columns, including Dynamic or Unknown fields. That’s why it’s my preference. The Formula tool must be applied one column at a time, so it’s inefficient. The Data Cleansing loses columns when they are renamed and doesn’t allow you to select Unknown fields, and the Multi-Field formula tool does.
The screenshot below shows an example of the configuration.
So, trimming whitespace takes .1 GB off the size of my data set. I don’t know if I would call .1GB material, but it certainly is helpful. To get more material gains, let’s look at the Auto Field tool.
Tip Number 2 – Auto Field
The Auto Field tool is easily overlooked by new users but has multiple uses. First, it reads through all the records, and for a given column, it sets the field size to the smallest possible size based on data in the column.
In my workflow, you can see 1.4 GB of data flows into the Auto Field tool, but only 936 MB of data flows out. The tool shaved off almost half a gig just by shortening the field size. That’s significant!
You can see the precise difference in field sizes by adding Select tools just before and after the Auto Field tool. Compare them to see how they have shrunk!
Now, you know two ways to reduce the size of data sets in Alteryx!