Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
FreeRangeDingo
11 - Bolide
11 - Bolide

I love that I can run huge amounts of data through Alteryx. But, as you know, Alteryx is never the final destination.  Alteryx generated data gets pulled into BI applications like Spotfire or PowerBI. Smaller data sets load faster than larger data sets, and when it comes to data loading, faster is better. Now, you may think you can’t do anything to reduce the size of your data set without cutting columns or rows, but that’s not true. This post will share two tips to reduce the size of your Alteryx output and increase the speed of data loading in BI applications.

 

My Workflow

 

For this post, we’ll work with a small set of tools, and I want to draw attention to the size of the output. If you don’t see the size when running your own workflows, click on the canvas, go to the Canvas tab and choose “Show” or “Show Only When Running” from Connection Progress.

 

FreeRangeDingo_0-1655922542555.png

 

Just after my input tool, I use a Select tool to rename and change data types. As you can see, 1.3 GB of data exit the tool. Then, I perform a calculation that increases the size of the data set from 1.3 GB to 1.5 GB. Next, let’s see what I do to cut that down.

 

FreeRangeDingo_1-1655922542568.png

 

Tip Number 1 – Trimming

 

First, there are several tools you can use with the Trim function to remove whitespace from Text fields, which will reduce the size of the output.  The Formula tool, the Data Cleansing tool, and the Multi-Field formula tool will all work.  My strong preference is to use the Multi-Field Formula tool. The Multi-Field formula tool allows you to apply the Trim function to all Text columns, including Dynamic or Unknown fields.  That’s why it’s my preference.  The Formula tool must be applied one column at a time, so it’s inefficient. The Data Cleansing loses columns when they are renamed and doesn’t allow you to select Unknown fields, and the Multi-Field formula tool does. 

 

The screenshot below shows an example of the configuration. 

 

FreeRangeDingo_2-1655922542590.png

 

So, trimming whitespace takes .1 GB off the size of my data set. I don’t know if I would call .1GB material, but it certainly is helpful.  To get more material gains, let’s look at the Auto Field tool. 

 

Tip Number 2 – Auto Field

 

The Auto Field tool is easily overlooked by new users but has multiple uses.  First, it reads through all the records, and for a given column, it sets the field size to the smallest possible size based on data in the column.

 

FreeRangeDingo_3-1655922542599.png

 

In my workflow, you can see 1.4 GB of data flows into the Auto Field tool, but only 936 MB of data flows out. The tool shaved off almost half a gig just by shortening the field size.  That’s significant!

 

FreeRangeDingo_4-1655922542608.png

 

You can see the precise difference in field sizes by adding Select tools just before and after the Auto Field tool. Compare them to see how they have shrunk!

 

FreeRangeDingo_5-1655922542621.png

 

FreeRangeDingo_6-1655922542633.png

 

Now, you know two ways to reduce the size of data sets in Alteryx!

Comments
Druke5315
5 - Atom

I haven't even thought about using the multi-field tool to clean text columns - may need to implement that through some of my processes. Even though I've been using Alteryx for 3 years, I have def overlooked the auto-field tool. Does that tool act dynamically for changes in the data that is provided; always going to the smallest possible size?

Cheers,
Druke

Kaustubh17
8 - Asteroid

@FreeRangeDingo thanks for the wonderful article that will help us a lot to reduce the size of data. 

 

Regards,

Kaustubh

FreeRangeDingo
11 - Bolide
11 - Bolide

@Druke5315 yes, the auto field will work dynamically for all new data and new columns.  That's one of the reasons I like it so much. It always goes to smallest possible.  I use it at the start of all my workflows to make my development as fast as possible and at the end to make the final product as small as possible.

IraWatt
17 - Castor
17 - Castor

Very cool I never thought of using the auto field to optimize workflows !  thanks for the article.