I’ve been working with the Auto Field tool a lot lately. Recently, I learned how to remove columns with zeros or nulls from Alteryx workflows using the Auto Field tool, and I want to show you how easy it is.
I receive data from service providers in CSV format. The CSVs and the columns they contain differ by vendor, and they also contain a lot of noise and unnecessary columns. I needed to combine them all into a single table, so I built a workflow merging the CSV files. As expected, my final table contained over 400 columns. Many of those columns contain only zero or null values—in which case, I don’t need the column at all. However, I have no way of knowing which columns will be null or zero, so I needed a programmatic way to remove them. I will use the Auto Field and Dynamic Select tools to perform this task.
The screenshot below shows my entire workflow. I’ve highlighted the relevant tools in red. In order, I use a Multi-Field Formula tool, Auto Field tool, Select tool (skip the DateTime tool), Dynamic Select tool, and a Multi-Field Formula tool. The Select tool at the end is optional just to see how data types have changed.
Many folks wonder why I wouldn’t just use the Data Cleansing tool, and the answer is simple. If the column name changes or if new text columns are added, the Data Cleansing tool won’t process them. It doesn’t have the option for Dynamic of Unknown Fields like the Auto Field does. Basically, I’ve spent so much time reworking Data Cleansing tools that I have stopped using them entirely.
The Auto Field tool works by reading through all records, and for a given column, it sets the field type to the smallest possible size based on the data contained in the column. While that doesn’t seem helpful, what you might not know is that if all of the values are null, it will set the data type to Boolean. From there, you can use the Dynamic Select tool to remove all Boolean data type fields. And that will programmatically remove all columns where the values are all null. To also include zeros in this process, review the detailed steps below.
There are five steps in this process, and the configuration for each tool is shown below with an explanation.
The Multi-Field formula tool converts zero values to null. They need to be null for the Auto Field tool to do its thing. Now, notice that I have selected all text fields and that my expression accounts for zero in a few different ways. Why did I do that? Why didn’t I convert these columns to numbers first to make this easier and more foolproof?
I had to do it this way for two reasons. First, when reading data from CSVs, Alteryx imports all columns as String data types. I left them as strings because, as you’ll see in the screenshot below, the Auto Field tool only converts string data types. If I changed the data type to numbers, the process wouldn’t work.
Then, use the Auto Field tool on all columns….and I have a lot of columns. (See how it says “Select String Fields to Auto Change Field Type”).
Next, the Select tool after the Auto Field tool confirms the strings converted to numbers. If a column contained only null values, the data type was converted to a Boolean field.
Now, I configured the Dynamic Select tool to select all field types except Boolean so that the Boolean columns will be removed.
Lastly, I use another Multi-Field Formula tool, this time selecting only Numeric fields, to replace all nulls with zeros. Since the fields are numeric, my zero is no longer in quotes.
Of course, if you have valid Boolean data type columns, you’ll want to perform this process in a separate stream of the workflow where they are not included. If you don’t have any Boolean data type columns, this will work perfectly.
And that is how you can programmatically remove columns with zeros or nulls from an Alteryx workflow. In my next article, I’ll explain how I used a File Directory tool and a Batch macro to merge all those CSVs with different schemas.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.