on 08-29-2016 03:55 PM - edited on 07-27-2021 11:37 PM by APIUserOpsDM
This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools.Here we’ll delve into uses of the Data CleansingToolon our way to mastering the Alteryx Designer:
You've gotten your long dataset and you want to combine it with another dataset for additional information. Your dataset is nice and clean. Everything is formatted the same, no null values... The whole package. You open up the data to join to and right away you see a ton of clean up that needs to happen: nulls to replace, strings to format appropriately, extra characters, white space, the list goes on. You launch the Designer, and while fast and accurate, you have to set up a new Multi-Field Formula Tool for each situation you need to fix. If only there was a single tool that did it all.
If this describes your experiences with dirty data, you're in luck. Take a look at the Data Cleansing Tool in the preparation tool set. This tool, released with v10.5 of the Designer, will allow you to easily:
Some of the clean up options depend on the field type, for instance, when replacing nulls if it's a string field it'll replace it with a blank value whereas if it's numeric it'll return a 0 value. Either way, this should make your data clean up easier.
Better still, this tool is a macro! Macros can be opened to see what's inside (right click on the tool icon, choose "open macro: Cleanse.yxmc") to see exactly how it's doing it's magic. Is there something you have to do with your datasets regularly that this tool doesn't do? Add it! You can add your own custom clean up to this macro so you can keep your one tool clean up in tact. If you do add your own features, be sure to save the macro with a different name so you don't overwrite your existing tool.
Take a look at the attached sample workflow (built in 10.5) demonstrating the "old way" and the new way of cleansing data!
By now, you should have expert-level proficiency with the Data Cleansing Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at Community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.
Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.
Great article Andrew, thanks for posting.
One item that I'm struggling with how to get the tool to identify new fields automatically and apply the settings.
The use case I have is a monthly process where the new month field (YYYY_MM) is added and I am currently manually selecting the check box each month to apply the tool setting.
Any recommendations?
Thanks,
Geoff
I don't like to replace null string fields with 0s. Here are the reason: if the field (sale value) is truly 0 and someone forgot to capture the zero sales, then it would be ok. However, if the field (sale value) is null because one did not have information on the actual sales, then put 0s in the fields would be miss leading when doing analysis. I'd prefer to offer another way to capture the situation when we don't have knowledge on actual values of the fields such as a period "." indicating missing values.
I need to select half of the fields in my data set which contains A LOT of fields. Is there a way to select multiple fields at once instead of checking each one?
Is there a way to dynamically select the check boxes in the cleansing tool for new columns that are being added to each days result set?
Or is there a workflow that could address all the nulls coming in each day??
Why do we need to cleanse Null?
The selection always get cleared and have to re-select again when I edited some tools before this. It cannot keep what has been configured like in the Select tool. This caused me some problems.
Great article Andrew. Thanks for sharing!
Is there video of this tool to learn in video.
In Data Cleansing tool, Need to understand _ Tabs, Line Breaks, and Duplicate Whitespace & All Whitespace difference with an example, can someone please help me. Thanks in advance.