Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Tool Mastery

Explore a diverse compilation of articles that take an in-depth look at Designer tools.
Become a Tool Master

Learn how you can share your expertise with the Community

LEARN MORE

Tool Mastery | Data Cleansing

AndrewL
Alteryx
Alteryx
Created

Data Cleansing.pngThis article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools.Here we’ll delve into uses of the Data CleansingToolon our way to mastering the Alteryx Designer:

 

You've gotten your long dataset and you want to combine it with another dataset for additional information. Your dataset is nice and clean. Everything is formatted the same, no null values... The whole package. You open up the data to join to and right away you see a ton of clean up that needs to happen: nulls to replace, strings to format appropriately, extra characters, white space, the list goes on. You launch the Designer, and while fast and accurate, you have to set up a new Multi-Field Formula Tool for each situation you need to fix. If only there was a single tool that did it all.

 

If this describes your experiences with dirty data, you're in luck. Take a look at the Data Cleansing Tool in the preparation tool set. This tool, released with v10.5 of the Designer, will allow you to easily:

 

  • Replace nulls
  • Modify case
  • Remove unwanted characters across any fields you select, string or numeric
  • Trim leading/trailing whitespace

 

Some of the clean up options depend on the field type, for instance, when replacing nulls if it's a string field it'll replace it with a blank value whereas if it's numeric it'll return a 0 value. Either way, this should make your data clean up easier.

 

Better still, this tool is a macro! Macros can be opened to see what's inside (right click on the tool icon, choose "open macro: Cleanse.yxmc") to see exactly how it's doing it's magic. Is there something you have to do with your datasets regularly that this tool doesn't do? Add it! You can add your own custom clean up to this macro so you can keep your one tool clean up in tact. If you do add your own features, be sure to save the macro with a different name so you don't overwrite your existing tool.

 

Take a look at the attached sample workflow (built in 10.5) demonstrating the "old way" and the new way of cleansing data!

 

By now, you should have expert-level proficiency with the Data Cleansing Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at Community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.

Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.

Attachments
Comments
GGuenther
7 - Meteor

Great article Andrew, thanks for posting.

One item that I'm struggling with how to get the tool to identify new fields automatically and apply the settings.

The use case I have is a monthly process where the new month field (YYYY_MM) is added and I am currently manually selecting the check box each month to apply the tool setting.

 

image.png

 

Any recommendations?

Thanks,

Geoff 

dsieg
5 - Atom

I don't like to replace null string fields with 0s. Here are the reason: if the field (sale value) is truly 0 and someone forgot to capture the zero sales, then it would be ok. However, if the field (sale value) is null because one did not have information on the actual sales, then put 0s in the fields would be miss leading when doing analysis. I'd prefer to offer another way to capture the situation when we don't have knowledge on actual values of the fields such as a period "." indicating missing values. 

laquino
5 - Atom

I need to select half of the fields in my data set which contains A LOT of fields.  Is there a way to select multiple fields at once instead of checking each one?

RJS
7 - Meteor

Is there a way to dynamically select the check boxes in the cleansing tool for new columns that are being added to each days result set?

Or is there a workflow that could address all the nulls coming in each day??

Jes_Li
7 - Meteor

Why do we need to cleanse Null? 

JokeFun
8 - Asteroid

The selection always get cleared and have to re-select again when I edited some tools before this. It cannot keep what has been configured like in the Select tool. This caused me some problems.

swsieron
5 - Atom

Great article Andrew. Thanks for sharing! 

dhruva
8 - Asteroid

Is there video of this tool to learn in video.

Msarangi
5 - Atom

In Data Cleansing tool, Need to understand _  Tabs, Line Breaks, and Duplicate Whitespace & All Whitespace difference with an example, can someone please help me. Thanks in advance.