community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Knowledge Base

Definitive answers from Designer experts.
New Data Science Blog

Check out the latest post: All Models Are Wrong

READ MORE

Tool Mastery | Data Cleansing

Alteryx
Alteryx

Data Cleansing.pngThis article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Data Cleansing Tool on our way to mastering the Alteryx Designer: 

 

You've gotten your long dataset and you want to combine it with another dataset for additional information. Your dataset is nice and clean. Everything is formatted the same, no null values... The whole package. You open up the data to join to and right away you see a ton of clean up that needs to happen: nulls to replace, strings to format appropriately, extra characters, white space, the list goes on. You launch the Designer, and while fast and accurate, you have to set up a new Multi-Field Formula Tool for each situation you need to fix. If only there was a single tool that did it all.

 

If this describes your experiences with dirty data, you're in luck. Take a look at the Data Cleansing Tool in the preparation tool set. This tool, released with v10.5 of the Designer, will allow you to easily:

 

  • Replace nulls
  • Modify case
  • Remove unwanted characters across any fields you select, string or numeric
  • Trim leading/trailing whitespace

 

Some of the clean up options depend on the field type, for instance, when replacing nulls if it's a string field it'll replace it with a blank value whereas if it's numeric it'll return a 0 value. Either way, this should make your data clean up easier.

 

Better still, this tool is a macro! Macros can be opened to see what's inside (right click on the tool icon, choose "open macro: Cleanse.yxmc") to see exactly how it's doing it's magic. Is there something you have to do with your datsets regularly that this tool doesn't do? Add it! You can add your own custom clean up to this macro so you can keep your one tool clean up in tact. If you do add your own features, be sure to save the macro with a different name so you don't overwrite your existing tool.

 

Take a look at the attached sample workflow (built in 10.5) demonstrating the "old way" and the new way of cleansing data! 

 

By now, you should have expert-level proficiency with the Data Cleansing Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at Community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.

 

Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.

Comments
Meteor

Great article Andrew, thanks for posting.

One item that I'm struggling with how to get the tool to identify new fields automatically and apply the settings.

The use case I have is a monthly process where the new month field (YYYY_MM) is added and I am currently manually selecting the check box each month to apply the tool setting.

 

image.png

 

Any recommendations?

Thanks,

Geoff 

Atom

I don't like to replace null string fields with 0s. Here are the reason: if the field (sale value) is truly 0 and someone forgot to capture the zero sales, then it would be ok. However, if the field (sale value) is null because one did not have information on the actual sales, then put 0s in the fields would be miss leading when doing analysis. I'd prefer to offer another way to capture the situation when we don't have knowledge on actual values of the fields such as a period "." indicating missing values. 

Atom

I need to select half of the fields in my data set which contains A LOT of fields.  Is there a way to select multiple fields at once instead of checking each one?