Alteryx Designer Desktop Discussions

Sammy22 · ‎08-06-2021

Hi,

i have an official list of product names like on the left hand side of the table below. I was doing a join to another data set to get additional product information but the file i am using has very poor data quality for the names. Many of the names have other characters in them. I have put examples in the right hand side below of how the names can appear. I need a way to clean the name column in the file to match the names in the original product name file. Would appreciate some help!

Product Official Name	Product Name Variations
ABC 1500	ABC 1500-3/1.85	ABC_1500-5/1.45	ABC_1500 Product	ABC_1500 Product 123/23
Bread and Eggs	1301 Bread and Eggs BE	Bread and Eggs (BU400)	Packet BE 200

Luke_C · ‎08-06-2021

Hi @Sammy22

For ABC 1500 you can use a replace formula to substitute a space in for underscores. I would then use the find and replace's append functionality to see what matches - see below example. The last record didn't match, I'm not sure if you want to start tracking a mapping of abbreviations that can be subbed in. If you do go that route you could take a similar approach with the find and replace tool.

Sammy22 · ‎08-06-2021

Thanks, How would i manage things like ABC1200 and ABC.1200?

Luke_C · ‎08-06-2021

I'd probably use a data cleanse tool to remove whitespace and punctuation. Then have a mapping of the actual product name to use for the find and replace. Take a look at the attached, will require some set-up but should cover most of your cases.

john_watkins · ‎08-06-2021

As a general rule, it is not optimal to use names of things for joins as it leads to scenarios as you mention where spellings, punctuation and capitalization are different between all systems. If it is completely out of your control or if things get really messy you can always create a mapping table. This isn't a quick answer, but it can be used in other areas as well. YOu can have your workflow check that there are no "new" combinations. Again this all depends on how much data you are talking about.

ABC.123 = ID 1234

Ab C?1 2 3_ = ID 1234

Then create your table where you have one-and-only one record for that ID:

ID 1234 NAME ABC123

Alteryx Designer Desktop Discussions

Formula to trim and clean field