Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Poor data quality mapping names

Sammy22
8 - Asteroid

Hi,

 

i have an official list of product names like on the left hand side of the table below. I was doing a join to another data set to get additional product information but the file i am using has very poor data quality for the names. Many of the names have other characters in them. I have put examples in the right hand side below of how the names can appear. I need a way to clean the name column in the file to match the names in the original product name file. Would appreciate some help!

 

Product Official Name Product Name Variations   
ABC 1500 ABC 1500-3/1.85ABC_1500-5/1.45ABC_1500 ProductABC_1500 Product 123/23
Bread and Eggs 1301 Bread and Eggs BE Bread and Eggs (BU400)Packet BE 200 
2 REPLIES 2
Ben_H
11 - Bolide

Hi @Sammy22,

 

This task is always going to be somewhat difficult, and there are a number of approaches you can take but you'll never get to a magic one size fits all solution.

 

I'm assuming there could be much more variation that what we see in your sample?

 

Creating rules in regex would be pretty complex even just looking at how the two example rows are below.

 

You could try fuzzy match - although I'll be honest it's not something I tend to go for.

 

A third alternative that could get you partly there is Find/Replace.

 

For the most part (in your sample) the variant names do contain the official name, so you could use the find replace tool to join them together, and then have a look at the exceptions in a more manual fashion.

 

I've attached a simple example of what I mean to give you some ideas.

Ben_H_0-1628264608925.png

 

Hope that helps a bit at least!

 

Regards,

 

Ben

 

 

 

Treyson
13 - Pulsar
13 - Pulsar

hello @Sammy22,

 

Dynamic Rename is an amazing tool that people don't often know about. I have attached a workflow that does what you are asking for, but it will require some management on your side. If you only had a variation on the value of "Bread and Eggs" with under scores or more characters, I would say let's fuzzy match them, however since you have examples where BE means Bread and Eggs, we have to get creative. 

 

Play around with what you see here and let me know if you have any questions.

Treyson Marks
Senior Analytics Engineer
Labels