Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Formula to trim and clean field

Sammy22
8 - Asteroid

Hi,

 

i have an official list of product names like on the left hand side of the table below. I was doing a join to another data set to get additional product information but the file i am using has very poor data quality for the names. Many of the names have other characters in them. I have put examples in the right hand side below of how the names can appear. I need a way to clean the name column in the file to match the names in the original product name file. Would appreciate some help!

 

Product Official Name

 

Product Name Variations

 

 

 

ABC 1500

 

ABC 1500-3/1.85

ABC_1500-5/1.45

ABC_1500 Product

ABC_1500 Product 123/23

Bread and Eggs

 

1301 Bread and Eggs BE 

Bread and Eggs (BU400)

Packet BE 200

 

4 REPLIES 4
Luke_C
17 - Castor
17 - Castor

Hi @Sammy22 

 

For ABC 1500 you can use a replace formula to substitute a space in for underscores. I would then use the find and replace's append functionality to see what matches - see below example. The last record didn't match, I'm not sure if you want to start tracking a mapping of abbreviations that can be subbed in. If you do go that route you could take a similar approach with the find and replace tool.

 

 

Luke_C_0-1628256153100.png

 

Sammy22
8 - Asteroid

Thanks, How would i manage things like ABC1200 and ABC.1200?

Luke_C
17 - Castor
17 - Castor

I'd probably use a data cleanse tool to remove whitespace and punctuation. Then have a mapping of the actual product name to use for the find and replace. Take a look at the attached, will require some set-up but should cover most of your cases.

 

Luke_C_0-1628258062182.png

 

john_watkins
11 - Bolide

As a general rule, it is not optimal to use names of things for joins as it leads to scenarios as you mention where spellings, punctuation and capitalization are different between all systems.  If it is completely out of your control or if things get really messy you can always create a mapping table.  This isn't a quick answer, but it can be used in other areas as well.  YOu can have your workflow check that there are no "new" combinations.   Again this all depends on how much data you are talking about.

 

ABC.123         = ID 1234  

Ab C?1  2   3_ = ID 1234

 

Then create your table where you have one-and-only one record for that ID:

ID 1234    NAME   ABC123

Labels