Hi,
i have an official list of product names like on the left hand side of the table below. I was doing a join to another data set to get additional product information but the file i am using has very poor data quality for the names. Many of the names have other characters in them. I have put examples in the right hand side below of how the names can appear. I need a way to clean the name column in the file to match the names in the original product name file. Would appreciate some help!
Product Official Name |
| Product Name Variations |
|
|
|
ABC 1500 |
| ABC 1500-3/1.85 | ABC_1500-5/1.45 | ABC_1500 Product | ABC_1500 Product 123/23 |
Bread and Eggs |
| 1301 Bread and Eggs BE | Bread and Eggs (BU400) | Packet BE 200 |
|
Hi @Sammy22
For ABC 1500 you can use a replace formula to substitute a space in for underscores. I would then use the find and replace's append functionality to see what matches - see below example. The last record didn't match, I'm not sure if you want to start tracking a mapping of abbreviations that can be subbed in. If you do go that route you could take a similar approach with the find and replace tool.
Thanks, How would i manage things like ABC1200 and ABC.1200?
As a general rule, it is not optimal to use names of things for joins as it leads to scenarios as you mention where spellings, punctuation and capitalization are different between all systems. If it is completely out of your control or if things get really messy you can always create a mapping table. This isn't a quick answer, but it can be used in other areas as well. YOu can have your workflow check that there are no "new" combinations. Again this all depends on how much data you are talking about.
ABC.123 = ID 1234
Ab C?1 2 3_ = ID 1234
Then create your table where you have one-and-only one record for that ID:
ID 1234 NAME ABC123