Hi,
I have an official list of product names like the one on the left-hand side of the table below. I was joining to another data set to get additional product information, but the file I am using has very poor data quality in the name column: many of the names have extra characters in them. I have put examples on the right-hand side of how the names can appear. I need a way to clean the name column in that file so it matches the names in the official product name list. Would appreciate some help!
| Product Official Name | Product Name Variations | | | |
| --- | --- | --- | --- | --- |
| ABC 1500 | ABC 1500-3/1.85 | ABC_1500-5/1.45 | ABC_1500 Product | ABC_1500 Product 123/23 |
| Bread and Eggs | 1301 Bread and Eggs BE | Bread and Eggs (BU400) | Packet BE 200 | |
Hi @Sammy22,
This task is always going to be somewhat difficult. There are a number of approaches you can take, but you'll never get a magic one-size-fits-all solution.
I'm assuming there could be much more variation than what we see in your sample?
Creating regex rules would be pretty complex even just looking at the variation in the two example rows.
You could try the Fuzzy Match tool, although I'll be honest, it's not something I tend to go for.
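Just to show why the scores can be unreliable here, this is a rough standalone Python sketch (it's not how the Fuzzy Match tool works internally, and the lists are made-up examples from your sample):

```python
import difflib

# Made-up sample data mirroring the question
officials = ["ABC 1500", "Bread and Eggs"]
variants = ["ABC_1500 Product 123/23", "1301 Bread and Eggs BE", "Packet BE 200"]

for variant in variants:
    # Compare whole strings; long extra prefixes/suffixes drag the score down
    hits = difflib.get_close_matches(variant.replace("_", " "), officials, n=1, cutoff=0.4)
    print(variant, "->", hits[0] if hits else "no match")
```

Something like "Packet BE 200" never scores high enough against "Bread and Eggs", which is why fuzzy matching alone tends to fall short once abbreviations are involved.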
A third option that could get you part of the way there is Find Replace.
For the most part (in your sample) the variant names do contain the official name, so you could use the Find Replace tool to join the two files together, and then look at the exceptions in a more manual fashion.
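If it helps to see that containment logic written out, here's a minimal Python sketch of the same idea (the lists are just your sample values; in the workflow it's the Find Replace tool doing this work):

```python
# Minimal sketch of the "official name appears inside the variant" idea
officials = ["ABC 1500", "Bread and Eggs"]
variants = ["ABC 1500-3/1.85", "ABC_1500 Product", "Bread and Eggs (BU400)", "Packet BE 200"]

def normalise(name: str) -> str:
    # Treat underscores as spaces so "ABC_1500" matches "ABC 1500"
    return name.replace("_", " ").lower()

matches = {}
for variant in variants:
    hit = next((o for o in officials if normalise(o) in normalise(variant)), None)
    matches[variant] = hit  # None flags the exceptions to review manually

print(matches)
```

Everything that comes back as None ("Packet BE 200" in the sample) is the exception list you'd then work through by hand.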
I've attached a simple example of what I mean to give you some ideas.
Hope that helps a bit at least!
Regards,
Ben
Hello @Sammy22,
Dynamic Rename is an amazing tool that people don't often know about. I have attached a workflow that does what you are asking for, but it will require some management on your side. If the only variation on "Bread and Eggs" were underscores or a few extra characters, I would say let's fuzzy match them; however, since you have examples where BE means Bread and Eggs, we have to get creative.
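Purely to illustrate the "get creative" part (this is separate from the attached workflow, and the abbreviation table is something you would have to maintain yourself), a rough Python sketch of the idea might look like:

```python
# Hand-maintained lookup for abbreviations the official list can't catch
abbreviations = {"BE": "Bread and Eggs"}
officials = ["ABC 1500", "Bread and Eggs"]

def resolve(variant):
    text = variant.replace("_", " ")
    # First try the straightforward containment check against the official list
    for official in officials:
        if official.lower() in text.lower():
            return official
    # Then fall back to the abbreviation lookup, matching the code as a whole word
    for code, official in abbreviations.items():
        if f" {code} " in f" {text} ":
            return official
    return None

print(resolve("Packet BE 200"))    # Bread and Eggs
print(resolve("ABC_1500-3/1.85"))  # ABC 1500
```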
Play around with what you see here and let me know if you have any questions.