Hello everyone
I have tried to find a solution but haven't succeeded, so I apologize if this is a simple, common problem.
File A
File A has 2 columns: (A) the keyword searched on a website (this column contains unique values) and (B) its classification (which, in this case, contains repeated values).
The classification was done manually, and the file is a list of 300,000 classified keywords.
Example (see File A attached):
keyword - classification
bed - furniture
tv - electrical appliances
tomb raider - games
bread - food
beer - drinks
table - furniture
File B
File B contains a single column showing a list of unique keywords that haven't been classified yet.
File C
File C is a list of expected classifications, that is, a list of unique values containing all the possible values for the "classification" column. I haven't provided it here, but in another tool (Power BI) this third file was a requirement for using fuzzy matching.
Goal
The Goal is to use the large sample of classified keywords in File A to automatically classify the keywords in File B into a new column (column B, just like in file A).
The result will be a file with 2 columns. In column A you should have all the keywords from File B and in column B the classification.
Some error is expected, but after some visual validation this result will be joined with File A, enriching the list of classified keywords for future use in this project.
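For readers outside Alteryx, the goal can be sketched as a nearest-neighbour lookup: for each unclassified keyword, find the most similar keyword in File A and copy its classification. The sketch below is an illustration only, assuming small in-memory stand-ins for File A and File B (the `file_a`/`file_b` contents and the `classify` helper are hypothetical) and using Python's standard `difflib`, not the Fuzzy Match tool itself. Keywords with no match above the `cutoff` are left unclassified (`None`) for the manual review step.

```python
import difflib

# Hypothetical stand-in for File A: keyword -> classification.
file_a = {
    "bed": "furniture",
    "tv": "electrical appliances",
    "tomb raider": "games",
    "bread": "food",
    "beer": "drinks",
    "table": "furniture",
}
# Hypothetical stand-in for File B: unclassified keywords.
file_b = ["beds", "tomb raider 2", "breads", "xyz"]


def classify(keyword, classified, cutoff=0.8):
    """Return the classification of the most similar known keyword, or None.

    Similarity is difflib's character-based ratio (0..1); candidates below
    `cutoff` are ignored, so unrelated keywords stay unclassified.
    """
    matches = difflib.get_close_matches(keyword, list(classified), n=1, cutoff=cutoff)
    return classified[matches[0]] if matches else None


# Two-column result: (keyword from File B, predicted classification).
result = [(kw, classify(kw, file_a)) for kw in file_b]
for kw, cls in result:
    print(kw, "-", cls)
```

Because the similarity is character-based, a high cutoff is needed for short, single-word keywords, and comparing every keyword in File B against 300,000 entries this way may be slow; a dedicated fuzzy-matching library would likely scale better.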
What I have tried so far
I have tried to use Fuzzy Match and Groups, but none of the examples I have found were similar to mine.
I hope I was able to explain it clearly, and I would truly appreciate it if someone could give me a hand.
Luis
Hi @luismarc - I have attached a workflow that takes an alternate approach to fuzzy matching. Fuzzy matching can be a great tool, but I've often found it has its limitations, particularly when you are comparing a single column of data in which many of the values are a single word. You often have to set the match threshold very high (even when leveraging the phonetic conversions), and the results aren't always helpful. Instead, I usually have more success with simple manipulations. In this case, I have used 3:
You could obviously continue to build on this. Hope this is helpful!
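As an illustration of the "simple manipulations" idea (an assumption on my part; this is not necessarily what the attached workflow does), here is a Python sketch that normalises case and whitespace, tries an exact match on the whole phrase, and then falls back to matching individual words, including a naive plural strip. The data, `normalise`, `singular` and `classify` are all hypothetical.

```python
# Hypothetical stand-ins for File A (keyword -> classification) and File B.
file_a = {
    "bed": "furniture",
    "tv": "electrical appliances",
    "tomb raider": "games",
    "bread": "food",
    "beer": "drinks",
    "table": "furniture",
}
file_b = ["Tomb Raider", "wooden tables", "cheap beer offers", "xyz"]


def normalise(text):
    """Lower-case, trim and collapse internal whitespace."""
    return " ".join(text.lower().split())


def singular(word):
    """Naive English plural strip (assumption): 'tables' -> 'table'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


# Normalise File A's keys once so lookups compare like with like.
lookup = {normalise(k): v for k, v in file_a.items()}


def classify(keyword, classified):
    key = normalise(keyword)
    # Manipulation 1: exact match on the whole normalised phrase.
    if key in classified:
        return classified[key]
    # Manipulation 2: try each word, then its naive singular form.
    for word in key.split():
        for candidate in (word, singular(word)):
            if candidate in classified:
                return classified[candidate]
    return None  # left for manual classification


result = [(kw, classify(kw, lookup)) for kw in file_b]
```

The first matching word wins, so an ambiguous phrase such as "tv table" would take the classification of "tv"; in practice you might prefer to flag keywords that match more than one classification for the visual validation step.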