The data I got is similar to the sample I attached below.
It has a field called Skill name that captures people's skills. Many of the skills are similar, and I want to bundle them into broader skill baskets. For example, maybe "Ad Exchanges", "Ad Serving", "Advertising", "Advertising Operations" and "display Advertising" can be automatically fuzzy matched into one skill set. My struggle is that "Ad Exchanges" and "Ad Serving" only have 2 letters in common. How should I configure the fuzzy match function so that each skill matches to its closest skill set?
In my experience with this kind of problem, which I have worked on a number of times for our clients, fuzzy matching never really produces the results people want. That is not because it's a poor tool, but because they expect such specific matches.
I would perhaps look at whether you can build a rule-based methodology that uses simple calculations to merge your values into their different groups.
For example, you could create a list of key terms associated with the different groups, and then check whether a field contains one of those specific terms. These groupings can be stored in a file that can be easily edited by all of your stakeholders. This gives your stakeholders a very clear way of merging values together (which fuzzy matching doesn't), whilst also giving them the ability to control the results they see.
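To make the key-term idea concrete, here is a minimal sketch in Python rather than in a workflow. The group names, the key terms, and the `assign_group` helper are all made up for illustration; in practice the rules would be loaded from the editable file described above.

```python
import re

# Hypothetical rules: group name -> key terms. These are illustrative only
# and would normally be read from a file that stakeholders maintain.
KEYWORD_RULES = {
    "Advertising": ["ad", "ads", "advertising"],
    "Analytics": ["analytics", "analysis"],
}

def assign_group(skill, rules=KEYWORD_RULES, default="Unmapped"):
    """Return the first group with a key term appearing as a whole word in skill."""
    text = skill.lower()
    for group, terms in rules.items():
        for term in terms:
            # Word boundaries stop "ad" from matching inside "Load".
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                return group
    return default

for skill in ["Ad Exchanges", "display Advertising", "Load Balancing"]:
    print(skill, "->", assign_group(skill))
```

Anything that falls through to "Unmapped" is a prompt to add a new key term to the file, which keeps the grouping logic visible to whoever owns it.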
Ben
I agree with the above - while fuzzy matching can be great for identifying values that are similar, it's definitely not the only and/or best solution long term. This is especially true given that some of your values are similar by only 2 letters.
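As a rough illustration of why short shared prefixes are hard for string similarity, the sketch below scores pairs of skill names with Python's standard `difflib.SequenceMatcher`. This is not the algorithm used by the fuzzy match tool; it is only an assumption-laden stand-in to show the general effect, and the `similarity` helper is a name invented here.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

pairs = [
    ("Ad Exchanges", "Ad Serving"),
    ("Advertising", "Advertising Operations"),
]
for a, b in pairs:
    # Print the score for each pair so they can be compared side by side.
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

The pair that shares a whole word scores higher than the pair that shares only "Ad", so any threshold loose enough to link the "Ad ..." skills is also likely to link unrelated short names.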
I suggest making a list of the skills and the categories they belong to. To get a consolidated list of all skills, you can use a Unique tool and take the U output. From there, you can create your master list of skills and categories. Once you have that, you can use a Join tool to add the appropriate category.
Moving forward, any skill that has previously been mapped will come out of the J output. Any skill that has not yet been mapped will come out of the L output, and you can then add it to the master list.
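The same flow can be sketched outside of a workflow. The Python below is only an approximation of the steps above: `unique_skills` stands in for the unique step, `join_to_master` for the join, and its two return values play the role of the J and L outputs. The function names and the sample master list are hypothetical.

```python
def unique_skills(records):
    """De-duplicate skill names, keeping first-seen order."""
    return list(dict.fromkeys(records))

def join_to_master(skills, master):
    """Split skills into (joined, unmatched) using a skill -> category mapping."""
    joined = [(skill, master[skill]) for skill in skills if skill in master]
    unmatched = [skill for skill in skills if skill not in master]
    return joined, unmatched

# Hypothetical master list; in practice this would be a maintained file.
master = {"Ad Exchanges": "Advertising", "Ad Serving": "Advertising"}

skills = unique_skills(["Ad Serving", "Ad Exchanges", "Ad Serving", "SQL"])
joined, unmatched = join_to_master(skills, master)

print("mapped:", joined)          # comparable to the J output
print("needs mapping:", unmatched)  # comparable to the L output
```

Each run, the unmatched list is the to-do list for whoever maintains the master list, so coverage improves over time without changing the logic.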
Thank you for the suggestions! It helps a lot.