Hello, I'm a new to Alteryx and still learning the tool, can you help me with following task?
I have a list of the data divided into categories and subcategories (each row have its unique ID), which I want compare to the given list of categories and subcategories. The task is to check if a pair of category and subcategory match the list (that is easy to do with Join tool), but if there is no exact match available, then, Based on the exact match of the categories, suggest the closest match for subcategory if there is any similar value in the reference table.
Additionally, it is nice to have the evaluation how close the best match is to the given value (i.e. 90%)
The sample list is here:
ID | Category | Subcategory |
1 | Classic | Symphony |
2 | Jazz | Acid Jazz |
3 | Rock | Death and Roll |
4 | Rock | Hard Rok |
5 | Jazz | Jazz |
The table with reference data:
Category | Subcategory |
Classic | Chamber |
Classic | Opera |
Classic | Symphonic |
Jazz | Acid |
Jazz | Classic |
Jazz | Fusion |
Jazz | Modern |
Rock | Death 'n' Roll |
Rock | Hard Rock |
Desired output:
ID | Category | Subcategory | Closest match (Subcategory) |
1 | Classic | Symphony | Symphonic |
2 | Jazz | Acid Jazz | Acid |
3 | Rock | Death and Roll | Death 'n' Roll |
4 | Rock | Hard Rok | Hard Rock |
5 | Jazz | Jazz |
I thought about fuzzy match tool, but don't know how to properly configure it. Or maybe is is not the right tool for the task?
Solved! Go to Solution.
@kwieto
Just want you to know, this is a bit stretchy. 😁
Aside from Fuzzy Match, I also used a filter tool to identify the case for "Acid Jazz" and "Jazz".
Great, that was fast 😀
I assume that if there will be a spelling error in the "Acid Jazz" (i.e. Acit Jazz), the filter won't work and the solution would be more tricky to do?
@kwieto
I think you are right.
Text Similarity is always tricky I think, unless you invovlve something big like Machine Learning, that would be out of my reach.
An article discussing about the Levenshtein distance you may want to take a look.
https://towardsdatascience.com/text-similarity-w-levenshtein-distance-in-python-2f7478986e75
Thanks, I'll look into the article, for now I'm trying to do some changes to the logic with various effects.
Sorry for late reply, got stuck with some other things.