Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Community is experiencing an influx of spam. As we work toward a solution, please use the 'Notify Moderator' option on the ellipsis menu to flag inappropriate posts.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Finding best (closest) match To the given value from another list

kwieto
8 - Asteroid

Hello, I'm a new to Alteryx and still learning the tool, can you help me with following task?

I have a list of the data divided into categories and subcategories (each row have its unique ID), which I want compare to the given list of categories and subcategories. The task is to check if  a pair of category and subcategory match the list (that is easy to do with Join tool), but if there is no exact match available, then, Based on the exact match of the categories, suggest the closest match for subcategory if there is any similar value in the reference table.

 

Additionally, it is nice to have the evaluation how close the best match is to the given value (i.e. 90%)


The sample list is here:

IDCategorySubcategory
1ClassicSymphony
2

Jazz

Acid Jazz
3RockDeath and Roll
4RockHard Rok
5JazzJazz

 

The table with reference data:

 

CategorySubcategory
ClassicChamber
ClassicOpera
ClassicSymphonic
JazzAcid
JazzClassic
JazzFusion
JazzModern
RockDeath 'n' Roll
RockHard Rock

 

Desired output:

 

IDCategorySubcategoryClosest match (Subcategory)
1ClassicSymphonySymphonic
2JazzAcid JazzAcid
3RockDeath and RollDeath 'n' Roll
4RockHard RokHard Rock
5JazzJazz 

 

 

I thought about fuzzy match tool, but don't know how to properly configure it. Or maybe is is not the right tool for the task?

4 REPLIES 4
Qiu
21 - Polaris
21 - Polaris

@kwieto 
Just want you to know, this is a bit stretchy. 😁
Aside from Fuzzy Match, I also used a filter tool to identify the case for "Acid Jazz" and "Jazz".

0339-kwieto-A.PNG0339-kwieto-B.PNG

kwieto
8 - Asteroid

Great, that was fast 😀

I assume that if there will be a spelling error in the "Acid Jazz" (i.e. Acit Jazz), the filter won't work and the solution would be more tricky to do?

Qiu
21 - Polaris
21 - Polaris

@kwieto 
I think you are right.

Text Similarity is always tricky I think, unless you invovlve something big like Machine Learning, that would be out of my reach.
An article discussing about the Levenshtein distance you may want to take a look.

https://towardsdatascience.com/text-similarity-w-levenshtein-distance-in-python-2f7478986e75

Capture.PNG

kwieto
8 - Asteroid

Thanks, I'll look into the article, for now I'm trying to do some changes to the logic with various effects.

Sorry for late reply, got stuck with some other things. 

Labels