Alteryx Designer Desktop Discussions

SOLVED

Fuzzy matching issue

pommycho
6 - Meteoroid

I'm having a very weird issue with fuzzy matching, which goes against everything I know about fuzzy matching.

 

I have narrowed down my issue to two entries that should be matching but aren't. I can get them to match through the method shown here: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Fuzzy-Fiasco/m-p/325203#M59117 but it doesn't make sense that they aren't matching to begin with.

 

Can anybody help me understand why if I have:

Row ID  Word          Group
1       GST Expense   A
2       GST Expenses  B

 

I can't get the fuzzy match tool to return these two as matching? The configuration that makes most sense to me is:

Fuzzy Config.PNG

But this doesn't return a match. I'd like to understand a bit better why it doesn't.

 

Thanks!

5 REPLIES
Thableaus
17 - Castor

Hi @pommycho 

 

At least one field must have the Generate Keys option checked. It's a mandatory filter in the Fuzzy Match process, and I see yours is not checked.

 

Also, could you share the full configuration of the tool (not only the method you're using)?

 

Cheers,

pommycho
6 - Meteoroid

Hi @Thableaus

 

The process does run since I'm generating keys for the entire word (as shown in the dropdown). I don't want to generate keys for each word (the checkbox I think you mentioned). Since the workflow runs without issues, I'm assuming that my configuration is valid, and if I run it with more values, it does return some matches.

 

Here is the rest of the tool configuration:

Fuzzy Config2.PNG

 

And this is the result:

Fuzzy Result.PNG

My question is, why isn't the fuzzy matching tool using that "MatchKey" shown on the table and applying Levenshtein's Distance calculation to give me a match?

 

I hope this clarifies this question a bit.

Thableaus
17 - Castor

@pommycho 

 

The fact is that Fuzzy Match won't throw any errors even if it's badly configured. It's just going to return Null values (or no matches).

 

Key generation is one part of the overall filtering process that leads to a match. It's a separate step from the fuzzy match itself.

I know it can sound a bit complicated, but this Live Training will be able to clarify all your doubts on this matter.

 

So, if you don't check the Generate Keys box and you have only one field in use, you'll never get to the fuzzy match part (the Levenshtein distance calculation). Like I said, it's a filtering process, and it is well explained by @CailinS in the Live Training session.

FuzzyFilter.png

 

Changing this Key Generation configuration, I did manage to find the result you were expecting:

 

GenerateKey.PNG

 

I hope this helps you find a solution to your problem.

 

Cheers,

CailinS
Alteryx

@Thableaus glad you found the training illuminating!

 

@pommycho In short, the fuzzy match step isn't running because the keys don't match exactly. Shorter keys are generally the answer in this scenario (super simplified advice, but it works in most cases). Key generation is really intended to narrow down the number of pairs that have to be fuzzy matched, which is very processing-intensive. If it filters out too many pairs, shorter keys allow more potential matches to pass through to fuzzy matching.
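
As a rough illustration of that two-stage idea, here is a minimal Python sketch. It is not how the Fuzzy Match tool is implemented: `toy_key`, `fuzzy_pairs`, the `max_key_len` parameter, and the 0.8 threshold are all hypothetical names and values chosen for the example, and the key is just the uppercased letters of the field rather than the tool's real key generation. It only shows that a pair whose keys differ never reaches the edit-distance step, while a shorter key lets the same pair through.

```python
from itertools import combinations
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def toy_key(text: str, max_len: Optional[int] = None) -> str:
    """Hypothetical key (NOT Alteryx's algorithm): letters only, uppercased,
    optionally truncated to max_len characters."""
    key = "".join(ch for ch in text.upper() if ch.isalpha())
    return key if max_len is None else key[:max_len]


def fuzzy_pairs(records, max_key_len=None, threshold=0.8):
    """Stage 1: keep only pairs whose keys are identical.
    Stage 2: score the surviving pairs by edit-distance similarity."""
    matches = []
    for (id_a, a), (id_b, b) in combinations(records, 2):
        if toy_key(a, max_key_len) != toy_key(b, max_key_len):
            continue  # filtered out: never reaches the distance calculation
        score = 1 - levenshtein(a, b) / max(len(a), len(b))
        if score >= threshold:
            matches.append((id_a, id_b, round(score, 3)))
    return matches


records = [(1, "GST Expense"), (2, "GST Expenses")]
print(fuzzy_pairs(records))                 # whole-field keys differ, so no pairs
print(fuzzy_pairs(records, max_key_len=5))  # truncated keys agree, pair is scored
```

With the whole-field key the two rows are dropped before any distance is computed; with the five-character key they pass the filter and score well above the threshold, since they differ by a single character.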

Cailin Swingle
Customer Success & Services
pommycho
6 - Meteoroid

@CailinS, that is very insightful. My hope was that the keys generated were going to be used to compute the Levenshtein Distance but that is not the case.

 

@Thableaus, I understand your point, and I have tried generating keys for every word. The issue I'm running into is that the keys generated are GST|AKSPNS and GST|AKSPNSS. Since the second key doesn't match, it only gives me a match on GST. This is where more context on the rest of the words being matched becomes relevant. Many of the other words that I need to match are of the form _____ Expenses, which matches the second key but not the first. Similarly, there are other words that have GST in them. So sifting through the matches and ranking them better is a huge part of post-processing the results.
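
To make the per-word key overlap concrete, here is a small Python sketch. The two key strings are the ones quoted above. Everything else is an assumption for illustration: `shared_keys` and `rank_candidates` are hypothetical helpers, the candidate list is made up, and the ranking uses Python's `difflib.SequenceMatcher` as a stand-in similarity score rather than anything the Fuzzy Match tool produces.

```python
from difflib import SequenceMatcher


def shared_keys(key_a: str, key_b: str) -> set:
    """Per-word keys are pipe-separated; return the keys present in both."""
    return set(key_a.split("|")) & set(key_b.split("|"))


def rank_candidates(target: str, candidates: list) -> list:
    """Sort candidate strings by similarity to the target, best first.
    SequenceMatcher is a stand-in score, not the tool's match score."""
    scored = [
        (SequenceMatcher(None, target.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]


# Keys quoted in the post: only the GST key is common to both rows.
print(shared_keys("GST|AKSPNS", "GST|AKSPNSS"))

# Made-up candidates that each share one word with the target.
candidates = ["Travel Expenses", "GST Payable", "GST Expenses"]
print(rank_candidates("GST Expense", candidates))
```

In this sketch the near-identical string ranks first, ahead of the candidates that share only one word with the target, which is the kind of post-processing ordering described above.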

 

Clearly, as Cailin mentioned in the live training (quoting a community user), fuzzy matching is an art. I just wasn't aware that the keys need to be an exact match.

 

Thanks to both of you, this has helped me a lot!
