Alteryx Designer Desktop Discussions

SOLVED

Fuzzy Match Custom Configurations

whitkrieng
8 - Asteroid

Hi all,

I've been trying to custom-configure a Fuzzy Match for an "Address" field with all kinds of variations, mainly tweaking the "Generate Keys", "Match Function", and "Match Thresholds" modules.

 

I have created two examples: one is too strict and misses an Address match that it should have found, while the other is too lenient and appears to include a match based on the Address Number alone.

 

With the too-lenient one, if I increase the Match Threshold to, say, 90%, it actually removes a legitimate match (1234 w main st).

 

I am looking for some tips, or maybe there's a different combination of settings that will help resolve this and find a happy medium. I am not at all an expert on, say, "Word" vs. "Character" or Jaro vs. Levenshtein.

 

As an FYI, all these addresses have been hashed in a way for mock purposes.  

 

Thanks all for your help and time. 

7 REPLIES
whitkrieng
8 - Asteroid

Bumping this up for any tips on how best to customize a Fuzzy Match for Addresses.

whitkrieng
8 - Asteroid

I know Fuzzy Match might require some more specialized knowledge. If anyone has any thoughts, I'd appreciate it.

FrederikE
13 - Pulsar

Hey @whitkrieng,

 

See how I would do it in the attached workflow. It should work with your examples; you could check whether it also works well on the real data.

FrederikE_1-1663602865371.png

 

 

If you want to improve it further, I can recommend a blog post I wrote: https://www.thedataschool.co.uk/frederik-egervari/understanding-fuzzy-logic.

This might help, as it explains what the different algorithms do and how the keys are generated, but it might take a lot of thought and time to really improve the results.

Overall, Fuzzy Matching gets very complicated when you want to do it well, and it is often not worth the effort.
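
To make the character-versus-word distinction a bit more concrete, here is a minimal Python sketch. It is not Alteryx's implementation, and the function names are made up for illustration. It computes a plain Levenshtein similarity on characters and a simple word-set overlap for an address pair of the kind discussed in this thread. Jaro is left out to keep it short.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character-level score in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def word_overlap(a: str, b: str) -> float:
    """Word-level score in [0, 1]: shared words over all distinct words."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return 1.0 if not union else len(wa & wb) / len(union)


short_addr = "1234 w main st"
long_addr = "1234 w main street apt 999"

print("edit distance:", levenshtein(short_addr, long_addr))
print("character similarity:", round(char_similarity(short_addr, long_addr), 3))
print("word overlap:", round(word_overlap(short_addr, long_addr), 3))
```

Because the shorter address is a prefix of the longer one, the character-level score is driven almost entirely by the length difference, which may be why a high threshold such as 90% drops a pair like this even though a person would call it a match.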

whitkrieng
8 - Asteroid

@FrederikE Thank you so much for your response and your time.  

 

I noticed you selected "None" for Generate Keys. How does that improve the match? You mentioned in the blog post that the key generation step will then be skipped, but it seems a MatchKey is still generated in your workflow:

 

whitkrieng_0-1663616168094.png

One thing that didn't appear to match was the Main Street record, the other two were correct: 

 

1234 w main st to 1234 w main street apt 999

 

Maybe this is just how it is? You will never be able to get it perfect? 

 

Thanks again for your time!

FrederikE
13 - Pulsar

Hey @whitkrieng,

 

Yes, it seems the keys are generated but then not used. The thing with keys is that they work better when the start of a string is more important than the end, which doesn't seem to be the best fit for your case.
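
A rough way to see why a key favors the start of a string: the sketch below implements classic four-character Soundex in Python as a stand-in for a phonetic key. That the tool's key generation behaves like this is an assumption, not something confirmed in this thread, and the `soundex` function is illustrative only.

```python
# Classic Soundex consonant groups; vowels, y, h, and w carry no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(text: str) -> str:
    """Four-character Soundex key; digits and spaces in the input are ignored."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    # Only the first letter and the next three codes survive the truncation.
    return ("".join(out) + "000")[:4]


print(soundex("main st"))              # change at the end of the string ...
print(soundex("main street apt 999"))  # ... produces the same key
print(soundex("w main st"))            # change at the start produces a different key
```

Everything after the first few consonants is cut off, so two strings that differ only near the end share a key, while a difference at the very start puts them in different groups.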

 

If we're talking about perfection, Fuzzy Matching is probably the wrong way to go, since the differences between the strings don't follow a perfect pattern. It's always more of an 80/20 tool.

whitkrieng
8 - Asteroid

@FrederikE Thank you again for your response, I went ahead with your recommendations. What do you recommend as a solution that is perfect outside Fuzzy Match?  Thanks!

FrederikE
13 - Pulsar

@whitkrieng

I guess the only better solution (results-wise) is to build a lookup table and then join it to the original data. But of course this can lead to an enormous amount of work (depending on the size of your tables). The Fuzzy Matching results may be a good starting point for building such a lookup table, reducing the amount of work required a bit.
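
A minimal sketch of the lookup-table idea in Python, assuming a hand-maintained mapping from address variants to a canonical form. The table contents, the record layout, and the `canonicalize` helper are all hypothetical; in Designer the same thing would be a join against a maintained table.

```python
from typing import Optional

# Hypothetical lookup table: known address variant -> canonical address.
# In practice this could be seeded from reviewed fuzzy-match results.
LOOKUP = {
    "1234 w main st": "1234 w main street apt 999",
}


def canonicalize(address: str, lookup: dict) -> Optional[str]:
    """Return the canonical address for a known variant, or None if unknown."""
    key = " ".join(address.lower().split())  # normalize case and whitespace
    return lookup.get(key)


# Hypothetical input records.
records = [
    {"id": 1, "address": "1234 W  Main St"},
    {"id": 2, "address": "77 unknown rd"},
]

# "Join" the lookup onto the original data.
joined = [dict(r, canonical=canonicalize(r["address"], LOOKUP)) for r in records]

# Records without a lookup entry are the ones left for manual review.
unmatched = [r for r in joined if r["canonical"] is None]

for row in joined:
    print(row)
```

The exact-match join never produces a false positive, but every variant has to be added to the table by hand, which is where the work goes.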
