Fuzzy Match Custom Configurations
Hi all,
I've been trying to configure a custom Fuzzy Match for an "Address" field, trying all kinds of variations and mainly tweaking the "Generate Keys", "Match Function", and "Match Thresholds" settings.
I have created two examples: one is too strict and misses an Address match that it should have found, while the other is too lenient and appears to include a match based on the Address Number alone.
If I raise the Match Threshold on the too-lenient one to, say, 90%, it actually removes a legitimate match (1234 w main st).
I am looking for tips, or perhaps a different combination of settings that finds that happy medium. I am not at all an expert on choosing between, say, "Word" vs. "Character" or Jaro vs. Levenshtein.
As an FYI, all of these addresses have been hashed for mock purposes.
Thanks all for your help and time.
Bumping this up for any tips on how best to customize a Fuzzy Match for Addresses.
I know Fuzzy Match might require some more specialized knowledge. If anyone has any thoughts, I'd appreciate it.
Hey @whitkrieng,
See how I would do it in the attached workflow. It should work with your examples; you could check whether it also works well on your actual data.
Otherwise, if you want to improve it further, I can recommend a blog post I wrote: https://www.thedataschool.co.uk/frederik-egervari/understanding-fuzzy-logic.
It might help, as it explains what the different algorithms do and how the keys are generated, but really improving the results may take a lot of thought and time.
Overall, Fuzzy Matching gets very complicated when you want to do it well, and it is often not worth the effort.
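To make the "Character" vs. "Word" distinction and the threshold problem more concrete, here is a minimal Python sketch. It is not how the Fuzzy Match tool computes its scores: the `levenshtein` and `similarity` helpers are hypothetical, and scaling the edit distance by the longer string's length is an assumption made only for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # drop x
                            curr[j - 1] + 1,      # add y
                            prev[j - 1] + cost))  # swap x for y
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed scaling: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


short = "1234 w main st"
long_ = "1234 w main street apt 999"

# Character level: every extra letter in "street apt 999" counts as an edit.
char_score = similarity(short, long_)

# Word level: "st" vs "street" is one changed word, "apt" and "999" are two added words.
word_score = similarity(short.split(), long_.split())

print(f"character-level: {char_score:.2f}")
print(f"word-level:      {word_score:.2f}")
```

Under these assumptions the pair scores roughly 0.54 at character level and 0.50 at word level, which illustrates why a 90% threshold would drop a pair that a person would call the same address: the longer string simply carries a lot of extra text.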
@FrederikE Thank you so much for your response and your time.
I noticed you selected "None" for Generate Keys. How does that improve the match? You mentioned in the blog post that the key-generation step is then skipped, yet it seems a MatchKey is still generated by your workflow.
One pair that didn't appear to match was the Main Street record; the other two were correct:
1234 w main st to 1234 w main street apt 999
Maybe this is just how it is, and you will never be able to get it perfect?
Thanks again for your time!
Hey @whitkrieng,
Yes, it seems the keys are generated but then not used. The thing with keys is that they work better when the start of a string is more important than the end, which doesn't seem to be the best fit for your case.
If we are talking about perfection, Fuzzy Matching is probably the wrong way to go, since the differences between the strings don't follow a perfect pattern. It's always more of an 80/20 tool.
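The point about keys favouring the start of a string can be illustrated with a small Python sketch. The `prefix_key` function is a made-up stand-in, not the key algorithm the Fuzzy Match tool uses, and the third address is invented just to show a difference at the start of the string.

```python
from collections import defaultdict


def prefix_key(text, length=6):
    """Hypothetical key: the first `length` letters/digits, lowercased."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    return cleaned[:length]


addresses = [
    "1234 w main st",
    "1234 w main street apt 999",   # differs only at the end
    "apt 999, 1234 w main street",  # invented: same parts, different start
]

# Only records that share a key would be compared with each other.
groups = defaultdict(list)
for address in addresses:
    groups[prefix_key(address)].append(address)

for key, members in groups.items():
    print(key, members)
```

With a key like this, the first two addresses land in the same group and get compared, while the third never meets them because its key starts differently. A key built from the start of the string is therefore forgiving about trailing differences and unforgiving about leading ones.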
@FrederikE Thank you again for your response. I went ahead with your recommendations. If perfect results are the goal, what would you recommend instead of Fuzzy Match? Thanks!
I guess the only better solution (results-wise) is to build a lookup table and then join that information to the original data. Of course, this can mean an enormous amount of work, depending on the size of your tables. The Fuzzy Matching results may be a good starting point for building such a lookup table, which reduces the work required a bit.
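As a rough idea of what the lookup-table approach could look like outside the workflow, here is a minimal Python sketch. The record ids, the "77 elm rd" row, and the decision to map the apartment variant onto the shorter address are all illustrative assumptions; in practice each lookup entry would be reviewed by hand, possibly starting from the Fuzzy Match output.

```python
# Hand-reviewed lookup: address variant -> canonical address (illustrative entry).
lookup = {
    "1234 w main street apt 999": "1234 w main st",
}

# Hypothetical input records.
records = [
    {"id": 1, "address": "1234 w main st"},
    {"id": 2, "address": "1234 W Main Street  Apt 999"},
    {"id": 3, "address": "77 elm rd"},
]


def canonical(address):
    """Lowercase, collapse whitespace, then apply the lookup if an entry exists."""
    cleaned = " ".join(address.lower().split())
    return lookup.get(cleaned, cleaned)


# "Join" the canonical form back onto every original record.
joined = [dict(record, canonical=canonical(record["address"])) for record in records]

for row in joined:
    print(row["id"], "->", row["canonical"])
```

Records 1 and 2 end up with the same canonical address and can then be joined exactly, while record 3 passes through unchanged. The cost is the one mentioned above: every variant has to be added to the table by someone.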
