Alteryx Designer Desktop Discussions

ToxicBuoy · ‎08-05-2021

I have a list of entries like the ones below. Is there a way to build logic or a fuzzy match that identifies the values which should be treated as similar (like S.No. 1 & 2, and 3 & 4)?

To be noted:

1. The words vary and could be anything.

2. The difference could be in upper case versus lower case, as in S.No. 3 and 4.

3. The difference could be in the position of special characters after a word, as in S.No. 1 (St. , ) and 2 (Street,).

S.No. Address:

1 153, Biere St. , Texas 590121

2 153, Biere Street, Texas 590121

3 Rucstar Street 18 - 20 Bunden An Per Fuhr 09978 SZ

4 Rucstar St. 18 - 20 bunden an per fuhr 09978 SZ

DanielG · ‎08-05-2021

@ToxicBuoy Do you have the CASS Dataset available to you? If so, I'd recommend using that for address standardization. It isn't perfect, depending on how bad the data is, but it was a huge help to me previously.

Please note it is an added contract cost to get access to that dataset/tools.

Jean-Balteryx · ‎08-05-2021

Hi @ToxicBuoy ,

I tried using Fuzzy Matching on your sample and I ended up with this :

Workflow attached. Tell me if it's what you are looking for and if yes, test it on a larger dataset to check if it is suitable.

Maskell_Rascal · ‎08-05-2021

Hi @ToxicBuoy

Welcome to the wonderfully confusing world of Fuzzy Matching! This solution should work for you.

I am using Fuzzy Match and Make Group to find likely matches in the data set, then doing a Find/Replace to update the data. From there I join the result back to the original data input, so I end up with two columns and a score showing how closely they match.

I like this approach because it maintains your original data, while showing likely matches and how closely they scored in matching.
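For readers who want to see the idea outside of Designer, here is a minimal Python sketch of the same pattern: normalize, score pairs, then group the pairs that clear a threshold. It is not what the Fuzzy Match or Make Group tools do internally. The `ABBREVIATIONS` table, the function names, and the default threshold of 85 are all assumptions made for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical abbreviation table; extend it for your own data.
ABBREVIATIONS = {"st": "street"}


def normalize(address):
    """Lower-case, replace punctuation with spaces, expand known abbreviations."""
    text = re.sub(r"[^a-z0-9\s]", " ", address.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def score(a, b):
    """Similarity of two addresses on a 0-100 scale, after normalization."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio, 1)


def group_matches(addresses, threshold=85.0):
    """Return (scored pairs at or above threshold, groups of row indexes)."""
    parent = list(range(len(addresses)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i, j in combinations(range(len(addresses)), 2):
        s = score(addresses[i], addresses[j])
        if s >= threshold:
            pairs.append((i, j, s))
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(addresses)):
        groups.setdefault(find(i), []).append(i)
    return pairs, list(groups.values())


# The four sample rows from the question.
ADDRESSES = [
    "153, Biere St. , Texas 590121",
    "153, Biere Street, Texas 590121",
    "Rucstar Street 18 - 20 Bunden An Per Fuhr 09978 SZ",
    "Rucstar St. 18 - 20 bunden an per fuhr 09978 SZ",
]

pairs, groups = group_matches(ADDRESSES)
for i, j, s in pairs:
    # Original text is kept; the score sits alongside it.
    print(ADDRESSES[i], "|", ADDRESSES[j], "|", s)
```

Comparing every pair is quadratic in the number of rows, so this is only meant to show the logic on a small list; a real dataset would need some form of blocking (for example, only comparing rows that share a postal code) first.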

If this solution works for you please mark answer as correct, if not let me know!

Cheers!

Phil

ToxicBuoy · ‎08-05-2021

Hi @Maskell_Rascal, I am finding this much more comfortable. However, I am getting some duplicates, so I am still evaluating. Thanks for your help.

Maskell_Rascal · ‎08-05-2021

@ToxicBuoy - no method of fuzzy matching is ever going to be 100% accurate, but you can get most of the way there. I typically run a workflow like the one I provided, and then filter to anything with a match score below 85-90%. That's typically the range where I see some outliers.
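As a rough illustration of that review step, the sketch below splits scored matches into ones to accept and ones to check by hand. The `Match` record, the function name, and the cutoff of 90 are assumptions for the example; the thread only gives the 85-90% range as a rule of thumb, and the scores in the usage lines are made up.

```python
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    left: str
    right: str
    score: float  # 0-100, higher means more similar


def split_for_review(
    matches: List[Match], review_below: float = 90.0
) -> Tuple[List[Match], List[Match]]:
    """Separate confident matches from ones that need a manual look."""
    accepted = [m for m in matches if m.score >= review_below]
    # Lowest scores first, since those are the likeliest outliers.
    review = sorted(
        (m for m in matches if m.score < review_below),
        key=lambda m: m.score,
    )
    return accepted, review


# Hypothetical scores, for illustration only.
sample = [
    Match("153, Biere St. , Texas 590121", "153, Biere Street, Texas 590121", 96.0),
    Match("Rucstar Street 18 - 20", "Rucstar St. 18 - 20", 88.5),
    Match("Rucstar Street 18 - 20", "Ruckstar Strt 18", 71.0),
]
accepted, review = split_for_review(sample)
print(len(accepted), "accepted;", len(review), "to review")
```

Lowering `review_below` shrinks the manual pile but lets more borderline pairs through unchecked, so the cutoff is worth tuning against a sample you have verified by hand.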

atcodedog05 · ‎08-05-2021

Hi @Maskell_Rascal

Nice use of Fuzzy Match and Make Group together. This is one of those workarounds I have been looking for.

This is really helpful, thanks for the knowledge share 🙂

But sad that it could bring match score along with it.

How to detect nearby similar values from a list of entries with manual error.