How to detect near-duplicate values in a list of entries that contain manual entry errors

ToxicBuoy
7 - Meteor

I have a list of entries like the one below. Is there a way to build logic or a fuzzy match that identifies the values that should be treated as the same (such as S.No. 1 & 2, and 3 & 4)?

Points to note:

1. The words can vary and could be anything.

2. The difference could be in upper case vs. lower case, as in S.No. 3 and 4.

3. The difference could be in the position of special characters after a word, as in S.No. 1 ("St. ,") vs. S.No. 2 ("Street,").

 

 

S.No.      Address:

1             153, Biere St. , Texas 590121

2             153, Biere Street, Texas 590121

3             Rucstar Street 18 - 20 Bunden An Per Fuhr 09978 SZ

4             Rucstar St. 18 - 20 bunden an per fuhr 09978 SZ

DanielG
12 - Quasar

@ToxicBuoy Do you have the CASS dataset available to you? If so, I'd recommend using it for address standardization. It isn't perfect, depending on how bad the data is, but it was a huge help to me previously.

 

Please note that access to that dataset and its tools is an added contract cost.

Jean-Balteryx
16 - Nebula

Hi @ToxicBuoy ,

 

I tried using Fuzzy Matching on your sample and ended up with this:

 

[Screenshot: Capture d’écran 2021-08-05 à 15.15.29.png]

 

Workflow attached. Tell me if it's what you are looking for and, if so, test it on a larger dataset to check that it is suitable.

Maskell_Rascal
13 - Pulsar

Hi @ToxicBuoy 

 

Welcome to the wonderfully confusing world of Fuzzy Matching! This solution should work for you.

 

[Screenshot: Maskell_Rascal_0-1628171200074.png]

 

I am using Fuzzy Match and Make Group to find likely matches in the data set, then doing a Find/Replace to update the data. From there I join the result back to the original data input, so I end up with two columns and a score showing how closely they match.
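
For anyone who wants to see the same idea outside Designer, here is a minimal Python sketch of the normalize, score, and group steps. It is not what the Fuzzy Match or Make Group tools do internally: it uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer, and the `ABBREVIATIONS` map, the helper names (`normalize`, `score`, `match_pairs`, `group_similar`) and the 0.85 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical abbreviation map; extend it for your own data.
# Caveat: "st" can also mean "saint", so review this for real addresses.
ABBREVIATIONS = {"st": "street"}

def normalize(address):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def score(a, b):
    """Similarity between 0 and 1 on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_pairs(addresses, threshold=0.85):
    """Return (i, j, score) for every pair scoring at or above threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(addresses), 2):
        s = score(a, b)
        if s >= threshold:
            pairs.append((i, j, s))
    return pairs

def group_similar(n, pairs):
    """Merge matched pairs into groups with a small union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in pairs:
        parent[find(j)] = find(i)
    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())

addresses = [
    "153, Biere St. , Texas 590121",
    "153, Biere Street, Texas 590121",
    "Rucstar Street 18 - 20 Bunden An Per Fuhr 09978 SZ",
    "Rucstar St. 18 - 20 bunden an per fuhr 09978 SZ",
]
pairs = match_pairs(addresses)
groups = group_similar(len(addresses), pairs)
print(groups)
```

Keeping the pair scores next to the group ids mirrors the point of the workflow: the original rows stay untouched, and each likely match carries a score you can inspect.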

 

I like this approach because it maintains your original data, while showing likely matches and how closely they scored in matching. 

 

If this solution works for you, please mark the answer as correct. If not, let me know!

 

Cheers!

Phil

 

 

ToxicBuoy
7 - Meteor

Hi @Maskell_Rascal, I am finding this approach much more comfortable. However, I am getting some duplicates and am still evaluating. Thanks for your help.

Maskell_Rascal
13 - Pulsar

@ToxicBuoy - no method of fuzzy matching is ever going to be 100% accurate, but you can get most of the way there. I typically run a workflow similar to the one I provided, and then filter to anything with a match score below 85-90%. That's typically the range where I see some outliers.
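
As a rough illustration of that review step, the sketch below splits scored matches into an accepted list and a list to check by hand. The `split_by_score` and `percent_score` helpers are hypothetical names, the 0-100 score comes from Python's `difflib` rather than from the Fuzzy Match tool, and the 90 cutoff is just one end of the 85-90% range mentioned above.

```python
from difflib import SequenceMatcher

def split_by_score(rows, threshold=90.0):
    """Split (original, match, score) rows into accepted and needs-review."""
    accepted = [r for r in rows if r[2] >= threshold]
    review = [r for r in rows if r[2] < threshold]
    return accepted, review

def percent_score(a, b):
    # Stand-in score on a 0-100 scale; not the Fuzzy Match tool's algorithm.
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

candidates = [
    ("153, Biere St. , Texas 590121", "153, Biere Street, Texas 590121"),
    ("Rucstar Street 18 - 20 Bunden An Per Fuhr 09978 SZ",
     "Rucstar St. 18 - 20 bunden an per fuhr 09978 SZ"),
]
rows = [(a, b, percent_score(a, b)) for a, b in candidates]
accepted, review = split_by_score(rows)

# Anything below the cutoff is printed for a manual check.
for original, match, s in review:
    print(f"check manually: {original!r} ~ {match!r} ({s:.1f})")
```

Where you set the cutoff is a judgment call: a higher value sends more rows to manual review, a lower one lets more borderline matches through.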

atcodedog05
22 - Nova

Hi @Maskell_Rascal 

 

Nice way of using Fuzzy Match and Make Group together. This is one of those workarounds I have been looking for.

 

This is really helpful. Thanks for the knowledge share 🙂

 

It's a pity that it couldn't bring the match score along with it, though.
