Alteryx Designer Desktop Discussions

trevorwightman · ‎02-03-2020

Hello,

I am using the selections below for fuzzy matching on a parsed address. Is there a better route to take? Does anyone recommend a better setup? Should I concatenate the address into one field and proceed that way? Any advice would be great, thank you!

fmvizcaino · ‎02-03-2020

Hi @trevorwightman ,

Your configuration seems ok!

One detail about the columns that use the 'exact' method: be careful to guarantee that the values in those columns are written the same way in both inputs, because an exact match fails on any difference.

I would also suggest thinking about the weight of the addresspart field. That information may not be as important as the address itself, and its threshold should be lower, because there is a high chance that different values represent the same address. So for that field you need a low match threshold and a weight based on how important you think it is.
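To make the weight and threshold idea concrete, here is a minimal Python sketch. It is not how the Fuzzy Match tool computes its score internally; the field names, weights, threshold and sample records are all made up for illustration, and the similarity measure is `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher

# Hypothetical field names and weights; tune them to your own data.
WEIGHTS = {"street": 0.6, "city": 0.3, "addresspart": 0.1}
THRESHOLD = 0.80  # overall score needed to call two records a match


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def weighted_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of the per-field similarities."""
    total = sum(WEIGHTS.values())
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items()) / total


# Made-up sample records that differ only in the low-weight field.
a = {"street": "Manchester Blvd", "city": "Springfield", "addresspart": "Apt 4"}
b = {"street": "Manchester Blvd", "city": "Springfield", "addresspart": "Unit 4"}

score = weighted_score(a, b)
is_match = score >= THRESHOLD
print(round(score, 3), is_match)
```

Because addresspart carries only a small weight, "Apt 4" versus "Unit 4" barely lowers the overall score, which is the behaviour described above.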

Best,

Fernando Vizcaino

TomWelgemoed · ‎02-03-2020

Hi @trevorwightman ,

I would suggest starting with fewer fields and then adding more only when it's not accurate enough or performance is suffering. Personally, I found that matching didn't work as well when I added too many fields (probably for the same reason @fmvizcaino mentioned).

A neat trick could be to create a "match key", e.g. the first 3 digits of the postcode, the first few consonants of the street name and, say, the house number. That effectively creates a grouping that you can work with, and then you can use more specific checks in a Formula tool afterwards.
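For anyone who wants to see the match key idea spelled out, here is a rough Python sketch. The function name, the `|` separator and the choice of four consonants are my own assumptions; in Designer you would build the same thing with a Formula tool.

```python
import re


def match_key(postcode: str, street: str, house_number: str, n_consonants: int = 4) -> str:
    """Build a coarse grouping key from three address parts."""
    # First 3 characters of the postcode, ignoring spaces and punctuation.
    pc = re.sub(r"\W", "", postcode).upper()[:3]
    # First few consonants of the street name (Y is treated as a consonant).
    consonants = re.sub(r"[^B-DF-HJ-NP-TV-Z]", "", street.upper())[:n_consonants]
    # Digits of the house number only.
    number = re.sub(r"\D", "", house_number)
    return f"{pc}|{consonants}|{number}"


key_a = match_key("12345", "Manchester Blvd", "11107")
key_b = match_key("12345-6789", "manchester boulevard", "11107")
print(key_a, key_b)
```

Records that share a key land in the same group, so the more specific checks only have to compare candidates within that group.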

MarqueeCrew · ‎02-03-2020

@trevorwightman ,

My friend @chris_love and I debated the use of fuzzy matching at Inspire. What you're asking about is a case where I would recommend NOT using fuzzy matching. There are good alternatives to fuzzy matching in the GOOGLE and HERE APIs. HERE is a bit less expensive (225K queries/month for free). You can send the "un-parsed" address through the API and get PARSED and cleansed data back. This will STANDARDIZE/NORMALIZE your addresses and you can then match via a JOIN.

When I first saw fuzzy matching, I was so impressed with 10 W 100th St. matching to 100 W 10th St. But if both are real and different, do you really want to join on them? When you're matching only on address, I'd imagine that 11107 Manchester Blvd will match to many different addresses. The longer the street name (and the more matching fields), the more the individual house numbers will merge. You'd have to break the house number element into its own field. Then you'll find the units merging.
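Here is a small Python sketch of that 10 W 100th St. problem. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a fuzzy score (not the Alteryx algorithm), and the helper name and the 0.9 cutoff are assumptions for illustration.

```python
from difflib import SequenceMatcher


def split_house_number(address: str) -> tuple:
    """Split '10 W 100th St' into ('10', 'W 100th St')."""
    number, _, rest = address.strip().partition(" ")
    return number, rest


a, b = "10 W 100th St", "100 W 10th St"

# Scoring the whole string: the two different addresses look very similar.
whole_score = SequenceMatcher(None, a, b).ratio()

# House number in its own field, compared exactly; only the rest is fuzzy.
num_a, street_a = split_house_number(a)
num_b, street_b = split_house_number(b)
street_score = SequenceMatcher(None, street_a, street_b).ratio()
same_place = num_a == num_b and street_score >= 0.9

print(round(whole_score, 3), round(street_score, 3), same_place)
```

The whole-string score is high even though the addresses differ, while the exact check on the house number rejects the pair. Note that the street part alone still scores high, so the same caution applies to every element you leave fuzzy.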

I used to live on Bajio Rd. I lived on the corner of Bajio Rd and Bajio Ct. These are more fuzzy nightmares for you.

Either use the "free" data available with "effort" via the API or consider buying the address data bundle through Alteryx.

My two cents.

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

TomWelgemoed · ‎02-03-2020

Hi @MarqueeCrew ,

I fully understand your point - I'd been through the same **bleep** - but I suppose that's just the nature of the type of work that you're trying to do. It's not clear-cut and your job is to prevent over-matching, like in the cases you mentioned. This means you might get fewer matches than you'd like for the sake of quality. And I think the fuzzy matching tool can achieve this well. Bear in mind that matching is not always for addresses - it can be for names, companies etc.

So maybe I'm siding with @chris_love on this one 🙂

Best,

Tom

MarqueeCrew · ‎02-04-2020

@TomWelgemoed ,

Don't get me wrong. I like a good fuzzy match. When you've got names and addresses, it's great. I prefer names and addresses that have been equally cleansed. I take my ice cold matches through a join and then go do a variety of fuzzy matches.
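As a rough sketch of that two-stage flow in Python (exact join first, fuzzy only on what is left over): the sample addresses, the 0.85 cutoff and the use of `difflib.SequenceMatcher` are all assumptions for illustration, not what the Join or Fuzzy Match tools do internally.

```python
from difflib import SequenceMatcher

# Made-up, already cleansed addresses keyed by record id.
left = {1: "12 oak st", 2: "98 elm avenue", 3: "5 pine rd"}
right = {"a": "12 oak st", "b": "98 elm ave", "c": "700 birch ln"}
CUTOFF = 0.85  # hypothetical fuzzy threshold

# Stage 1: exact join on the cleansed address.
by_address = {addr: key for key, addr in right.items()}
exact = {lk: by_address[addr] for lk, addr in left.items() if addr in by_address}

# Stage 2: fuzzy match only the records the join did not pair up.
left_rest = {k: v for k, v in left.items() if k not in exact}
right_rest = {k: v for k, v in right.items() if k not in exact.values()}

fuzzy = {}
for lk, laddr in left_rest.items():
    # Score every remaining right-hand record and keep the best one.
    scored = [(SequenceMatcher(None, laddr, raddr).ratio(), rk)
              for rk, raddr in right_rest.items()]
    if scored:
        best_score, best_key = max(scored)
        if best_score >= CUTOFF:
            fuzzy[lk] = best_key

print(exact, fuzzy)
```

Record 3 stays unmatched, which is the point: the fuzzy stage only has to be trusted for the leftovers, and anything below the cutoff is left alone.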

If you're only matching on address, then caution is needed.

cheers,

mark



Selections for Fuzzy Matching on Parsed Address?