Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Fuzzy Match on State Abbreviation to Full Name

BobSnyder85
8 - Asteroid

Hi All,

 

I am looking for some guidance on the fuzzy match tool set where if i have 'GA' for one record that it will match with some low % confidence the full name of 'Georgia'. I figure the 2 out of 7 characters match and that would be worth more than PA to Georgia. 

 

I know this is an odd request and methodology isn't 100% but this would help me in several areas of my project and further my understanding of Fuzzy Match options. 

 

I've been toying around with the different match styles and keys and whatnot. I thought Alphanumeric Only for Keys would give me some % match but it is 100% rejected. (thresholds are at 0) 

 

Any ideas?

 

Thank in advance!

5 REPLIES 5
SeanAdams
17 - Castor
17 - Castor

Hey @BobSnyder85

 

are you looking to do this for states specifically?   Reason for asking is that the fuzzy match tool does have different match methods to specifically address different types of data.

 

2017-06-02_9-41-39.png

 

The other option you have, if you're just looking to fix the GA-> Georgia; VA-> Virginia etc - is to do a lookup table for the 51 states & districts using a replace function?

 

BobSnyder85
8 - Asteroid

Hi Sean,

 

Thanks for the reply!

 

It is not state specific and the data will be world wide. That is just an example. Postal Code is another one that I am currently having issue matching something like "02909" doesnt match with "2909". I can't make it 'Digit's Only Reverse' due to the global nature of this report. Other countries don't just use Numbers for the zip code. 

 

This is why I would like to know if I can make fuzzy logic check the order and character similarity of '02909' to '2909'. I would even score that with a high probability match, but i have this example right now and cannot get the postal code to flow through.

SeanAdams
17 - Castor
17 - Castor

Hey Bob,

 

I spent some time working through this and was not able to get a good fuzzy match hit between the 02909 and 2909 as you mentioned.   the obvious idea, to clean up zip codes by looking for 5 digits and left-padding them, won't work because of the international post code challenge you have (where post-codes in different geographies not only have different lengths, but also some have alpha like UK postcodes).

 

I don't know of an easy way through this one Bob - my instinct would be to try to look up these addresses using a geolocation or mapping service, and check where you get similar locations, but not sure if that would work for your data.

 

 

 

 

BobSnyder85
8 - Asteroid

Hi Sean,

 

Thanks for looking further into this and your help. 

 

I suppose my question is more simply stated. forget that its a zip code. How can i match ex: 06260 to 6260 with some degree of confidence with the tools provided in alteryx? 

BobSnyder85
8 - Asteroid

Hi All,

 

I have solved the issue through trial and error and continuing to learn about fuzzy match and the intricacies. 

 

Short version: You don't have to have the top and bottom portion of the Fuzzy Match tool completed. You only need to have 1 variable that matches for a Key Match. This turned out to be the trick that did it for me. I have a "set" of data being compiled by my first run through a fuzzy match, i then do an exact match type on those known variables (SourceID & MatchCandidate ID). The fuzzy match occurs using "Character - Levenshtein Distance" for the field in question (Postal Code in example). I have attached a picture to help illustrate my options.

 

Source ID & MC ID i have relabeled for my sanity but were originally named "Record ID1" & "Record ID2".

 

Basically, I take all the records concerned, pass them through a generic Name field only match. this returns a set of data that i want to further scrutenize, so i manipulate the data and output to go back with the match candidates from Name and individually run them through a 'lite' fuzzy match process and return a score, that score then gets joined back to the original data. 

 

If you know anything about fuzzy matches, you know that it works down the columns, not across rows. I selected my original output twice and unioned that output, then mixed the fields on the union, assigned a "TempRecordID" for the lite fuzzy match process and rejoin on ONE side since they were duplicated then rejoin that output using the Make Group, this is why records go from 5,002 to 10,004 back down to 5,002.

 

Hope it helps!  

 

Labels