Alteryx Designer

Find answers, ask questions, and share expertise about Alteryx Designer.
Register for the upcoming Live Community Q&A Session - and don't forget to submit your questions for @DeanS regarding the future role of analytics here.
SOLVED

Fuzzy Match Last Names Differently from First Names

Highlighted
5 - Atom

Hi Everyone,

 

I have a list of names (last name, first) from two different reports. List 1 has names and ID. List 2 has names and days worked. The list with names and days worked has different naming variations for the same name. Ideally, I would want to create a table that has List 1 names, List 1 ID, and List 2 days worked. How would I go about this? Here is some fake data that we can use:

 

List 1

 

Name                                               ID

Potter, Harry James                        10001

Granger-Weasley, Hermione          10002

Weasley, Ronald                             10003

Malfoy, Draco                                  10004

Potter, Ginny Weasley                     10005

 

 

List 2

 

Name                                               Days Worked

 

Potter, Harry                                    5

POTTER, HARRY                           6

Potter, Harry J.                                10

Potter, Harry James                         7

 

Granger, Hermione                          12

GRANGER, HERMIONE                 10

Granger-Weasley, Hermione           10

 

Weasley, Ron                                   5

Weasley, Ronald                              7

WEASLEY, RON                              8

 

Malfoy, Draco                                   3

Malfoy, Drake                                   9

MALFOY, DRACO                           3

 

Potter, Ginny                                    5

Weasley, Ginny                                7

Potter, Ginny Weasley                     2 

WEASLEY, GINNY                          7

Potter, Ginevra                                4

Weasley, Ginevra Molly                  5

 

Expected Table Result Format

 

Name from List 1             ID from List 1              Days Worked from List 2

 

Thanks in advance for your help!

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

You might want to use tools other than fuzzy match. Consider a join on the fields. Of course you'd want to create all capitals for both sets of data. These are the easy matches. 

you can parse the names and look for exact on first and last as well as a flip flop. There's no reason why 3 join's can't be used. 

you might even break up hyphens or use a find replace to help too. In your example below I could build a join solution to find all names. Depending on the data, it might suffice. The problem occurs with common names. If two John Smiths work for you, then you're in trouble. But fuzzy matching will be just as much of a headache. JJ Phillips could exist as John Jody Phillips. Good luck with that. 

If you look for Scott Jones at alteryx, are you seeking @ScottJ or Scott R Jones?  

cheers,

 

 mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

Hi @TrumanLam ,

 

I'm attaching an example showing how to do it with fuzzy match.

As @MarqueeCrew said and I totally agree, 

fuzzy matching will be just as much of a headache. 

I had some problems with one name and I wasn't able to build a method to get them all. But 95% of the job is done and I leave you with the worst 5%of it.

 

Best,

Fernando Vizcaino

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

@fmvizcaino ,

 

Not only will @TrumanLam need to take it further with these wizard names, but when he uses real data he'll need to time this too.  Fuzzy matching is an art. 

you were very kind to get him started on a workflow. 

 

cheers,

 

 mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Highlighted
5 - Atom

Thank you so much for the workflow @fmvizcaino !

 

This will help me get started on my workflow with actual data. This is my first time using fuzzy data, so I appreciate your workflow in helping me get started.

 

@MarqueeCrew, thanks also for your input on this. While I understand that doing joins would be able to achieve a similar result, The actual dataset I am playing with has over 3,000 unique people, so using fuzzy matching would help to differentiate the bulk of the people. As for the users with similar names (i.e. John Smith), I would do a separate analysis to differentiate them, such as requesting for additional information in List 2, such as email.

Labels