Hi Everyone,
I have a list of names (last name, first) from two different reports. List 1 has names and ID. List 2 has names and days worked. The list with names and days worked has different naming variations for the same name. Ideally, I would want to create a table that has List 1 names, List 1 ID, and List 2 days worked. How would I go about this? Here is some fake data that we can use:
List 1
Name ID
Potter, Harry James 10001
Granger-Weasley, Hermione 10002
Weasley, Ronald 10003
Malfoy, Draco 10004
Potter, Ginny Weasley 10005
List 2
Name Days Worked
Potter, Harry 5
POTTER, HARRY 6
Potter, Harry J. 10
Potter, Harry James 7
Granger, Hermione 12
GRANGER, HERMIONE 10
Granger-Weasley, Hermione 10
Weasley, Ron 5
Weasley, Ronald 7
WEASLEY, RON 8
Malfoy, Draco 3
Malfoy, Drake 9
MALFOY, DRACO 3
Potter, Ginny 5
Weasley, Ginny 7
Potter, Ginny Weasley 2
WEASLEY, GINNY 7
Potter, Ginevra 4
Weasley, Ginevra Molly 5
Expected Table Result Format
Name from List 1 ID from List 1 Days Worked from List 2
Thanks in advance for your help!
Solved! Go to Solution.
You might want to use tools other than fuzzy match. Consider a join on the fields. Of course you'd want to create all capitals for both sets of data. These are the easy matches.
you can parse the names and look for exact on first and last as well as a flip flop. There's no reason why 3 join's can't be used.
you might even break up hyphens or use a find replace to help too. In your example below I could build a join solution to find all names. Depending on the data, it might suffice. The problem occurs with common names. If two John Smiths work for you, then you're in trouble. But fuzzy matching will be just as much of a headache. JJ Phillips could exist as John Jody Phillips. Good luck with that.
If you look for Scott Jones at alteryx, are you seeking @scottj or Scott R Jones?
cheers,
mark
Hi @TrumanLam ,
I'm attaching an example showing how to do it with fuzzy match.
As @MarqueeCrew said and I totally agree,
fuzzy matching will be just as much of a headache.
I had some problems with one name and I wasn't able to build a method to get them all. But 95% of the job is done and I leave you with the worst 5%of it. 🙂
Best,
Fernando Vizcaino
Not only will @TrumanLam need to take it further with these wizard names, but when he uses real data he'll need to time this too. Fuzzy matching is an art.
you were very kind to get him started on a workflow.
cheers,
mark
Thank you so much for the workflow @fmvizcaino !
This will help me get started on my workflow with actual data. This is my first time using fuzzy data, so I appreciate your workflow in helping me get started.
@MarqueeCrew, thanks also for your input on this. While I understand that doing joins would be able to achieve a similar result, The actual dataset I am playing with has over 3,000 unique people, so using fuzzy matching would help to differentiate the bulk of the people. As for the users with similar names (i.e. John Smith), I would do a separate analysis to differentiate them, such as requesting for additional information in List 2, such as email.