This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
It's the most wonderful time of the year - Santalytics 2020 is here! This year, Santa's workshop needs the help of the Alteryx Community to help get back on track, so head over to the Group Hub for all the info to get started!
I have a list of names (last name, first) from two different reports. List 1 has names and ID. List 2 has names and days worked. The list with names and days worked has different naming variations for the same name. Ideally, I would want to create a table that has List 1 names, List 1 ID, and List 2 days worked. How would I go about this? Here is some fake data that we can use:
Potter, Harry James 10001
Granger-Weasley, Hermione 10002
Weasley, Ronald 10003
Malfoy, Draco 10004
Potter, Ginny Weasley 10005
Name Days Worked
Potter, Harry 5
POTTER, HARRY 6
Potter, Harry J. 10
Potter, Harry James 7
Granger, Hermione 12
GRANGER, HERMIONE 10
Granger-Weasley, Hermione 10
Weasley, Ron 5
Weasley, Ronald 7
WEASLEY, RON 8
Malfoy, Draco 3
Malfoy, Drake 9
MALFOY, DRACO 3
Potter, Ginny 5
Weasley, Ginny 7
Potter, Ginny Weasley 2
WEASLEY, GINNY 7
Potter, Ginevra 4
Weasley, Ginevra Molly 5
Expected Table Result Format
Name from List 1 ID from List 1 Days Worked from List 2
You might want to use tools other than fuzzy match. Consider a join on the fields. Of course you'd want to create all capitals for both sets of data. These are the easy matches.
you can parse the names and look for exact on first and last as well as a flip flop. There's no reason why 3 join's can't be used.
you might even break up hyphens or use a find replace to help too. In your example below I could build a join solution to find all names. Depending on the data, it might suffice. The problem occurs with common names. If two John Smiths work for you, then you're in trouble. But fuzzy matching will be just as much of a headache. JJ Phillips could exist as John Jody Phillips. Good luck with that.
If you look for Scott Jones at alteryx, are you seeking @ScottJ or Scott R Jones?
Alteryx ACE & Top Community Contributor
Chaos reigns within. Repent, reflect and reboot. Order shall return.
This will help me get started on my workflow with actual data. This is my first time using fuzzy data, so I appreciate your workflow in helping me get started.
@MarqueeCrew, thanks also for your input on this. While I understand that doing joins would be able to achieve a similar result, The actual dataset I am playing with has over 3,000 unique people, so using fuzzy matching would help to differentiate the bulk of the people. As for the users with similar names (i.e. John Smith), I would do a separate analysis to differentiate them, such as requesting for additional information in List 2, such as email.