Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
There's still time to register for the Q3 ACE Panel - Spatial Edition presentation tomorrow! | Need more information about the event? Check out the post here

Joining or Parsings Names with Hyphens/Middle Initials/3 or 4 words

Highlighted
7 - Meteor

Hello - I am attaching a mock workflow below. I have been having trouble using the join tool to match data from two excel files. The only "identical" column is name, but because the first file has middle names, hyphens. I end up having a lot of fallout in the left and right join tool. There is at least 10,000 rows of names to go through and I tried using the multi-row formula/different ways to parse data, but then it does not split the names correctly. Can someone please help me with this?

 

I want the names to be fully capital at the end of it btw (easily done using the formula tool). However, I not sure which tools of alteryx to use after. I am looking forward to everyone's suggestions!

Highlighted
10 - Fireball

Hi @mifrahm ,

 

There are a lot of ways to go about this but the one I like assumes that the more things you do to get the data to match the more those values should be vetted.

 

In my attached workflow I start creating the general pattern I use with my clients to handle "Fuzzy" matching requirements.

 

I started by keeping all records that match exactly. Then I capitalized the first and last names to check for matches there and unioned the result. The next step I only kept values before the first space in each field and matched those records. You would need to work directly with your data to identify exactly the set of rules you need to properly match everything.

 

Note: you are at no point guaranteed that the mapping is unique so you may replicate data from one source by the other and those records should be individually kept aside for manual review.

 

As far as general techniques you can use string formula that include Replace, Search, Regex_Match, and other functions to include, exclude, or modify what part of each string you want to keep for your comparison.

 

For example: If you REPLACE([Last Name],"-"," ") you will swap hyphens for spaces. 

 

Does that help?

 

Best,

Michael

Highlighted
7 - Meteor

Thank you for the reply! I tried using this version and place appropriately with the data I am using. 

 

I am still a little stuck since in the "output", where I'm referencing the browse tool on the join part. There are so many rows to walk through.

 

Is there any way to simplify the names in this scenario to end up with only one or two rows?  

Highlighted
10 - Fireball

Hi @mifrahm 

 

I think you would need to add 3 more conditions and joins based on what rules you would need to implement to get 6/7 of the second group to match the first. The 7th I don't think matches anything in the first dataset. The workflow that I built is only the foundation for what will be a very large process you will need to build and vet. The upside is that building this map will streamline future maintenance by creating an id that matches records from both sources. Then only new names need to be mapped each time.

 

You need to build each rule and join to account for specific used cases so unfortunately no, you will need to go through the records and identify individual issues with the data.

 

You can try using the Fuzzy Match tool including nicknames on a combined dataset in order to find some matches and you can use formula like REGEX_REPLACE([Name], "[^a-zA-Z\s]","") to remove any characters that are not a through Z or a space but that is only going to take you so far.

 

Best,

Michael

Highlighted
7 - Meteor

Thank you for the insight. I will look into it as I have been stuck on it since a few ways and tried out so many different ways to prevent the least amount of fallout. I am fairly new to alteryx as well so it has been a challenge 

Highlighted
7 - Meteor

On that note @MichaelLaRose  the HR Mock Data I had provided captured the different type of name scenarios I came across when comparing both files. 

 

Open to any other suggestions other have to go about this! 

Highlighted
10 - Fireball

@mifrahm 

 

I can send you a file with a few more matches implement tommorow.

 

Have a good night.

Highlighted
Alteryx
Alteryx

Fuzzy Matching is your friend. I attached a workflow that you can tune to match more or fewer records.

 

ArtApa_0-1594272918871.png

 

You may want to learn more about Fuzzy Matching here: https://community.alteryx.com/t5/Videos/Fuzzy-Matching-for-Beginners/td-p/330575

 

Highlighted
7 - Meteor

Michael - thank you so much I really appreciate your support!!!

Highlighted
7 - Meteor

@ArtApa - thank you for taking the time to create this for me. I will take a look and get back today if I have any questions. I really appreciate your help!!!!

Labels