This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Hello - I am attaching a mock workflow below. I have been having trouble using the join tool to match data from two excel files. The only "identical" column is name, but because the first file has middle names, hyphens. I end up having a lot of fallout in the left and right join tool. There is at least 10,000 rows of names to go through and I tried using the multi-row formula/different ways to parse data, but then it does not split the names correctly. Can someone please help me with this?
I want the names to be fully capital at the end of it btw (easily done using the formula tool). However, I not sure which tools of alteryx to use after. I am looking forward to everyone's suggestions!
There are a lot of ways to go about this but the one I like assumes that the more things you do to get the data to match the more those values should be vetted.
In my attached workflow I start creating the general pattern I use with my clients to handle "Fuzzy" matching requirements.
I started by keeping all records that match exactly. Then I capitalized the first and last names to check for matches there and unioned the result. The next step I only kept values before the first space in each field and matched those records. You would need to work directly with your data to identify exactly the set of rules you need to properly match everything.
Note: you are at no point guaranteed that the mapping is unique so you may replicate data from one source by the other and those records should be individually kept aside for manual review.
As far as general techniques you can use string formula that include Replace, Search, Regex_Match, and other functions to include, exclude, or modify what part of each string you want to keep for your comparison.
For example: If you REPLACE([Last Name],"-"," ") you will swap hyphens for spaces.
I think you would need to add 3 more conditions and joins based on what rules you would need to implement to get 6/7 of the second group to match the first. The 7th I don't think matches anything in the first dataset. The workflow that I built is only the foundation for what will be a very large process you will need to build and vet. The upside is that building this map will streamline future maintenance by creating an id that matches records from both sources. Then only new names need to be mapped each time.
You need to build each rule and join to account for specific used cases so unfortunately no, you will need to go through the records and identify individual issues with the data.
You can try using the Fuzzy Match tool including nicknames on a combined dataset in order to find some matches and you can use formula like REGEX_REPLACE([Name], "[^a-zA-Z\s]","") to remove any characters that are not a through Z or a space but that is only going to take you so far.
Thank you for the insight. I will look into it as I have been stuck on it since a few ways and tried out so many different ways to prevent the least amount of fallout. I am fairly new to alteryx as well so it has been a challenge