Hello,
I'd like to join two tables together and match based on if the names are connected by a nickname, however, the only way I have thought to do this explodes the number of records I am submitting which makes the match unreasonably long. For context, I have millions of records on one side and hundreds of millions on the other. I am employing a batch macro to submit records one batch at a time, by zip code, which is making this manageable without nicknames.
In my first file (File A) I have the name Bill and in my second file (File B) I have the name William. I have a third file which contains a list of all nicknames available. This tells me that William, Will, Willy, Willie, Billy, and Bill are all nicknames of one another. My initial though was to take my record in File B and manufacture additional records with each nickname. Then, when I perform the join a match will occur on the manufactured File B record with the nickname that matches. The problem is that will so many records and so many names this increases the number of records I have by a magnitude.
I need to employ similar techniques for when First and Last name are swapped (create a second record with the Frst/Last swapped if the last name is a common first name). Also, hyphenations: If a last name is hyphenated I need to create an additional 2 records, one with the first portion of the hyphenation and one with the second.
Any ideas would be great!
Have you explored the fuzzy matching capabilities of Alteryx? You can match exactly on certain fields and "Fuzzy match" on others based on certain matching logic. One of those capabilities includes leveraging nicknames and even allows for you to add your own set.
https://help.alteryx.com/current/designer/fuzzy-match-tool
https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-Fuzzy-Match/ta-p/45485
User | Count |
---|---|
106 | |
82 | |
72 | |
54 | |
40 |