Hi, I'm trying to do a fuzzy lookup across two different files to identify similar names. I'd like to list the similar names sorted by match percentage and include the Sign Up Date from the second file in the final output.
List 1
Name |
Bobbie Smith |
Pete Townsend |
Will Ferrell |
Santa Claus |
Freddie Mercury |
Sara Stone |
List 2
Name | Sign Up Date |
Mercury, Fred | 9/5/1955 |
Stone, Lauren S. | 11/27/1972 |
Townsend, Peter | 5/7/1982 |
Smith, Robert | 1/4/1989 |
Ferrell, John W. | 8/2/2005 |
Rabbit, Peter | 10/3/1995 |
Desired Outcome (fuzzy lookup results showing only matches over 75%, sorted by Match %)
Name 1 | Name 2 | Match % | Sign Up Date |
Pete Townsend | Townsend, Peter | 95 | 5/7/1982 |
Freddie Mercury | Mercury, Fred | 95 | 9/5/1955 |
Will Ferrell | Ferrell, John W. | 90 | 8/2/2005 |
Bobbie Smith | Smith, Robert | 85 | 1/4/1989 |
Sara Stone | Stone, Lauren S. | 80 | 11/27/1972 |
I tried doing this with the "Union" tool followed by "Fuzzy Match", but I'm concerned that, because the Union tool stacks the two lists together, my results include duplicates and matches between names from the same list. I'm very new to Alteryx and still learning the different tools.
Okay, I think this example (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Fuzzy-Match-Merge-Mode-against-two-Dat...) answers my question. I added the record ID and joins and this basically gives me what I was looking for.
When using the "union" feature, is it comparing across the two sources, or just matching duplicates within the stacked data?
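To make the distinction in that question concrete, here is a minimal Python sketch (not Alteryx, and not how the Fuzzy Match tool works internally) of a comparison that only ever pairs a name from List 1 with a name from List 2. The function names `cross_source_pairs` and `similarity` are made up for illustration, and the score comes from Python's `difflib`, so it will not reproduce the percentages in the desired outcome above.

```python
from difflib import SequenceMatcher

# The two lists from the original post.
list1 = ["Bobbie Smith", "Pete Townsend", "Will Ferrell",
         "Santa Claus", "Freddie Mercury", "Sara Stone"]
list2 = ["Mercury, Fred", "Stone, Lauren S.", "Townsend, Peter",
         "Smith, Robert", "Ferrell, John W.", "Rabbit, Peter"]

def cross_source_pairs(source_a, source_b):
    """Pair every name in source_a with every name in source_b.

    No pair is ever built from two names in the same list, which is
    the behaviour a stacked (unioned) comparison does not guarantee.
    """
    return [(a, b) for a in source_a for b in source_b]

def similarity(a, b):
    """Rough 0-100 similarity from difflib (an assumption, not the Alteryx score)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

pairs = cross_source_pairs(list1, list2)
scored = [(a, b, similarity(a, b)) for a, b in pairs]
```

With 6 names in each list this produces 36 candidate pairs, all of them across the two sources. Stacking both lists and comparing everything against everything would instead also produce pairs such as two List 1 names against each other.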
Also, any suggestions on how to improve the fuzzy match for names? Between my two lists there are a lot of nicknames and middle names / middle initials being used.
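One common way to handle nicknames, "Last, First" ordering, and middle initials is to normalize both lists before scoring them. The Python sketch below illustrates that idea only; the `NICKNAMES` table and the `normalize` function are hypothetical, the nickname entries are assumptions based on the sample data, and dropping every middle token would mangle multi-word last names.

```python
# Hypothetical nickname table; a real one would need many more entries.
NICKNAMES = {
    "bobbie": "robert",
    "pete": "peter",
    "fred": "frederick",
    "freddie": "frederick",
}

def normalize(name):
    """Return a lowercase 'first last' form of a name.

    Steps: flip 'Last, First' into 'First Last', strip periods from
    initials, keep only the first and last tokens (assumption: anything
    in between is a middle name or initial), and expand known nicknames.
    """
    name = name.strip().lower()
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first.strip()} {last.strip()}"
    tokens = [t.strip(".") for t in name.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return ""
    if len(tokens) > 2:
        tokens = [tokens[0], tokens[-1]]
    tokens[0] = NICKNAMES.get(tokens[0], tokens[0])
    return " ".join(tokens)
```

After this step, "Pete Townsend" and "Townsend, Peter" both become "peter townsend", so even a plain string comparison would treat them as the same. Names like "Sara Stone" and "Stone, Lauren S." still differ after normalizing, which is where the fuzzy score would come in.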
Okay, the solution seems to work, except I still end up with duplicates in my fuzzy match results. Is there a way to show only the best match for each person instead of several potential matches?
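The "best match only" logic amounts to keeping, for each name in the first list, the single candidate with the highest score. A minimal Python sketch of that step follows; `best_matches` is a hypothetical helper, and the two low-scoring rows in `candidates` (78 and 40) are made-up values added purely to show the filtering, not real results.

```python
def best_matches(scored_pairs, threshold=75):
    """Keep one row per first-list name: its highest-scoring candidate.

    scored_pairs is an iterable of (name1, name2, score) tuples.
    Rows scoring at or below the threshold are dropped, and the result
    is sorted by score, highest first.
    """
    best = {}
    for name1, name2, score in scored_pairs:
        if score <= threshold:
            continue
        if name1 not in best or score > best[name1][2]:
            best[name1] = (name1, name2, score)
    return sorted(best.values(), key=lambda row: row[2], reverse=True)

# Illustrative input: scores of 95 and 80 are from the desired outcome,
# the 78 and 40 rows are invented to show a duplicate and a non-match.
candidates = [
    ("Pete Townsend", "Townsend, Peter", 95),
    ("Pete Townsend", "Rabbit, Peter", 78),
    ("Sara Stone", "Stone, Lauren S.", 80),
    ("Santa Claus", "Smith, Robert", 40),
]
result = best_matches(candidates)
```

Here `result` contains one row for Pete Townsend (the 95 match) and one for Sara Stone, and nothing for Santa Claus. Ties are resolved in favour of whichever candidate appears first in the input.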