Fuzzy Match and Replace Close Matching User Names
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Back again for more help.
I have a list of billing users that is as follows:
As you can see, there are some spelling "differences" in their data. "Denis" and "Dennis" are the same person as well as "Devon Poisson" and "Devon Possions"; "Dillion" and "Dylan Brown".
I've come to the conclusion that I could fuzzy match the users names into groups and find and replace using those groups. My issue seems to be in the fuzzy match.
I do a fuzzy match on the data set with the following configuration:
What I get as a result is:
Where am I going wrong Alteryx..... Save me...... I know it's somewhere close hahaha
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
The Fuzzy Matching tool will work for you, but there is prep that needs to happen before using the tool.
First, you will need a Record ID field which right now you are using just your Name field. Add A Record ID tool before the Fuzzy Matching tool. 2nd, you will want to make sure that you have removed any extraneous characters or words to improve the matching algorithms. The 3rd thing is that you do not have any Advanced Options selected so the tool will result in what you see. The Fuzzy Matching tool needs the Record ID and at the very least the Match Score for you to see what matched. The tool will not change your data, the tool is there to alert you which data match and where it is. You can then use the record ID to match against your original data.
Resources
https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-Fuzzy-Match/ta-p/45485
https://help.alteryx.com/9.5/index.htm#FuzzyMatch.htm?Highlight=fuzzy%20matching
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @DanM!
Alright I did see the recordID needing to be used in all other examples.
I based my logic off the example shown here:
So I've now reworked myself with the RecordID
Looking at the Configuration for FuzzyMatch, I would assume I need the following:
Purge Mode:
RecordID Field = RecordID
Field Name = Name
Match Sytle = Name
Looking at Advance Options, I can see the match score and such, but what I want is to get it in group format:
How would I now get all "Vincent Williams" as a group so that I could use the find and replace against all versions of "Vincent Willia(m)" are transformed into "Vincent Williams"
I hope that is clear as mud ;)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@JordyMicheal take a look at the Multi-Row Formula tool. You can do grouping within that tool and that should help you pull out. Before you do that though, as a best practice I would filter out your 100% matches since you don't needs those and it will make it easier to look at.
https://help.alteryx.com/current/MultiRowFormula.htm?Highlight=multi%20row
You have a One Tool Sample/Preparation Tools of this tool under HELP that shows some of it's capabilities.
DanM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I got the desired result doing the following:
I used Fuzzy Match without a RecordID as shown below:
From there I used a make group with the 1st Key being the original name list, and the 2nd Key being the generated listed (Name2).
This output me a Group and Key field that matched all the names in the key field, into a group of similar names:
Then I simply used a find and replace tool with the "Find" being my original data-set and the replace being the "Make Group" data-set.
Really neat tool, just took some digging into.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
It would be great if you could attach the workflow. I have a simillar question but i'm missing a part of the solution.
