Fuzzy matching a name within a paragraph
Hello Everyone,
I need some assistance with a fuzzy match I am trying to do.
I've done some standard fuzzy matches on names and addresses, but I am struggling to match two files on a 'name' value that, in one of the files, sits inside a field separated by vertical bars.
1st file: Names
2nd file: Information that contains the 'Name', with values separated by vertical bars.
I am trying to do a fuzzy match that produces a match score based on the 'name' in both files. In the second file, I bolded 'Cisco' and 'Tripex'.
One of the files has 5 million rows...
Any help is appreciated!
Hi @mzsweetumz
I feel fuzzy match might not be the best tool for this (then again, it's only me 😅 maybe others have better suggestions).
I would approach this with Find Replace: look up each Manufacturer Name and check whether the Manufacturer info contains it. Here is a learning resource on the Find Replace tool.
https://community.alteryx.com/t5/Interactive-Lessons/VLookUps-with-Designer/ta-p/80201
Give it a try, and if you face any issues, let us know.
Hope this helps : )
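For readers outside Alteryx, the lookup idea above can be sketched in Python. This is only an illustration and not how the Find Replace tool works internally: it assumes the info field is split on the vertical bar and each piece is compared to the name list after trimming and lowercasing. The function name and the sample rows are hypothetical, since the thread's real data was posted as images.

```python
def find_names_in_info(info, names):
    """Return the names that appear as a segment of a pipe-delimited string."""
    # Split on the vertical bar, then normalise case and surrounding spaces.
    segments = {part.strip().lower() for part in info.split("|")}
    return [name for name in names if name.strip().lower() in segments]


# Hypothetical sample data (assumption, not taken from the thread).
names = ["Cisco", "Tripex", "Acme"]
info = "Router | Cisco | 2019"
matches = find_names_in_info(info, names)
print(matches)
```

Comparing whole segments rather than raw substrings avoids a short name matching inside a longer, unrelated word, at the cost of missing names that are only part of a segment.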
Hi @mzsweetumz - I think that you don't need Fuzzy Matching for this specific use case. Find Replace will do a better job for you.
Hi @atcodedog05 and @ArtApa ,
Thank you! The find and replace works wonders and is a really cool tool that I did not know about.
However, I am still stumped, as I didn't give enough background info on my two files. I am trying to match the two files to find which 'responses' have the business 'name' in them.
The file with the business 'names' has 5 million rows, and the file with the business 'responses' has only 150 rows.
I am trying to match the business name and business responses associated with a person (their name), but that's not something I can join on.
The output looks right, but I don't think it is, because the matched rows are not linked to the same person.
Any help is appreciated!
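Given the size difference described above (5 million names against 150 responses), one possible approach, sketched here in Python rather than Alteryx, is to index the small file once and then look each name up in that index. Everything below is an assumption for illustration: the function names, the 0.8 threshold, the sample rows, and the use of the standard library's `difflib.SequenceMatcher` as the match score. It does not address linking rows to the same person.

```python
from difflib import SequenceMatcher


def build_segment_index(responses):
    """Map each normalised segment to the ids of the responses containing it."""
    index = {}
    for response_id, text in responses.items():
        for part in text.split("|"):
            key = part.strip().lower()
            if key:
                index.setdefault(key, []).append(response_id)
    return index


def best_match(name, index, threshold=0.8):
    """Return (segment, score, response_ids) for the closest segment, or None."""
    key = name.strip().lower()
    if key in index:  # exact hit, no fuzzy scoring needed
        return key, 1.0, index[key]
    best = None
    for segment, ids in index.items():
        score = SequenceMatcher(None, key, segment).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (segment, score, ids)
    return best


# Hypothetical sample rows (assumption, not taken from the thread).
responses = {1: "Router | Cisco Systems | 2019", 2: "Valve | Tripex | 2020"}
index = build_segment_index(responses)
print(best_match("Tripex", index))
print(best_match("Cisco System", index))
print(best_match("Unrelated Co", index))
```

The exact lookup is a dictionary hit and stays cheap at any scale. The fuzzy fallback compares a name against every indexed segment, so running it over all 5 million names would likely need batching or a faster scorer.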
Hi @mzsweetumz - Can you please show how a desired output should look like?
Hi @ArtApa , @atcodedog05
Something like this.
I have done other fuzzy matches for different match types and will eventually union it all into one big spreadsheet.
