Hello all. I am trying to filter some healthcare data and am running into messy values. My idea is to fuzzy match the file against one value, or a few different values, based on some trends I have found. For example, I would like a match score between the reference and the payor for each row below.
The issue is that I can't get the Fuzzy Match tool configured to produce the output I want. Open to any suggestions! Thanks!
Reference Column | Payor |
---|---|
Medicare Advantage | AARP UHC MEDICARE ADVANTAGE PLAN |
Medicare Advantage | AMERIGROUP MEDICARE ADVANTAGE |
Medicare Advantage | ANTHEM MEDICARE ADVANTAGE |
Medicare Advantage | BCBS ADVANTAGE |
Medicare Advantage | AMERI ADVANTAGE |
Could you please give more details about the filter you're doing? What are you trying to match?
Cheers,
Hey @Thableaus
I am trying to filter through a large number of payer names. Some of them are Medicare Advantage names, which are the ones I want. The issue is that there are 30 or more variations of "Medicare Advantage" to sift through, maybe even more. Some of the variations are "mcr advantage" or "mcre advantage". I am trying to avoid building a custom filter for each variation and instead rely on a match score to speed up the process.
I played around with your data a bit, and I think there is a way to solve your problem.
First of all, I strongly recommend using the Data Cleansing tool to standardize your data and remove anything that might lower the Fuzzy Match score.
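To illustrate the kind of standardization meant here, below is a minimal Python sketch. It is not what the Data Cleansing tool does internally. The function name `clean_payor` and the specific rules (upper-casing, replacing punctuation with spaces, collapsing repeated whitespace) are assumptions made for this example.

```python
import re


def clean_payor(name: str) -> str:
    """Standardize a payor name before fuzzy matching (hypothetical helper)."""
    # Use one case everywhere so "Medicare" and "MEDICARE" compare as equal.
    name = name.upper()
    # Replace anything that is not a letter, digit, or space with a space.
    name = re.sub(r"[^A-Z0-9 ]+", " ", name)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", name).strip()


print(clean_payor("  Anthem   Medicare-Advantage, "))
```

Cleaning both the reference value and the payor column the same way keeps differences in case, punctuation, and spacing from dragging the score down.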
Everything else I did is based on this Tool Mastery article.
I split the data in two, since you're comparing against a single string. Configuring Merge mode correctly restricts the comparisons to "Medicare Advantage".
The one thing you should pay close attention to is the Match Threshold inside the Match Style configuration. I lowered it to 20% so I could see how all records behave. The default is 85%, which is probably why you weren't seeing any results.
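To see how a threshold affects which records survive, here is a rough Python sketch that scores each payor against the single reference string and keeps only those at or above a cutoff. It uses the standard library's `difflib.SequenceMatcher` as a stand-in, so the scores will not match what the Fuzzy Match tool produces. The names `match_score` and `score_payors`, and the extra "mcr advantage" sample, are assumptions for this example.

```python
from difflib import SequenceMatcher

REFERENCE = "MEDICARE ADVANTAGE"


def match_score(reference: str, payor: str) -> float:
    """Return a 0-100 similarity score (difflib ratio, not Alteryx's algorithm)."""
    return round(100 * SequenceMatcher(None, reference, payor).ratio(), 1)


def score_payors(payors, reference=REFERENCE, threshold=20.0):
    """Score every payor against the reference; keep those at or above threshold."""
    scored = [(p, match_score(reference, p.upper())) for p in payors]
    kept = [(p, s) for p, s in scored if s >= threshold]
    # Highest scores first, so a sensible cutoff is easy to spot by eye.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


payors = [
    "AARP UHC MEDICARE ADVANTAGE PLAN",
    "AMERIGROUP MEDICARE ADVANTAGE",
    "ANTHEM MEDICARE ADVANTAGE",
    "BCBS ADVANTAGE",
    "AMERI ADVANTAGE",
    "mcr advantage",
]

for payor, score in score_payors(payors, threshold=20.0):
    print(f"{score:5.1f}  {payor}")
```

Starting with a low threshold and raising it once the scores of the good and bad records are visible is the same approach as lowering the Match Threshold in the tool.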
I've attached the workflow, built in version 2018.4. I hope it gives you some ideas on how to deal with your data.
This Live Training also clears up a lot of common problems users run into.
Cheers,
Thanks @Thableaus, that pointed me in the right direction.