Greetings,
I'm trying to fuzzy match two fields at the row level in a data set of matching pairs that were prepared according to specific, detailed criteria.
Hi @MAISKHADER,
I've put together the fuzzy matching workflow I would use which should give you what you want. You may need to tweak the actual settings but this will give you a comparison between the two (in this example including the debt levels in case these fluctuate, but you can simply remove them from the comparison field if you like).
What I've done is split the two sets into separate streams and given each an origin label (Source/Target), then concatenated the names with the matching debts. These fields have then been fuzzy matched on characters and digits, with the match score output as required:
I've attached the workflow for you to tweak as required.
Hope this helps.
Hello @mceleavey
I appreciate your help. Unfortunately, the previous workflow doesn't restrict the comparison to the same row. For example, if we add another record with the ID "11", the result also compares the 11th record with the sixth.
My team found another workaround, though, that seems to be working fine. It is explained in this picture:
I hope it is helpful.
Thanks a lot.
An amazing workaround. Thanks a lot
I had a similar requirement, and wrote a couple of macros to do this, now posted on the public gallery.
One is "LevenshteinDistance", which works out the Levenshtein Distance between string fields on the same row, i.e. the number of edits required to change one string into the other. (I also have an Optimal String Alignment version that accounts for transpositions, but I have not yet put that on the gallery.)
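For anyone curious what that calculation looks like outside Alteryx, here is a minimal Python sketch of the standard Levenshtein distance applied pair by pair. It is not the macro's implementation, and the function name and the sample rows are made up for illustration.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop

    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Hypothetical (source, target) pairs, one pair per row.
rows = [
    ("Jon Smith", "John Smith"),
    ("Acme Ltd", "Acme Limited"),
]

# Each row is compared only with itself, never with other rows.
for source, target in rows:
    print(source, "|", target, "|", levenshtein_distance(source, target))
```

Because the function only ever sees the two values from one row, there is no cross-row matching to filter out afterwards.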
The other macro is "JaroWinkler", which works out the Jaro Similarity, the Jaro-Winkler Similarity and the Jaro-Winkler Distance. These give you score values for the level of similarity or difference between two strings, and you can then use those scores to decide what to do with your data.
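To show how the three scores relate, here is a rough Python sketch of the textbook definitions. Again, this is not the macro's code. The function names are hypothetical, and the prefix scale of 0.1 and the prefix cap of 4 characters are the commonly used defaults, assumed here rather than taken from the macro.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in the range 0.0 (no match) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this many positions.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are
    # counted, then halved, to get the number of transpositions.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str, scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return jaro + prefix * scale * (1 - jaro)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance is simply 1 minus the Jaro-Winkler similarity."""
    return 1 - jaro_winkler_similarity(s1, s2)


# Hypothetical row-level pair: score the two fields of a single row.
print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))
```

A threshold on the similarity (or on the distance) can then decide whether the two fields of a row are treated as a match. The right cut-off depends on your data.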
If anyone uses these macros, I would appreciate feedback about any errors or problems encountered.
Super helpful workaround for row-level comparisons! Saves a great deal of processing as well
Thank you both