SOLVED

Row Level Fuzzy Matching

MAISKHADER
6 - Meteoroid

Greetings,

I'm trying to fuzzy match two fields at the row level, in a data set of matching pairs that were prepared using specific, detailed criteria.

Once I have the matching pairs, I want to fuzzy match within each pair.

For example:
We have already prepared the matching pairs based on the debt amount, and now we want to fuzzy match the names of the people in each pair.
When I use Merge mode to compare the two fields (name 1 and name 2), each cell in one field is fuzzy matched against the entire other field. I only want to compare it with the other cell on the same row (i.e. at row level), because the data set is very large and comparing every cell against the whole field takes far too long.

matching 1.png

matching 2.png


Is there a workaround for this?

Thank you
6 REPLIES
mceleavey
17 - Castor

Hi @MAISKHADER,

 

I've put together the fuzzy matching workflow I would use, which should give you what you want. You may need to tweak the settings, but it gives you a comparison between the two names (in this example the comparison also includes the debt levels in case these fluctuate, but you can simply remove them from the comparison field if you like).

What I've done is split the two sets into streams, given each an origin source (Source/Target), and then concatenated the names with the matching debts. These fields are then fuzzy matched using characters and digits, and the match score is output as required:

 

Fuzzy.PNG

I've attached the workflow for you to tweak as required.

 

Hope this helps.



Bulien

MAISKHADER
6 - Meteoroid

Hello @mceleavey

I appreciate your help. Unfortunately, the previous workflow doesn't restrict the comparison to the same row. For example, if we add another record with the ID "11", the result also compares the 11th record with the sixth.

record 11.png

My team found another workaround, though, that seems to be working fine. It is explained in this picture:

Correct sol..jpg

I hope it is helpful.
Thanks a lot.

ignas
8 - Asteroid

An amazing workaround. Thanks a lot

Hiblet
10 - Fireball

I had a similar requirement and wrote a couple of macros to do this, which are now posted on the public gallery.

 

One is "LevenshteinDistance", which works out the Levenshtein Distance between string fields on the same row, i.e. the number of edits required to change one string into the other. (I also have an Optimal String Alignment version that accounts for transpositions, but I have not yet put that on the gallery.)
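
For anyone who wants to see the idea outside of Designer, here is a rough Python sketch of a row-level Levenshtein comparison. It is not the macro itself, only an illustration of the same calculation; the field names `name1` and `name2` and the helper `score_rows` are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def score_rows(rows):
    """Compare name1 with name2 on the same row only (hypothetical field names)."""
    return [dict(row, distance=levenshtein(row["name1"], row["name2"]))
            for row in rows]


# Hypothetical sample data: each row is an already-prepared matching pair.
pairs = [
    {"id": 1, "name1": "John Smith", "name2": "Jon Smith"},
    {"id": 2, "name1": "Maria Lopez", "name2": "Maria Lopes"},
]
for row in score_rows(pairs):
    print(row["id"], row["distance"])
```

Because each row is scored on its own, the work grows with the number of rows rather than with every cell being compared against the whole other field.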

 

The other macro is "JaroWinkler", which works out the Jaro Similarity, Jaro-Winkler Similarity and the Jaro-Winkler Distance. This gives you score values for the level of similarity or difference between two strings, and you can then use those scores to decide what to do with your data.
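
As a rough illustration of those three scores, here is a minimal Python sketch using the usual textbook definitions. It is not the macro's code; the prefix scaling factor of 0.1 and the four-character prefix limit are the conventional defaults and are an assumption on my part, as is taking the distance to be 1 minus the similarity.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, from 0.0 (no match) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler similarity: Jaro boosted for a shared prefix (up to 4 chars)."""
    base = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance taken as 1 - similarity (assumed convention)."""
    return 1.0 - jaro_winkler(s1, s2)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Applied per row to the two name fields, a similarity threshold can then be used to decide which pairs to accept, reject or review by hand.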

 

If anyone uses these, I would appreciate feedback about any errors or problems encountered.

mkincannon1
5 - Atom

Super helpful workaround for row-level comparisons! Saves a great deal of processing as well 

isokando
7 - Meteor

Thank you both
