Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Comparing 2 columns strings have giving percentage scores based on similarity

XIIJAMIEIIX
6 - Meteoroid

Hi Folks,

Hope everything is well

 

I'm dealing with matching some titles that are truncated or suffixed that I need to match internal identifiers with and I'm having issues using fussy matching to get these.

The issue is I can't do overall similarity as I don't want to match the 'Draft' version of documents so it needs to be does the first x characters match, and that's the ENTIRE name of what we have if that makes sense? 

 

Match 1Match 2Percentage
ABCABC12350%
DEFDEF100%
GHIGIH33%
8 REPLIES 8
DataNath
17 - Castor

Hey @XIIJAMIEIIX, hopefully I understood your ask correctly. I can imagine there's a simpler way of doing this but this works for your test data and I believe should scale fine if you want to check it with a wider data set:

 

ddd.png

XIIJAMIEIIX
6 - Meteoroid

Hello @DataNath that almost works! The results are what I was expecting however the source is coming from 2 different CSV and are being matched against each other.

 

Sort of like this (and sorry I completely forgot to mention the 2 CSV's)

 

Match 1 - CSV Input 1Match 2 - CSV Input 2Percentage
ABCABC, 12350%
DEFDEF100%
GHIGIH33%
DataNath
17 - Castor

Hey @XIIJAMIEIIX, how does this updated version look? It just takes the 2 streams and merges them before stripping out any punctuation etc to just leave the letters and digits before running the same comparison. If this still isn't exactly what you're looking for then can you please fully explain and provide better examples to minimise back and forth?

 

FFF.png

XIIJAMIEIIX
6 - Meteoroid

It looks to be matching with totally random records rather than similar matches, I think it might be due to the first join matching on the Record position so it's not aligning items correctly and therefore giving incorrect percentages if that makes sense?

XIIJAMIEIIX
6 - Meteoroid

I'm not too sure if this is an easy way to explain it

 

Match 1 - CSV1Match 2 - CSV2
ABCGIH
DEFABC, 123
GHIDEF

 

Match 1 - CSV1Match 2 - CSV2Percentage
ABCABC, 12350%
DEFDEF100%
GHIGIH33%
jdminton
12 - Quasar

I think you might want to stick to the fuzzy matching tool unless you essentially want to rebuild this. The main reason is that in your examples it makes sense what you're trying to do, but with real data, the characters might be off by one position indicating a zero percent match for an extra character. I.e. the name of a company may be 1 Capital Place in one list and One Capital Place in another list. With what you're trying to do, you will get a zero percent match. With a small list, these may be easy to see, but if you're building automation, my guess is you have a very large list. Have you used the actual fuzzy match tool yet? If so, share what you've got so far, and we can help adjust if necessary.

XIIJAMIEIIX
6 - Meteoroid

Yeah I did try Fuzzy match but it was giving out far too many random hits and wasn't accurate enough so this is why I needed to pivot to something that would look at Character 1 from one list and Character 2 from another, I did take into account grammar and such with normalizing everything with regex to try and improve my matches.

 

We are only caring about a threshold of 90% match with everything that falls under requiring a manual review to attempt to match afterward.

 

so the example I have here one of the values is a 56% match but ideally it should only be 33% as only the first character matches in order

jdminton
12 - Quasar

@XIIJAMIEIIX Try this workflow. I used iterative macros to evaluate letter by letter based on the rules you defined.

Snag_1ec53435.png

Snag_1ec55a4b.png

Labels