Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Fuzzy Match Tool for Data

poojamate92
8 - Asteroid

Hello,

Can you please help to solve and find output for below workflow?

I'm trying to match field from different sources. There is input file contain three sheets sheet 1 and 2 for input and sheet 3 contain sample output. I have to do fuzzy match on field Product Desc 1 and Product Desc 2.

Please find the attachment below for input and workflow:

 

5 REPLIES 5
SophiaF
Alteryx
Alteryx

@poojamate92 - one of the things that is preventing matches is the bottle volume; they aren't in equal conversions. Source1 shows them in centiliters and Source2 shows them in Liters. The fuzzy matching tool looks at the data as strings. Take the following example:

 

CPT MORGAN SPCED GOLD 70CL 35% Captain Morgan Spiced Gold 0.7L

 

We know that 70CL = 0.7L, but as a string, these two are very different:

 

70CL 0.7L

 

This is a very simplistic view of how Fuzzy Matching works (it greatly depends on the method of matching that you choose as well), but it is impacting the match score. Removing the digits prior to putting them into the fuzzy match tool does help in getting some results (I used a Data Cleansing tool before the Fuzzy Match).

 

Perhaps you could build some conversion process into your workflow to account for this? I can't think of another way to get these two strings to match properly, as you will get similar or the same (in the example below) match score:

 

GnXRQY

Sophia Fraticelli
Senior Solutions Architect
Alteryx, Inc.
poojamate92
8 - Asteroid

Hello SophiaF,

 

Thank You.

This post is really helpful to me. I used conversion methods to convert centiliter to Liter and the match was successful. Still I faced problem in associations of match it didn't give me proper match it made association in one to many form as below:

 

output.png

 

 

Please help me out to solve this problem. Please find the workflow attached below:

 

SophiaF
Alteryx
Alteryx

Conversion step looks great - it looks like you may need a little more data cleansing upstream to get some better results. I would probably also do a waterfall fuzzy matching process here, where you start with the 'loosest' fuzzy match, then use that to get more granular and do a subsequent fuzzy match. Here's why:

 

First, I would recommend settings similar to this:

KElG63

 

 

 

I would probably recommend not including the Strip Punctuation if you are able to get rid of the percentages in your data, as those percentages are throwing off the results:

kIOQjX

If you remove those percentages upstream, you can then decide whether or not to leave the punctuation, as it will be useful when comparing volumes, but not useful when comparing values like Mexican Coca Cola, Drink Cola, 350ML (24 Pack). I would do some additional data cleanup to remove just the , (see green comment box in workflow attached).

 

I would also skip generating keys for any common words, such as CANS PACK, as they don't help improve the match process. Since this step only impacts key generation, and not the actual 'Fuzzy Match' piece, it should only help not hurt. I would also lower the max key length to get more keys matching so that more goes through the Fuzzy Matching process.

 

This first Fuzzy Match should give you the most strict matches. You get results like this:RfwIF0

 

 

You will then want to do a second, less strict process, that will account for word order variation (like Diet Coke matching to Coca Cola Diet Coke). Workflow is attached.

 

 

 

Sophia Fraticelli
Senior Solutions Architect
Alteryx, Inc.
poojamate92
8 - Asteroid

Hello SophiaF,

 

Thank You for your response. As you suggested I have tried this example, however it didn't give accurate result. So I made some changes in this but it gives as it is previous output. Please suggest me changes that gives accurate output.

 

Please find the below attached file:

SophiaF
Alteryx
Alteryx

Since you are doing a waterfall approach, you should make the first process more 'strict' by increasing the Match Threshold in the 1st Fuzzy Match process. From looking at the data (as you always need to do some trial and error with the strictness of your match), it looks like 75% gives you correct results. Then, you can try tweaking the strictness of the second fuzzy match tool if necessary.

Sophia Fraticelli
Senior Solutions Architect
Alteryx, Inc.
Labels