Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Fuzzy Match, Make Group

YLYONG
9 - Comet

Hi Alteryx Community,

 

I attempted using Fuzzy Match to extrapolate the nulls using information from the two tables however, it doesn't seem to produce the output that I was intending. 

 

Can't work out what's the missing piece.

 

I have chosen "Words: Best of Jaor & Levenshtein Distance"  for matching Item Description only, and 80% match threshold.

 

Thank you

 

YLYONG_0-1633009226059.png

 

 

ITEMITEM DESCRIPTIONPKSZIN_HGTIN_LENIN_WIDIN_WGTIN_VOLSOURCE
9310640000973GATORADE ARCTIC BLITZ 600ML[Null][Null][Null][Null][Null][Null]BBC
9310640000201GATORADE BLUE BOLT 1L[Null][Null][Null][Null][Null][Null]BBC
9310640620027GATORADE BLUE BOLT 600ML[Null][Null][Null][Null][Null][Null]BBC
9310640620119GATORADE FIERCE GRAPE 600ML12242922815,023BBC
9310640000706GATORADE FIERCE STRAWBERRY 600ML[Null][Null][Null][Null][Null][Null]BBC
9310640000102GATORADE LEMON LIME 1L6242919712,779BBC
9310640000195GATORADE ORANGE 1L6242919712,779BBC
9310640620010GATORADE ORANGE ICE 600ML[Null][Null][Null][Null][Null][Null]BBC
         
ITEMITEM DESCRIPTIONPKSZIN_HGTIN_LENIN_WIDIN_WGTIN_VOLSOURCE
2484928Gatorade sports drink 1l Lemon & lime6242919713,224TER
2484939Gatorade sports drink 1l Orange ice6242919713,224TER
2484940Gatorade sports drink 1l Blue bolt6242919713,224TER
9 REPLIES 9
YLYONG
9 - Comet

Found the issue, and completed the workflow. However, this workflow is too 'heavy' when deployed into actual data sets.

 

Need your help on suggestions or a better solution.  Thank you

 

YLYONG_0-1633066453206.png

 

YLYONG
9 - Comet

Hi Alteryx Community, wanting to know if anyone within this community have any ideas or suggestions??

Thank you

DawnDuong
13 - Pulsar
13 - Pulsar

hi @YLYONG 

Just to make sure I understand correctly, is the objective to fill the "null" values in the upper table with the lower table being the reference standard?

Dawn.

YLYONG
9 - Comet

@DawnDuong 

thank you for your response. The objective is to fill the "null" values taking both tables as reference.

DawnDuong
13 - Pulsar
13 - Pulsar

hi @YLYONG 

Since it is fuzzy match optimisation, it's important to understand the "expected" threshold/outcome. So i have a couple of clarifications to ensure correct understanding:

  1. What is the "expected" output of the sample data?
  2. Can you give an idea of how "heavy" it is and the size of the data that you are processing?
  3. On first look, you have 2 fuzzy match tools and 3 join tools and it seems strange that you are using the J output of the first Join Tool to do further fuzzy match. What's the objective that you try to achieve here?

 

Dawn.

YLYONG
9 - Comet

@DawnDuong 

 

1. The expected output of the sample data is

 

GATORADE ARCTIC BLITZ 600ML, GATORADE BLUE BOLT 600ML, GATORADE FIERCE STRAWBERRY 600ML, GATORADE ORANGE ICE 600ML to have the same item attributes as GATORADE FIERCE GRAPE 600ML

 

GATORADE BLUE BOLT 1L to have the same item attributes as GATORADE ORANGE 1L

 

2. The actual data size has 21,500 unique items including 1,000 "NULL" items.

3. The first fuzzy match wasn't able to extrapolate all the "NULL". I have to do a second round of fuzzy-matching to fill in all the "NULL"😐  I was asking for help if there's a better way to do this?

DawnDuong
13 - Pulsar
13 - Pulsar

hi @YLYONG 

Thanks for clarifying. I understand it better now.

In my opinion, probably Fuzzy Match is not the best choice for this use case. Fuzzy Match works best for cases where you have similar pronunciations (e.g. Stuart vs. Steward), known abbreviations (e.g. Co. vs. company) and other know patterns (e.g. mobile phone). 

For "GATORADE ARCTIC BLITZ 600ML" to be matched to "GATORADE FIERCE GRAPE 600ML" would require lowering the threshold a lot... because "Artic Blitz" and "Fierce Grape" sound very different.

Just from the 2 examples given above, it seems the real "keys" in classifications are:

1) Brand (e.g. Gatorade)

2) Size (e.g. 600ml vs. 1L)

And that the attributes columns pertain to the physical characteristics and packaging of different SKU's. (I'm making a guess here on what your use case is, let me know if i'm completely off...)

It may be more efficient for you to use a modified approach by having a) the list of all possible brands that you carry and b) list of all possible sizes that you carry and then combine it with fuzzy matching using "Merge" mode (or any other non-fuzzy matching approach).

(i'm guessing here of course -- In most FMCG retail you should have a readily available database of brands and sizes that you carry).

Hope this helps somewhat.

Dawn.

YLYONG
9 - Comet

@DawnDuong  thank you for the excellent reply, really appreciate you taking the time to understand the problem, the "why" and "what" 🙂

 

What you mentioned was exactly the the concerns with using item description for Fuzzy Matching

For example 1L and 600ML could appear similar, but would actually have different dimensions. While the flavor (artic blitz vs fierce grape) would have larger amount of characters that would be different and not selecting as a match.

 

Thank you once again four your help and suggested options. 

DawnDuong
13 - Pulsar
13 - Pulsar

You are welcome. I actually enjoy co-thinking on real issues as it expands my experience beyond what I normally do and your question was a tough one, which I enjoyed thinking through.

Labels
Top Solution Authors