Hi Alteryx Community,
I attempted using Fuzzy Match to extrapolate the nulls using information from the two tables however, it doesn't seem to produce the output that I was intending.
Can't work out what's the missing piece.
I have chosen "Words: Best of Jaor & Levenshtein Distance" for matching Item Description only, and 80% match threshold.
Thank you
ITEM | ITEM DESCRIPTION | PKSZ | IN_HGT | IN_LEN | IN_WID | IN_WGT | IN_VOL | SOURCE |
9310640000973 | GATORADE ARCTIC BLITZ 600ML | [Null] | [Null] | [Null] | [Null] | [Null] | [Null] | BBC |
9310640000201 | GATORADE BLUE BOLT 1L | [Null] | [Null] | [Null] | [Null] | [Null] | [Null] | BBC |
9310640620027 | GATORADE BLUE BOLT 600ML | [Null] | [Null] | [Null] | [Null] | [Null] | [Null] | BBC |
9310640620119 | GATORADE FIERCE GRAPE 600ML | 12 | 24 | 29 | 22 | 8 | 15,023 | BBC |
9310640000706 | GATORADE FIERCE STRAWBERRY 600ML | [Null] | [Null] | [Null] | [Null] | [Null] | [Null] | BBC |
9310640000102 | GATORADE LEMON LIME 1L | 6 | 24 | 29 | 19 | 7 | 12,779 | BBC |
9310640000195 | GATORADE ORANGE 1L | 6 | 24 | 29 | 19 | 7 | 12,779 | BBC |
9310640620010 | GATORADE ORANGE ICE 600ML | [Null] | [Null] | [Null] | [Null] | [Null] | [Null] | BBC |
ITEM | ITEM DESCRIPTION | PKSZ | IN_HGT | IN_LEN | IN_WID | IN_WGT | IN_VOL | SOURCE |
2484928 | Gatorade sports drink 1l Lemon & lime | 6 | 24 | 29 | 19 | 7 | 13,224 | TER |
2484939 | Gatorade sports drink 1l Orange ice | 6 | 24 | 29 | 19 | 7 | 13,224 | TER |
2484940 | Gatorade sports drink 1l Blue bolt | 6 | 24 | 29 | 19 | 7 | 13,224 | TER |
Solved! Go to Solution.
Hi Alteryx Community, wanting to know if anyone within this community have any ideas or suggestions??
Thank you
hi @YLYONG
Just to make sure I understand correctly, is the objective to fill the "null" values in the upper table with the lower table being the reference standard?
Dawn.
thank you for your response. The objective is to fill the "null" values taking both tables as reference.
hi @YLYONG
Since it is fuzzy match optimisation, it's important to understand the "expected" threshold/outcome. So i have a couple of clarifications to ensure correct understanding:
Dawn.
1. The expected output of the sample data is
GATORADE ARCTIC BLITZ 600ML, GATORADE BLUE BOLT 600ML, GATORADE FIERCE STRAWBERRY 600ML, GATORADE ORANGE ICE 600ML to have the same item attributes as GATORADE FIERCE GRAPE 600ML
GATORADE BLUE BOLT 1L to have the same item attributes as GATORADE ORANGE 1L
2. The actual data size has 21,500 unique items including 1,000 "NULL" items.
3. The first fuzzy match wasn't able to extrapolate all the "NULL". I have to do a second round of fuzzy-matching to fill in all the "NULL"😐 I was asking for help if there's a better way to do this?
hi @YLYONG
Thanks for clarifying. I understand it better now.
In my opinion, probably Fuzzy Match is not the best choice for this use case. Fuzzy Match works best for cases where you have similar pronunciations (e.g. Stuart vs. Steward), known abbreviations (e.g. Co. vs. company) and other know patterns (e.g. mobile phone).
For "GATORADE ARCTIC BLITZ 600ML" to be matched to "GATORADE FIERCE GRAPE 600ML" would require lowering the threshold a lot... because "Artic Blitz" and "Fierce Grape" sound very different.
Just from the 2 examples given above, it seems the real "keys" in classifications are:
1) Brand (e.g. Gatorade)
2) Size (e.g. 600ml vs. 1L)
And that the attributes columns pertain to the physical characteristics and packaging of different SKU's. (I'm making a guess here on what your use case is, let me know if i'm completely off...)
It may be more efficient for you to use a modified approach by having a) the list of all possible brands that you carry and b) list of all possible sizes that you carry and then combine it with fuzzy matching using "Merge" mode (or any other non-fuzzy matching approach).
(i'm guessing here of course -- In most FMCG retail you should have a readily available database of brands and sizes that you carry).
Hope this helps somewhat.
Dawn.
@DawnDuong thank you for the excellent reply, really appreciate you taking the time to understand the problem, the "why" and "what" 🙂
What you mentioned was exactly the the concerns with using item description for Fuzzy Matching
For example 1L and 600ML could appear similar, but would actually have different dimensions. While the flavor (artic blitz vs fierce grape) would have larger amount of characters that would be different and not selecting as a match.
Thank you once again four your help and suggested options.
You are welcome. I actually enjoy co-thinking on real issues as it expands my experience beyond what I normally do and your question was a tough one, which I enjoyed thinking through.