Since this is about optimising a fuzzy match, it's important to understand the "expected" threshold and outcome. So I have a couple of questions to make sure I understand correctly:
What is the "expected" output of the sample data?
Can you give an idea of how "heavy" it is and the size of the data that you are processing?
At first look, you have 2 Fuzzy Match tools and 3 Join tools, and it seems strange that you are using the J output of the first Join tool to do a further fuzzy match. What's the objective you are trying to achieve here?
Thanks for clarifying. I understand it better now.
In my opinion, Fuzzy Match is probably not the best choice for this use case. Fuzzy Match works best for cases where you have similar pronunciations (e.g. Stuart vs. Steward), known abbreviations (e.g. Co. vs. company) and other known patterns (e.g. mobile phone).
For "GATORADE ARCTIC BLITZ 600ML" to be matched to "GATORADE FIERCE GRAPE 600ML", you would have to lower the threshold a lot, because "Arctic Blitz" and "Fierce Grape" sound very different.
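To get a rough feel for this, here is a small Python sketch that scores the two descriptions with the standard library's `difflib.SequenceMatcher`. This is only a stand-in: it is a character-level similarity ratio, not the algorithm the Fuzzy Match tool uses, and the 0.9 threshold is a made-up value for illustration.

```python
from difflib import SequenceMatcher

a = "GATORADE ARCTIC BLITZ 600ML"
b = "GATORADE FIERCE GRAPE 600ML"

# Character-level similarity in [0, 1]; NOT the Alteryx Fuzzy Match score.
score = SequenceMatcher(None, a, b).ratio()

# Hypothetical "strict" threshold, chosen only for illustration.
STRICT_THRESHOLD = 0.9

print(f"similarity = {score:.2f}")
print("passes strict threshold:", score >= STRICT_THRESHOLD)
```

The shared brand and size pull the score up, but the flavour names in the middle drag it well below a strict threshold, which is why you would have to loosen the match so much.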
Just from the 2 examples given above, it seems the real "keys" for classification are:
1) Brand (e.g. Gatorade)
2) Size (e.g. 600ml vs. 1L)
And the attribute columns pertain to the physical characteristics and packaging of the different SKUs. (I'm making a guess here about what your use case is; let me know if I'm completely off...)
It may be more efficient for you to use a modified approach: build a) a list of all possible brands that you carry and b) a list of all possible sizes that you carry, and then combine these with fuzzy matching in "Merge" mode (or any other non-fuzzy matching approach).
(I'm guessing here, of course. In most FMCG retail you should have a readily available database of the brands and sizes that you carry.)
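To illustrate the brand + size idea outside of Alteryx, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `KNOWN_BRANDS` list, the size pattern (ML and L only) and the `extract_keys` helper are hypothetical, and in practice the lists would come from your own brand and size master data.

```python
import re

# Hypothetical reference list; in practice, load this from your brand master.
KNOWN_BRANDS = ["GATORADE", "POWERADE"]

# Assumed size format: a number followed by ML or L (e.g. "600ML", "1L").
SIZE_PATTERN = re.compile(r"\b(\d+(?:\.\d+)?)\s*(ML|L)\b")


def extract_keys(description):
    """Return (brand, size_in_ml) parsed from a product description.

    Either element is None if it cannot be found.
    """
    text = description.upper()
    brand = next(
        (b for b in KNOWN_BRANDS if re.search(rf"\b{re.escape(b)}\b", text)),
        None,
    )
    size_ml = None
    match = SIZE_PATTERN.search(text)
    if match:
        value = float(match.group(1))
        # Normalise litres to millilitres so "1L" and "1000ML" compare equal.
        size_ml = int(round(value * 1000)) if match.group(2) == "L" else int(round(value))
    return brand, size_ml


a = extract_keys("GATORADE ARCTIC BLITZ 600ML")
b = extract_keys("GATORADE FIERCE GRAPE 600ML")
c = extract_keys("GATORADE FIERCE GRAPE 1L")

print(a == b)  # same brand and size, different flavour
print(a == c)  # same brand, different size
```

Once each record carries these two keys, an exact join on (brand, size) groups the SKUs without lowering any fuzzy threshold; fuzzy matching can then be reserved for the cases where the brand or size could not be parsed.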