Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

The highly anticipated Alteryx Community tradition is back! We hope you'll join us!

Learn More
We will be upgrading the Gallery to our 2019.4 release this Saturday, December 7th beginning at 9:00am MT. We expect the outage to take last approx. 2.5 hours.

How to group UPCs based on Similarity and evaluate Anomalies




In an attempt to cleanup our Brand information using UPC  info, I am interested in identifying duplicate brands that may be represented/spelled differently.  


This is basically what I would be interested in producing:



Rather than use a similarity score on the Brand name itself, we are interested in grouping UPCs based on how similar they are (perhaps looking at the first 11 digits) and then evaluating the brands for that UPC group to determine if it should be reviewed or not to match the most common Brand found for that UPC group. For example, in the sample data I provided, we would want to correct Bee's Knees Corporation to the already existing brand Bee's Knees.


In the past I've been able to successfully group based on string similarity, but it's seems challenging to group actual numbers (in this case the first 11 digits of that number). Any thoughts on how to tackle this?


Thank you for any guidance!


User friendly version:


UPCUPC GroupManufacturerBrandAction
00000000002000000000000200Bee's KneesBee's Knees 
00000000002010000000000200Bee's KneesBee's Knees 
00000000002020000000000200Bee's KneesBee's Knees 
00000000002030000000000200Bee's KneesBee's Knees CorporationREVIEW
00000000002100000000000200Bee's KneesBee's Knees 
00000000005400000000000500MMMMMM Inc.REVIEW
00000000005600000000000500MMMMulti Medium Media 
00000000012000000000001200Peppy ColaPeppy Cola 
00000000012100000000001200Peppy ColaPeppy Cola 
00000000012200000000001200Peppy ColaPeppy Cola 
00000000012300000000001200Peppy ColaPepsiREVIEW
00000777788800000077778880Killer DaveKiller Dave 
00000777788850000077778880Killer DaveKiller Dave 
Alteryx Certified Partner

Does something like this work for you?




Here, both MMM Inc. and Multi Medium Media are flagged for review because they are both duplicates (with fewer occurrences) within the group 0000...1200




Thank you for your response. I forgot to mention that UPC Group would need to be derived somehow. How would I create the logic for grouping the UPCs? UPC Group would have to be created, as it is not an existing field



Thank you.


A formula tool would allow you to do this with the formula below






This removes the left 11 characters from UPC, and then pads the string with zeroes until you get to 13 digits long. 


Hi @anthony123 ,

The  fuzzy match  on  brand provides same results .  

Hope this helps .