In an attempt to clean up our Brand information using UPC info, I am interested in identifying duplicate brands that may be represented or spelled differently.
This is basically what I would be interested in producing:
Rather than using a similarity score on the Brand name itself, we are interested in grouping UPCs based on how similar they are (perhaps by looking at the first 11 digits) and then evaluating the brands within each UPC group to determine whether a record should be reviewed and matched to the most common Brand found in that group. For example, in the sample data I provided, we would want to correct Bee's Knees Corporation to the already existing brand Bee's Knees.
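A minimal sketch of the "most common Brand per UPC group" check, assuming each record already carries a UPC group key. The function name `flag_brands_for_review` and the `(upc_group, brand)` row shape are hypothetical, chosen for illustration. It counts brands per group, takes the most frequent one as the suggested brand, and flags every row whose brand differs from it.

```python
from collections import Counter, defaultdict


def flag_brands_for_review(rows):
    """Hypothetical helper.

    rows: a list of (upc_group, brand) pairs.
    Returns a list of (upc_group, brand, suggested_brand, needs_review) tuples,
    where suggested_brand is the most common brand within that UPC group.
    """
    # Count how often each brand appears within each UPC group.
    brands_by_group = defaultdict(Counter)
    for group, brand in rows:
        brands_by_group[group][brand] += 1

    results = []
    for group, brand in rows:
        # most_common(1) returns the highest-count brand; on a tie,
        # Counter keeps the brand that was encountered first.
        suggested = brands_by_group[group].most_common(1)[0][0]
        results.append((group, brand, suggested, brand != suggested))
    return results
```

With a hypothetical group where "Bee's Knees" appears twice and "Bee's Knees Corporation" once, only the "Bee's Knees Corporation" row comes back with `needs_review` set to `True` and a suggested brand of "Bee's Knees". Ties are worth a manual look, since the "winner" then depends on row order.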
In the past I've been able to successfully group based on string similarity, but it seems challenging to group actual numbers (in this case, the first 11 digits of each number). Any thoughts on how to tackle this?
Thank you for your response. I forgot to mention that UPC Group is not an existing field, so it would need to be derived somehow. How would I create the logic for grouping the UPCs?
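One way to derive a UPC Group field is to treat the UPC as a string rather than a number and take a fixed-length prefix. Below is a minimal sketch; the helper name `upc_group` and its `prefix_len`/`full_len` parameters are hypothetical. It assumes 12-digit UPC-A codes that may have been stored as numbers, which silently drops leading zeros, so it strips non-digits and left-pads back to 12 before slicing. In UPC-A the 12th digit is a check digit calculated from the first 11, so a first-11-digit key effectively just drops the check digit. A shorter `prefix_len` would group more broadly, by a leading portion of the code.

```python
def upc_group(upc, prefix_len=11, full_len=12):
    """Hypothetical helper: derive a UPC Group key from a raw UPC value.

    Assumes UPC-A (12 digits). Accepts ints, floats, or strings that may
    contain separators such as dashes or spaces.
    """
    # A float like 12345678905.0 would otherwise leak its ".0" as an extra digit.
    if isinstance(upc, float):
        upc = int(upc)
    # Keep only the digit characters.
    digits = "".join(ch for ch in str(upc) if ch.isdigit())
    # Numeric storage drops leading zeros, so pad back to the full UPC length.
    digits = digits.zfill(full_len)
    # The grouping key is the leading prefix of the normalized code.
    return digits[:prefix_len]
```

For example, `upc_group(12345678905)` and `upc_group("0-12345-67890-5")` both normalize to `"012345678905"` and yield the key `"01234567890"`. Working on the string form avoids the numeric-grouping problem entirely: once every record has this key, records can be grouped on it like any other text field.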