Alteryx Designer Desktop Discussions

trvu8447 · ‎11-26-2021

Hello! I was wondering how I would be able to remove certain rows of data that are essentially duplicates, but cannot technically be removed using the Unique tool due to slight differences in the presented data. How would I go about removing the 1st row with an incomplete address? And maybe even the 3rd row with the abbreviated version of the address? I have attached an image for reference. Thank you in advance!Essentially, all of these records are duplicates, but the "address" field name differs for all of these.

danilang · ‎11-27-2021

Hi @trvu8447

Dealing with humans is Hard! Specifically, human entered text. That's where the Fuzzy Match tool comes in. It analyzes the text in the inputs and attempts to score the items according to how likely it is that they represent the same underlying entity. Its tricky to use with many options, so read up on it in the tool mastery article, as well as googling the various algorithms available in the tool

The basic strategy is this.

1. Do a straight join on the field. This will remove all the exact duplicates and make it easier and faster to run the Fuzzy matching

2. Depending on the type of field that your processing, do some preprocessing to clean it up. For example, if you're analyzing an address field, you can use a lookup table coupled with a Find Replace to standardize the address parts. Street and St. both map to Street. Route, Rte and Rt all map to Route.

3 Feed the unmatched remainders into the Fuzzy Match tool. Make sure to specify the type of matching that you're doing, Name, Address, etc.

Dan

trvu8447 · ‎11-27-2021

Solved, thank you!

Alteryx Designer Desktop Discussions

How to remove individual rows of data under very specific conditions