Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to remove individual rows of data under very specific conditions

trvu8447
6 - Meteoroid

Hello! I was wondering how I would be able to remove certain rows of data that are essentially duplicates, but cannot technically be removed using the Unique tool due to slight differences in the presented data. How would I go about removing the 1st row with an incomplete address? And maybe even the 3rd row with the abbreviated version of the address? I have attached an image for reference. Thank you in advance!Essentially, all of these records are duplicates, but the "address" field name differs for all of these.Essentially, all of these records are duplicates, but the "address" field name differs for all of these.

 

 

2 REPLIES 2
danilang
19 - Altair
19 - Altair

Hi @trvu8447 

 

Dealing with humans is Hard!   Specifically, human entered text.   That's where the Fuzzy Match tool comes in.  It analyzes the text in the inputs and attempts to score the items according to how likely it is that they represent the same underlying entity.  Its tricky to use with many options, so read up on it in the tool mastery article, as well as googling the various algorithms available in the tool

 

The basic strategy is this. 

1. Do a straight join on the field.  This will remove all the exact duplicates and make it easier and faster to run the Fuzzy matching

2. Depending on the type of field that your processing, do some preprocessing to clean it up.  For example, if you're analyzing an address field, you can use a lookup table coupled with a Find Replace to standardize the address parts.  Street and St. both map to Street.  Route, Rte and Rt all map to Route. 

3 Feed the unmatched remainders into the Fuzzy Match tool.  Make sure to specify the type of matching that you're doing, Name, Address, etc.

 

Dan   

 

 

 

 

trvu8447
6 - Meteoroid

Solved, thank you!

Labels