This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I think you can achieve this with a combination of a Fuzzy Match tool and a Formula tool. I have copied your table into the input data tool in the workflow attached.
The Fuzzy Match tool is setup in this example to look at Home Address (this can be amended to what you would prefer the fields to match on). It is also configured to 'Output unmatched records'. This means in the output of the tool you can see 6 records in two columns;
The first row is 2 record ID's which have been matched; RecordID = 3 and RecordID2 = 4 - This is the example you gave in your message
The following 5 records all have the same RecordID and RecordID2 as they have not fuzzy matched
We are missing 1 field due to this not having a Home Address (this is added back in later with a union
You then can join the data from the record IDs back in, so that you have your original data set, next to a copy of the data set from anything that has matched.
Using the Formula tool you can then add in IF statements to replace null values with anything that is populated from the matched Record ID (I have only added in Gender and Mobile Phone for this example)
I hope this helps. Let me know if you have any questions.
@wdavis, thank you for your reply. It shows me how little I know about Alteryx because that workflow is more complex than anything I've tried for this solution (and it's not even that complicated of a workflow).
Unfortunately I'm not sure it's working for my full data set. Running my data through this workflow brought it from 153,000 records to a little under 12,000. At first I thought maybe I do have that many duplicates. However, running the original data through the Unique tool and selecting First Name, Last Name, and Address as the unique fields I got 137,000 unique records matching that criteria.
Looking through the output, it appears that it actually deleted all 8 Joan Smith records.
The Unique tool is likely to be pulling through more unique rows that are in fact not unique and could have differences due to slight spelling changes, e.g Joan Smith and joan smith would both be considered Unique rows within the tool due to the case differences. Likewise, if any of the rows are missing data for one of the 3 fields, then it will identify these as unique values.
Which fields are you currently using in your fuzzy match tool?