Split paired records to the same file

Hey world!

I am working on a massive data de-duplication project, working with a lot of legacy data on multiple contact records. I've written this process, which pairs records by email address, marks the "master" record as the one more recently edited, and overwrites any last name shorter than two characters with the other last name in the pair. (IE Cheshire C. becomes Cheshire Cat).

I'd like to split this into two outputs. Output 1 should be any record where neither record nor its pair have had their name overwritten. Output 2 should be any record or its pair have had a name change.

How would I determine this, and how would I put both halves of a pair into the same output record?

Current:

Email	Level	First	Last
alice@wonderland.com	Master	Alice	Carroll
alice@wonderland.com	Victim	Alice	Carroll
ccat@wonderland.com	Victim	Cheshire	Cat
ccat@wonderland.com	Master	Cheshire	C

Goal:

Email	Level	First	Last	SaveTo
alice@wonderland.com	Master	Alice	Carroll	File1
alice@wonderland.com	Victim	Alice	Carroll	File1
ccat@wonderland.com	Victim	Cheshire	Cat	File2
ccat@wonderland.com	Master	Cheshire	C	File2

Data Investigation

Accepted answers

CharlieS

Sure thing.

- The most important tool here is the Tile tool. I've set it to find unique values of [Email], [First], and [Last] and set it to group by the [Email] field. For each email address, this will determine the number of unique combinations of first and last names for that email. In the version that I send you, selecting [Email] as a unique field is redundant since it is set as a group by field.

- The Summarize tool will group by [Email] and returns the number of unique First/Last name combos found for that email.

- The number of unique name records for each email is then rejoined and filtered so emails that all have the same name go to File1 and emails with different name combos go to File2.

20190517-SaveRecords.yxmd

All comments

CharlieS

Hello @agriese

In the attached example, I used a Tile tool to determine email records that had more than one First/Last Name combo. Once I joined that information back to all records of each email, I filtered them into the two output files. Let me know if you have any questions.

20190517-SaveRecords.yxmd

agriese

Hey @CharlieS, thanks for that example! I believe it works, but I'm not understanding the thought process that led you here. Would you mind explaining how it works, so I can troubleshoot on my own data?

CharlieS

Sure thing.

- The Summarize tool will group by [Email] and returns the number of unique First/Last name combos found for that email.

- The number of unique name records for each email is then rejoined and filtered so emails that all have the same name go to File1 and emails with different name combos go to File2.

20190517-SaveRecords.yxmd

Quick Links

This months top contributors

atcodedog05 19598

Qiu 15867

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703