Alteryx Designer Desktop Discussions

agriese · ‎05-17-2019

Hey world!

I am working on a massive data de-duplication project, working with a lot of legacy data on multiple contact records. I've written this process, which pairs records by email address, marks the "master" record as the one more recently edited, and overwrites any last name shorter than two characters with the other last name in the pair. (IE Cheshire C. becomes Cheshire Cat).

I'd like to split this into two outputs. Output 1 should be any record where neither record nor its pair have had their name overwritten. Output 2 should be any record or its pair have had a name change.

How would I determine this, and how would I put both halves of a pair into the same output record?

Current:

Email	Level	First	Last
alice@wonderland.com	Master	Alice	Carroll
alice@wonderland.com	Victim	Alice	Carroll
ccat@wonderland.com	Victim	Cheshire	Cat
ccat@wonderland.com	Master	Cheshire	C

Goal:

Email	Level	First	Last	SaveTo
alice@wonderland.com	Master	Alice	Carroll	File1
alice@wonderland.com	Victim	Alice	Carroll	File1
ccat@wonderland.com	Victim	Cheshire	Cat	File2
ccat@wonderland.com	Master	Cheshire	C	File2

CharlieS · ‎05-17-2019

Hello @agriese

In the attached example, I used a Tile tool to determine email records that had more than one First/Last Name combo. Once I joined that information back to all records of each email, I filtered them into the two output files. Let me know if you have any questions.

agriese · ‎05-21-2019

Hey @CharlieS, thanks for that example! I believe it works, but I'm not understanding the thought process that led you here. Would you mind explaining how it works, so I can troubleshoot on my own data?

CharlieS · ‎05-21-2019

Sure thing.

- The most important tool here is the Tile tool. I've set it to find unique values of [Email], [First], and [Last] and set it to group by the [Email] field. For each email address, this will determine the number of unique combinations of first and last names for that email. In the version that I send you, selecting [Email] as a unique field is redundant since it is set as a group by field.

- The Summarize tool will group by [Email] and returns the number of unique First/Last name combos found for that email.

- The number of unique name records for each email is then rejoined and filtered so emails that all have the same name go to File1 and emails with different name combos go to File2.

agriese · ‎05-22-2019

Thanks! I'll have to mess around with this Tile tool more going forward.

Alteryx Designer Desktop Discussions

Split paired records to the same file

Zero to Advanced in 20 days

Re: Zero to Advanced in 20 days

Re: Zero to Advanced in 20 days

Re: Single App that filters user selections dynami...

Re: Sorting Sheet Names