Hey world!
I am working on a massive data de-duplication project, working with a lot of legacy data on multiple contact records. I've written this process, which pairs records by email address, marks the "master" record as the one more recently edited, and overwrites any last name shorter than two characters with the other last name in the pair. (IE Cheshire C. becomes Cheshire Cat).
I'd like to split this into two outputs. Output 1 should be any record where neither record nor its pair have had their name overwritten. Output 2 should be any record or its pair have had a name change.
How would I determine this, and how would I put both halves of a pair into the same output record?
Current:
| Email | Level | First | Last |
| alice@wonderland.com | Master | Alice | Carroll |
| alice@wonderland.com | Victim | Alice | Carroll |
| ccat@wonderland.com | Victim | Cheshire | Cat |
| ccat@wonderland.com | Master | Cheshire | C |
Goal:
| Email | Level | First | Last | SaveTo |
| alice@wonderland.com | Master | Alice | Carroll | File1 |
| alice@wonderland.com | Victim | Alice | Carroll | File1 |
| ccat@wonderland.com | Victim | Cheshire | Cat | File2 |
| ccat@wonderland.com | Master | Cheshire | C | File2 |