
Alteryx Designer Discussions

Logic Question: Evaluating Bad Last Names

Meteor

Hey world,

I've been cleaning up duplicate contacts in Salesforce. Everyone whose name and email matched exactly has already been merged, so what's left are contacts whose first name and email match but whose last names differ. Any suggestions on how to begin validating these last names?

Example data:

| Valid Name? | Email | First | Last |
|---|---|---|---|
|  | alice@wonderland.com | Alice | Last Name |
|  | alice@wonderland.com | Alice | Carroll |
|  | ccat@wonderland.com | Cheshire | Cat |
|  | ccat@wonderland.com | Cheshire | C |

Goal data:

| Valid Name? | Email | First | Last |
|---|---|---|---|
| No | alice@wonderland.com | Alice | Last Name |
| Yes | alice@wonderland.com | Alice | Carroll |
| Yes | ccat@wonderland.com | Cheshire | Cat |
| No | ccat@wonderland.com | Cheshire | C |
Nebula

Hi @agriese 

Your first step is to come up with a list of rules for what constitutes a "bad" last name in your locale. Your first example may not be universally applicable, since some people do have multi-part last names, such as "De Gracia".

Once you have your criteria, coding them is fairly straightforward.
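As a rough illustration of turning criteria into code, here is a minimal Python sketch (plain logic, not an Alteryx workflow) that keeps the rules as a list of named predicates, so rules can be added or dropped as you test them against your data. The names `BAD_LAST_NAME_RULES` and `failed_rules`, and the specific rules themselves, are hypothetical starting points rather than a vetted list. None of them penalize spaces, so a multi-part name like "De Gracia" passes.

```python
from typing import Callable

# Hypothetical starter rules: each pairs a readable label with a predicate
# that returns True when a last name looks "bad". Tune these to your locale.
BAD_LAST_NAME_RULES: list[tuple[str, Callable[[str], bool]]] = [
    ("empty", lambda name: name.strip() == ""),
    # A single letter, with or without a trailing period, e.g. "C" or "C."
    ("initial only", lambda name: len(name.strip().rstrip(".")) < 2),
    # Placeholder text left over from a web form
    ("placeholder text", lambda name: "last name" in name.lower()),
    ("contains digits", lambda name: any(ch.isdigit() for ch in name)),
]


def failed_rules(last_name: str) -> list[str]:
    """Return the labels of every rule the last name trips (empty list = looks OK)."""
    return [label for label, rule in BAD_LAST_NAME_RULES if rule(last_name)]


if __name__ == "__main__":
    for name in ["Last Name", "Carroll", "Cat", "C", "De Gracia"]:
        print(f"{name!r}: {failed_rules(name) or 'ok'}")
```

Returning the list of failed rule labels, rather than a single yes/no, makes it easier to see which rule is over- or under-firing when you review the flagged records.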

Dan

Meteor

Hi @danilang, looks like my question wasn't clear enough. I was wondering if someone in the community had a basic list of rules they start with, so I can try those on my data set and see what fits. I'll reach out if I have problems writing the actual code!

Nebula

Hi @agriese 

Not an easy problem, by any reckoning. Check out the post "Falsehoods Programmers Believe About Names", which lists 40 ways that programmers' assumptions about names turn out to be wrong.

Dan

Meteor

Just in case someone is watching this for advice, I thought I'd document what I chose to do.

This particular subset has some quirks I'm taking advantage of that won't apply to most datasets. I know everyone on my list is American and writes both names in Latin characters. Since the dataset I'm working with consists entirely of duplicates with mismatched user input, I chose to preserve whichever Last Name field has more information. If the last name in a pair is fewer than two characters (i.e., they entered only a last initial) or contains the phrase "last name", I overwrite that field with the more complete name from the other duplicate record.
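For anyone who wants to prototype the same logic outside Designer, here is a minimal Python sketch of the two rules described above. It assumes each record is a dict with hypothetical keys `email`, `first`, and `last`, and groups duplicates on email plus first name, ignoring case. "More information" is approximated as the longest last name in the group that passes both rules; that proxy is an assumption, not necessarily what the attached workflow does. The original value is kept, and the replacement goes into a separate hypothetical `fixed_last` field so the output can be checked against the goal data.

```python
from collections import defaultdict


def is_bad_last_name(last: str) -> bool:
    """The two rules from the post: fewer than two characters, or contains "last name"."""
    cleaned = last.strip()
    return len(cleaned) < 2 or "last name" in cleaned.lower()


def clean_duplicates(records: list[dict]) -> list[dict]:
    """Flag bad last names and fill them in from a matching duplicate record."""
    # Group duplicates on (email, first name), ignoring case.
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for rec in records:
        groups[(rec["email"].lower(), rec["first"].lower())].append(rec)

    results: list[dict] = []
    for rec in records:
        key = (rec["email"].lower(), rec["first"].lower())
        # Last names in this group that pass both rules.
        good = [r["last"] for r in groups[key] if not is_bad_last_name(r["last"])]
        valid = not is_bad_last_name(rec["last"])
        # Keep the original value; put the chosen name in a separate field.
        out = dict(rec, valid="Yes" if valid else "No", fixed_last=rec["last"])
        if not valid and good:
            # Assumption: the longest passing name carries the most information.
            out["fixed_last"] = max(good, key=len)
        results.append(out)
    return results


# Example data from the question above.
EXAMPLE_ROWS = [
    {"email": "alice@wonderland.com", "first": "Alice", "last": "Last Name"},
    {"email": "alice@wonderland.com", "first": "Alice", "last": "Carroll"},
    {"email": "ccat@wonderland.com", "first": "Cheshire", "last": "Cat"},
    {"email": "ccat@wonderland.com", "first": "Cheshire", "last": "C"},
]

if __name__ == "__main__":
    for row in clean_duplicates(EXAMPLE_ROWS):
        print(row["valid"], row["email"], row["first"], row["last"], "->", row["fixed_last"])
```

On the example data, the `valid` flags come out No, Yes, Yes, No, matching the goal data, and the two flagged rows get "Carroll" and "Cat" as their filled-in last names. If a group has no passing last name at all, the record is flagged but left unchanged for manual review.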

Sketch workflow attached, but I'm still refining it.