Alteryx Designer

Find answers, ask questions, and share expertise about Alteryx Designer.
Andy Uttley, Alteryx ACE, makes music with Alteryx | Math + Music

Formulas

Highlighted
8 - Asteroid

Hi Family,

 

I have 2 columns side-by-side. They each include a name but in differing name formats as shown in the attached excel. Can anyone here help me with the login to add a third column that would tell me whether or not the names agree even though they're in differing formats?

 

Thank in advance!

 

Regards,

 

AuditPro13

Highlighted
Alteryx
Alteryx

@AuditPro13 

 

Which of these would you consider matches? Just the last one?

Highlighted
8 - Asteroid

Hi!

 

All would be considered matches.

Alteryx
Alteryx

In which case how do you know that tom.harris.green@abc.com is the same person as Tom Green?

Highlighted
8 - Asteroid

If the first and last name match, I would be comfortable they agree.

Highlighted
Alteryx
Alteryx

@AuditPro13 

 

This is not the most elegant solution but I tried to break the process into logical chunks, hopefully easy-to-follow chunks

 

-  Remove all the bits related to email address from both columns - I use the Multi-field formula using Regex_Replace function

REGEX_Replace([_CurrentField_], "(\@.+)", "")

 

- Remove all remaining punctuation and replace these with spaces - I use the Multi-field formula using Regex_Replace function

REGEX_Replace([_CurrentField_], "[[:punct:]]", " ")

 

- Unify the case - Using the Data Cleansing tool I turn both columns into Title Case

 

- Then we need to identify how many names each person has and only take the first and the last for both columns. I split the data and use a combination of the Text-to-columns tool to split the data into one row per name and then count how many there are using the Multi-row formula

 

- I then use the Summarize tool to get the Min and Max (First and Last names) and join the data back. 

 

- I then bring my columns side by side again and compare them using a simple conditional

 

Example attached.

 

A more elegant solution would probably be using Fuzzy match after some initial cleansing. You can find out more about this approach here.

 

column-match.png

Highlighted
14 - Magnetar
14 - Magnetar

Hi @AuditPro13 

 

Fuzzy match might work best here. Please see the attached workflow and let me know if this helps!! 

The challenge with Fuzzy, is you need to have what you're matching in the same column, hence the split (with select) and union. I'm concerned with the threshhold being set so low (if you click Edit in the Fuzzy Match tool, you'll see it's set to a 70% match threshhold.

Let me know if this helps! I'll continue to noodle on this

 

Cheers,

Esther

Labels