Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Extract each emails in single cell and make it unique and concat with comma

mertmetinn
5 - Atom

Hi,

 

I have unstructured string data including tons of emails in a single cell. (around 5k row)

 

Inputs:

for eg, " Daniel. Beaudoin@x.Com'""Andrew.Sharp@x.ComBarbara.Zukor@x.ComBarbara.Zukor@x.Com / "

for eg, " Adam.Altberg@x.Com,Joe.Quinn@x.Com,Joe.Quinn@x.Com; Geoffrey.Cardy@x.Com,Joe.Quinn@x.com    "

 

Note: There is no delimiter here. Separated either by , or ; or  / or / / or ( )

Note: Almost every kind of domain is there(.uk,.org,.ca,.com etc.)

 

Outputs:

for eg, "daniel.beaudoin@x.com,  andrew.sharp@x.com,  barbara.zukor@x.com" 

for eg, "adam.altberg@x.com, joe.quinn@x.com, geoffrey.cardy@x.com   "

 

Summary:

 

need to clean each single cell and have it ready each unique emails in one cell seperated by comma(,)

 

Regards,

 

6 REPLIES 6
BrandonB
Alteryx
Alteryx

What system are you pulling these emails from originally?  The only way you will be to successfully parse these out without a delimiter is if you know all of the email domains that are possible. 

 

Scenarios like Andrew.Sharp@x.ComBarbara.Zukor@x.Com can't be parsed out unless you know that the .com is the end of the URL. Parsing and creating a unique list is native functionality in Alteryx, but I'm curious how you differentiate between Andrew.Sharp@x.com Barbara.Zukor@x.com and Andrew.Sharp@x.co mBarbarbara.Zukor@x.com

Pang_Hee_Choy
12 - Quasar
 

since the pain point is to separate email. I will only focus on this.

 

1. remove all delimiter, add if you have more, exclude "."

 

 

ReplaceChar([Field1], " ,/():;","")

 

 

2. use regex to take each email with tokenize function (by this format, name@domain.xxx)

 

 

.+?@.+?\..{3}

 

 

.+?    > name, any text before "@"

@.+? > domain, any text after "@" before "."

\..{3}   > .xxx, 3 characters after "."

 

then the rest is simple. 

 

 

Screenshot 2023-11-14 100153.png

mertmetinn
5 - Atom

The problem is that there are no single domains, here are some examples of my domains.

 

for eg: @npcd.com, @maples.com, @odlumbrown.com, @churchofengland.org, @riyadcapital.com, @raymondjames.ca, @cdam.co.uk

mertmetinn
5 - Atom

Another example:

""derekaguthrie@gmail.com; sarah@sarahmcquaid.com /// Julian Stair - has applied for POA over Derek -->Julian@Julianstair.com // Craig Ward lawyer: craig.ward@craybecklaw.co.uk; // Sophie OConnell: soconnell@wolferstans.com"" 

mertmetinn
5 - Atom

I am pulling data as csv from db, I am using that csv as an input for Alteryx workflow.

Pang_Hee_Choy
12 - Quasar

then no choice to build a table for the ending (.com, .co.uk, .ca and etc.)

Screenshot 2023-11-15 132508.png

to more accurate, delimiter a used separate email as well.

however, it will split if the name have space in between.

 

Screenshot 2023-11-15 132455.png

 

this workflow required run and verified new email address ending and repeat until all data is analyzed.

 

Screenshot 2023-11-15 132608.png

Labels