Hi,
I have unstructured string data including tons of emails in a single cell. (around 5k row)
Inputs:
for eg, " Daniel. Beaudoin@x.Com'""Andrew.Sharp@x.ComBarbara.Zukor@x.ComBarbara.Zukor@x.Com / "
for eg, " Adam.Altberg@x.Com,Joe.Quinn@x.Com,Joe.Quinn@x.Com; Geoffrey.Cardy@x.Com,Joe.Quinn@x.com "
Note: There is no delimiter here. Separated either by , or ; or / or / / or ( )
Note: Almost every kind of domain is there(.uk,.org,.ca,.com etc.)
Outputs:
for eg, "daniel.beaudoin@x.com, andrew.sharp@x.com, barbara.zukor@x.com"
for eg, "adam.altberg@x.com, joe.quinn@x.com, geoffrey.cardy@x.com "
Summary:
need to clean each single cell and have it ready each unique emails in one cell seperated by comma(,)
Regards,
What system are you pulling these emails from originally? The only way you will be to successfully parse these out without a delimiter is if you know all of the email domains that are possible.
Scenarios like Andrew.Sharp@x.ComBarbara.Zukor@x.Com can't be parsed out unless you know that the .com is the end of the URL. Parsing and creating a unique list is native functionality in Alteryx, but I'm curious how you differentiate between Andrew.Sharp@x.com Barbara.Zukor@x.com and Andrew.Sharp@x.co mBarbarbara.Zukor@x.com
since the pain point is to separate email. I will only focus on this.
1. remove all delimiter, add if you have more, exclude "."
ReplaceChar([Field1], " ,/():;","")
2. use regex to take each email with tokenize function (by this format, name@domain.xxx)
.+?@.+?\..{3}
.+? > name, any text before "@"
@.+? > domain, any text after "@" before "."
\..{3} > .xxx, 3 characters after "."
then the rest is simple.
The problem is that there are no single domains, here are some examples of my domains.
for eg: @npcd.com, @maples.com, @odlumbrown.com, @churchofengland.org, @riyadcapital.com, @raymondjames.ca, @cdam.co.uk
Another example:
""derekaguthrie@gmail.com; sarah@sarahmcquaid.com /// Julian Stair - has applied for POA over Derek -->Julian@Julianstair.com // Craig Ward lawyer: craig.ward@craybecklaw.co.uk; // Sophie OConnell: soconnell@wolferstans.com""
I am pulling data as csv from db, I am using that csv as an input for Alteryx workflow.
then no choice to build a table for the ending (.com, .co.uk, .ca and etc.)
to more accurate, delimiter a used separate email as well.
however, it will split if the name have space in between.
this workflow required run and verified new email address ending and repeat until all data is analyzed.
User | Count |
---|---|
18 | |
14 | |
10 | |
6 | |
6 |