community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Community v19.6

Looks aren't everything... But the latest Community refresh looks darn good!

Learn More

Cleaning Email address domains

Hello to anyone that can help...

 

I am trying to clean emails in the database that have incorrect domains eg gmaill, hotmaill, ggmail etc

 

obviously there is a lot of different variations so im not sure the best way to do this?

 

After that i would like to correct them. Gmaill to Gmail etc

 

Any help is appreciated!

 

Thanks,

Kerry

Bolide

Hi @KCAgentProvocateur,

 

This sounds like an interesting problem! I would suggest a few things to try:

1) If you haven't already, I would parse your email filed with @ as the delimiter so that you can work with just the email ending

2) It wouldn't hurt to do a summary grouping by the new email ending field, just to see how many you're working with. If this is a one-time clean-up, it might not be too much to go through manually.

3) If this is a process you want to implement in the future or if manual fixing is simply not an option, then I suggest you check out the Fuzzy Match tool, in the Join tool set. The Fuzzy Match tool can compare each record to all other records in the same field and put them into like groups. This, taken together with the Make Groups tool, might give you a good match set of emails.

 

I made a simple example in the attached. It does a pretty good job of grouping the emails, you just might have to adjust the group name at the end.

 

Best,

mmenth

Alteryx Partner

Regex solutions are really misleading for proper e-mail validation.

barrackobama@gmail.com looks legit but may not be a real address...

 

you can't understand if the domain is real by just checking regex...

 

Here is how to do proper e-mail varification

 

1) syntax check - do a regexmatch for the pattern 

'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$'

 

2) get the domain MX 

import dns.resolver

records = dns.resolver.query('emailhippo.com', 'MX')
mxRecord = records[0].exchange
mxRecord = str(mxRecord)

 

3) test if server accepts messages to that specific mail account...

 

import socket
import smtplib

# Get local server hostname
host = socket.gethostname()

# SMTP lib setup (use debug level for full output)
server = smtplib.SMTP()
server.set_debuglevel(0)

# SMTP Conversation
server.connect(mxRecord)
server.helo(host)
server.mail('me@domain.com')
code, message = server.rcpt(str(addressToVerify))
server.quit()

# Assume 250 as Success
if code == 250:
	print('Success')
else:
	print('Bad')

 

that shows you if the e-mail address is well and alive...

 

Hey mmeth,

 

Thank you for this!

 

I'm a newbie so would you mind helping me with step 1?

 

Good point on step 2 ill do this and see what im dealing with and if its not possible then will look into step 3

 

Thanks,

Kerry

Hi Atabarezz, 

 

This looks a little complex for a newbie like me - do you have a workflow already made on how to do this?

 

Thanks

Bolide

Hey @KCAgentProvocateur 

 

No problem, attached I included how to use the text to columns tool and also the summarize that I mentioned.

 

Best,

mmenth

Great thanks - so now i have 135 emails as gamil.com - what would be the best way to change those email addresses to have gmail.com?

Bolide

You could use the Find and Replace tool like in the attached.

 

Best,

mmenth

Bolide

FYI, whenever using a new tool you can click on it up in the toolbar at the top and click 'Open Example' to see how it's used.

 

Capture.PNG

great thank you for your help!

Labels