Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Make a guess based on the most common patterns

Masond3
8 - Asteroid

Hi All, 

 

I have a challenge, that seems easy but has me stumped. 

Aim 

 

I want to try and guess a customer email address based on the most common patterns of email address for customers at that company.,

 

After analyzing email addresses, I've found out that the most common formats are:

 

Possible email structures 
FirstJohn@nike.com
First [1 letter] + LastJSmith@nike.com
First+.+LastJohn.Smith@nike.com
LastSmith@nike.com
FirstLastJohnSmith@nike.com
last + first [1 letter]SmithJ@nike.com
first [1 letter]+.+LastJ.Smith@nike.com
First+Last[1 letter]JohnS@nike.com
Last+FirstSmithJohn@nike.com
Last.FirstSmith.John@nike.com


Assumptions 

Since majority of companies have standard formatted for their email address, i am hoping to find the most common email structure for a given company and then extrapolate based on it.

 

Example 

 

Company : Nike.com

  • 500 Contacts
  • 100 Contacts have no Email
  • 400 Contacts have the following email structures 

 

Number of ContactsPercent Email Structure  
24060%First+.+LastJohn.Smith@nike.com
10025%First [1 letter] + LastJSmith@nike.com
6015%FirstJohn@nike.com

 

Given that 60% of the contacts at Nike have the email structure of "First+.+Last" i would like to then follow a similar format for those 100 Contacts which have no email addresses

 

Current Input

 

ContactidFirst NameLastNameEmailCompany NameCompanyid
111111RichardPiperRichard.Piper@Nike.comNike Inc001f100001InnV5AAJ
222222DanielleCollinsDanielle.Collins@Nike.comNike Inc001f100001InnV5AAJ
333333DaneSmithDane.Smith@Nike.comNike Inc001f100001InnV5AAJ
44444RobertAtleryxRAlteryx@Nike.comNike Inc001f100001InnV5AAJ
55555JohnKing Nike Inc001f100001InnV5AAJ
666666ChrisDannher Nike Inc001f100001InnV5AAJ

 

Expected Outcome

 

ContactidFirst NameLastNameEmailPredicted EmailRationalCompany NameCompanyid
111111RichardPiperRichard.Piper@Nike.com  Nike Inc001f100001InnV5AAJ
222222DanielleCollinsDanielle.Collins@Nike.com  Nike Inc001f100001InnV5AAJ
333333DaneSmithDane.Smith@Nike.com  Nike Inc001f100001InnV5AAJ
44444RobertAtleryxRAlteryx@Nike.com  Nike Inc001f100001InnV5AAJ
55555JohnKing John.King@Nike.comCommon Email Pattern " First+.+Last"Nike Inc001f100001InnV5AAJ
666666ChrisDannaher Chris.Dannaher@Nike.comCommon Email Pattern " First+.+Last"Nike Inc001f100001InnV5AAJ

 

Looking forward to your help & advice

 

Many thanks 

 

Masond3

 

20 REPLIES 20
Masond3
8 - Asteroid

@caltang so for the key decision makers at the firm we are pretty good and have the right communications. 

However your point is still valid ! The holy grail would be a multiple third party vendors to cross reference , however we as a firm are a long way off that. So was hoping to use a data driven approach using existing data, do a small poc and then measure the ROI. 

 

what is pics ? I assume (personal information communication services ?)

caltang
17 - Castor
17 - Castor

Person-in-charge - if the decision maker part is settled, then that's great! 

 

Good call on using Alteryx. Your post actually gave me some ideas! Can also be applied for other purposes / cold outreach.

 

Thanks for asking this question! 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
Masond3
8 - Asteroid

@caltang - ooh I am glad you able to use some of my concept. Can you elaborate  on some of your ideas ? A problem shared is a problem halved ! ;) 

caltang
17 - Castor
17 - Castor

Keeping old records is tough. Some suppliers have changed emails, and the contact points have gone cold. It’s just names passing through and no proper email filing is in place. This is one application I see working well!

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
geoff_zath
Alteryx
Alteryx

@Masond3 See if this solves the problem (assuming I understand it correctly). I've added another company with email formats A, B, and C. With 40% using A and B and 20% using C. In this case, it will format your email guesses in formats A and B and add a column that flags it as an even split. 

Masond3
8 - Asteroid

@geoff_zath Cool. I will take a look when back in the office.

 

I have amended the flow slightly as when I am running  my data approx 800k it was stalling around the “append” function. 

I tried to get around this using the “ using joins” and that seemed to work better, but.  let me see if I can upload my flow tomorrow so you can take a browse 

geoff_zath
Alteryx
Alteryx

@Masond3 Oh yes that would cause an issue! Not sure why I didn't use the Join tool in the first place 😅 See if this works better without the Append tool.

geoff_zath
Alteryx
Alteryx

@Masond3 This version also takes into account different email domains per company ID. It will associate the most common ID with the predicted email structure. At least it "should"...

Masond3
8 - Asteroid

@geoff_zath lol. The only way I found that scenario was creating some test data, where data wasn’t perfect and tried to understand how it would be handled. 

I think the common domain logic by company id needs to be before the logic where we  flags it as an even split. 

il be home in a couple of hours I can try and validate if not it be earliest tomorrow. 

 

geoff_zath
Alteryx
Alteryx

It might already be...check some of the wireless connections, it is not totally linear in how things are calculated. 

Labels