Make a guess based on the most common patterns

Masond3
8 - Asteroid

Hi All, 

 

I have a challenge that seems easy but has me stumped.

Aim 

 

I want to guess a customer's email address based on the most common email address patterns for customers at that company.

 

After analyzing email addresses, I've found out that the most common formats are:

 

Possible email structure    | Example
First                       | John@nike.com
First [1 letter] + Last     | JSmith@nike.com
First+.+Last                | John.Smith@nike.com
Last                        | Smith@nike.com
FirstLast                   | JohnSmith@nike.com
Last + First [1 letter]     | SmithJ@nike.com
First [1 letter]+.+Last     | J.Smith@nike.com
First+Last [1 letter]       | JohnS@nike.com
Last+First                  | SmithJohn@nike.com
Last.First                  | Smith.John@nike.com
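
To make the list above concrete, here is a minimal Python sketch (not an Alteryx workflow) that labels an existing email with one of the ten structures. The names `PATTERNS` and `detect_pattern` are hypothetical, and the sketch assumes first and last names are single tokens with no spaces, hyphens, or accents, which real contact data will not always satisfy.

```python
# Hypothetical mapping: structure name -> function that builds the local part
# (the text before "@") from a lower-cased first and last name.
PATTERNS = {
    "First": lambda f, l: f,
    "First [1 letter] + Last": lambda f, l: f[0] + l,
    "First+.+Last": lambda f, l: f + "." + l,
    "Last": lambda f, l: l,
    "FirstLast": lambda f, l: f + l,
    "Last + First [1 letter]": lambda f, l: l + f[0],
    "First [1 letter]+.+Last": lambda f, l: f[0] + "." + l,
    "First+Last [1 letter]": lambda f, l: f + l[0],
    "Last+First": lambda f, l: l + f,
    "Last.First": lambda f, l: l + "." + f,
}


def detect_pattern(first, last, email):
    """Return the first structure name that reproduces the email's local part, else None."""
    if not first or not last or not email or "@" not in email:
        return None
    f, l = first.strip().lower(), last.strip().lower()
    if not f or not l:
        return None
    local = email.split("@", 1)[0].strip().lower()
    # Comparison is case-insensitive; the first matching structure wins.
    for name, build in PATTERNS.items():
        if build(f, l) == local:
            return name
    return None


print(detect_pattern("John", "Smith", "John.Smith@nike.com"))
print(detect_pattern("John", "Smith", "JSmith@nike.com"))
```

Emails that match none of the structures come back as `None`, so they can simply be left out of the counts. Very short names can match more than one structure; in that case this sketch just returns the first match in the dictionary.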


Assumptions 

Since the majority of companies use a standard format for their email addresses, I am hoping to find the most common email structure for a given company and then extrapolate from it.

 

Example 

 

Company : Nike.com

  • 500 Contacts
  • 100 Contacts have no Email
  • 400 Contacts have the following email structures 

 

Number of Contacts | Percent | Email Structure         | Example
240                | 60%     | First+.+Last            | John.Smith@nike.com
100                | 25%     | First [1 letter] + Last | JSmith@nike.com
60                 | 15%     | First                   | John@nike.com

 

Given that 60% of the contacts at Nike have the email structure "First+.+Last", I would like to apply the same format to the 100 contacts that have no email address.
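
The Nike arithmetic can be sketched in a few lines of Python. This assumes each contact has already been labelled with a structure name (or `None` when there is no email); the function name `most_common_pattern` is hypothetical and this is an illustration of the counting step only, not an Alteryx workflow.

```python
from collections import Counter


def most_common_pattern(patterns):
    """Return (pattern, share) for the most frequent non-empty pattern, or (None, 0.0)."""
    counts = Counter(p for p in patterns if p)
    if not counts:
        return None, 0.0
    pattern, n = counts.most_common(1)[0]
    # Share is computed over contacts that have a recognised structure only.
    return pattern, n / sum(counts.values())


# The Nike example: 400 contacts with a known structure, 100 without an email.
nike = (
    ["First+.+Last"] * 240
    + ["First [1 letter] + Last"] * 100
    + ["First"] * 60
    + [None] * 100
)
pattern, share = most_common_pattern(nike)
print(pattern, f"{share:.0%}")  # First+.+Last 60%
```

The returned share could also be used as a confidence check, for example only predicting an email when the top structure covers more than some chosen fraction of known emails. That threshold would be a design choice, not something stated in the post.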

 

Current Input

 

Contactid | First Name | LastName | Email                     | Company Name | Companyid
111111    | Richard    | Piper    | Richard.Piper@Nike.com    | Nike Inc     | 001f100001InnV5AAJ
222222    | Danielle   | Collins  | Danielle.Collins@Nike.com | Nike Inc     | 001f100001InnV5AAJ
333333    | Dane       | Smith    | Dane.Smith@Nike.com       | Nike Inc     | 001f100001InnV5AAJ
44444     | Robert     | Atleryx  | RAlteryx@Nike.com         | Nike Inc     | 001f100001InnV5AAJ
55555     | John       | King     |                           | Nike Inc     | 001f100001InnV5AAJ
666666    | Chris      | Dannaher |                           | Nike Inc     | 001f100001InnV5AAJ

 

Expected Outcome

 

Contactid | First Name | LastName | Email                     | Predicted Email         | Rationale                           | Company Name | Companyid
111111    | Richard    | Piper    | Richard.Piper@Nike.com    |                         |                                     | Nike Inc     | 001f100001InnV5AAJ
222222    | Danielle   | Collins  | Danielle.Collins@Nike.com |                         |                                     | Nike Inc     | 001f100001InnV5AAJ
333333    | Dane       | Smith    | Dane.Smith@Nike.com       |                         |                                     | Nike Inc     | 001f100001InnV5AAJ
44444     | Robert     | Atleryx  | RAlteryx@Nike.com         |                         |                                     | Nike Inc     | 001f100001InnV5AAJ
55555     | John       | King     |                           | John.King@Nike.com      | Common Email Pattern "First+.+Last" | Nike Inc     | 001f100001InnV5AAJ
666666    | Chris      | Dannaher |                           | Chris.Dannaher@Nike.com | Common Email Pattern "First+.+Last" | Nike Inc     | 001f100001InnV5AAJ
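
For reference, the input-to-outcome transformation can be sketched end to end in Python. This is only an illustration of the logic, not an Alteryx workflow. Assumptions: `predict_emails`, `BUILDERS`, and the dictionary keys are hypothetical names, only three of the ten structures are included to keep it short, and the domain for a predicted email is taken as the most common domain among that company's existing emails.

```python
from collections import Counter, defaultdict

# Only three structures are included here; the others would be added the same way.
BUILDERS = {
    "First+.+Last": lambda f, l: f"{f}.{l}",
    "First [1 letter] + Last": lambda f, l: f"{f[0]}{l}",
    "First": lambda f, l: f,
}


def predict_emails(rows):
    """rows: dicts with keys first, last, email, company_id.
    Returns copies with predicted_email and rationale added."""
    patterns = defaultdict(Counter)  # company_id -> structure counts
    domains = defaultdict(Counter)   # company_id -> domain counts
    for r in rows:
        if not r["email"] or not r["first"] or not r["last"]:
            continue
        local, _, domain = r["email"].partition("@")
        domains[r["company_id"]][domain] += 1
        for name, build in BUILDERS.items():
            if build(r["first"], r["last"]).lower() == local.lower():
                patterns[r["company_id"]][name] += 1
                break
    out = []
    for r in rows:
        r = dict(r, predicted_email="", rationale="")
        cid = r["company_id"]
        can_build = r["first"] and r["last"] and patterns[cid] and domains[cid]
        if not r["email"] and can_build:
            name = patterns[cid].most_common(1)[0][0]
            domain = domains[cid].most_common(1)[0][0]
            r["predicted_email"] = f"{BUILDERS[name](r['first'], r['last'])}@{domain}"
            r["rationale"] = f'Common Email Pattern "{name}"'
        out.append(r)
    return out


nike_id = "001f100001InnV5AAJ"
rows = [
    {"first": "Richard", "last": "Piper", "email": "Richard.Piper@Nike.com", "company_id": nike_id},
    {"first": "Danielle", "last": "Collins", "email": "Danielle.Collins@Nike.com", "company_id": nike_id},
    {"first": "Dane", "last": "Smith", "email": "Dane.Smith@Nike.com", "company_id": nike_id},
    {"first": "Robert", "last": "Atleryx", "email": "RAlteryx@Nike.com", "company_id": nike_id},
    {"first": "John", "last": "King", "email": "", "company_id": nike_id},
    {"first": "Chris", "last": "Dannaher", "email": "", "company_id": nike_id},
]
result = predict_emails(rows)
for r in result:
    print(r["first"], r["last"], r["predicted_email"], r["rationale"])
```

Contacts that already have an email are passed through with blank prediction columns, matching the expected outcome table. The Robert row matches none of the three structures as spelled in the sample, so it does not contribute to the pattern counts.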

 

Looking forward to your help & advice

 

Many thanks 

 

Masond3

 

20 REPLIES
Masond3
8 - Asteroid

I have been validating today and documented some use cases, which I need to vet properly tomorrow to understand the logic and where I think it's going wrong.

At the moment I should be getting more rows in the output than in the input (due to the split in email domains); however, I am getting fewer than the input (by a significant amount).

 

So I just need to do some validating. I think the core of it is there. It just needs tweaking, changing, amending, etc.
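
One way to narrow down where rows are being lost is to list the contact IDs that enter the workflow but never reach the output. The Python sketch below shows the idea; `missing_contacts` is a hypothetical helper and the IDs are made up for illustration. Inside Alteryx, the equivalent check would presumably be a Join on Contactid and a look at the unjoined left output.

```python
def missing_contacts(input_ids, output_ids):
    """Return input contact IDs that never appear in the output, in input order."""
    seen = set(output_ids)
    return [cid for cid in input_ids if cid not in seen]


# Illustrative IDs only: the output may repeat a contact (one row per domain),
# but every input contact should appear at least once.
input_ids = ["111111", "222222", "55555", "666666"]
output_ids = ["111111", "55555", "55555"]
dropped = missing_contacts(input_ids, output_ids)
print(dropped)  # ['222222', '666666']
```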

 
