Make a guess based on the most common patterns

Masond3
8 - Asteroid

Hi All, 

 

I have a challenge that seems easy but has me stumped.

Aim 

 

I want to guess a customer's email address based on the most common email address patterns for customers at that company.

 

After analyzing email addresses, I've found out that the most common formats are:

 

Possible email structures | Example
First | John@nike.com
First [1 letter] + Last | JSmith@nike.com
First+.+Last | John.Smith@nike.com
Last | Smith@nike.com
FirstLast | JohnSmith@nike.com
Last + First [1 letter] | SmithJ@nike.com
First [1 letter]+.+Last | J.Smith@nike.com
First+Last [1 letter] | JohnS@nike.com
Last+First | SmithJohn@nike.com
Last.First | Smith.John@nike.com
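
One way to make the structures listed above testable is to write each one as a small function of first and last name, then check which function reproduces the part of a known email before the "@". This is only a sketch: the pattern labels and the `classify_email` helper are names made up for illustration, and matching is done case-insensitively as an assumption.

```python
# Hypothetical labels for the structures listed above; each builds the local part.
PATTERNS = {
    "First": lambda f, l: f,
    "First[1]+Last": lambda f, l: f[:1] + l,
    "First+.+Last": lambda f, l: f + "." + l,
    "Last": lambda f, l: l,
    "First+Last": lambda f, l: f + l,
    "Last+First[1]": lambda f, l: l + f[:1],
    "First[1]+.+Last": lambda f, l: f[:1] + "." + l,
    "First+Last[1]": lambda f, l: f + l[:1],
    "Last+First": lambda f, l: l + f,
    "Last.First": lambda f, l: l + "." + f,
}


def classify_email(first, last, email):
    """Return the label of the first pattern that reproduces the email's
    local part (text before '@'), or None if no pattern matches."""
    local = email.split("@", 1)[0].lower()
    f, l = first.strip().lower(), last.strip().lower()
    for name, build in PATTERNS.items():
        if build(f, l) == local:
            return name
    return None


print(classify_email("John", "Smith", "John.Smith@nike.com"))  # First+.+Last
print(classify_email("John", "Smith", "JSmith@nike.com"))      # First[1]+Last
```

Emails that match none of the structures come back as None, so they can be left out of any later counting.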


Assumptions 

Since the majority of companies use a standard format for their email addresses, I am hoping to find the most common email structure for a given company and then extrapolate from it.

 

Example 

 

Company : Nike.com

  • 500 Contacts
  • 100 Contacts have no Email
  • 400 Contacts have the following email structures 

 

Number of Contacts | Percent | Email Structure | Example
240 | 60% | First+.+Last | John.Smith@nike.com
100 | 25% | First [1 letter] + Last | JSmith@nike.com
60 | 15% | First | John@nike.com

 

Given that 60% of the contacts at Nike have the email structure "First+.+Last", I would like to apply the same format to the 100 contacts that have no email address.
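
The "most common structure per company" step can be sketched as a simple count, assuming each existing email has already been labelled with a structure name. The `dominant_pattern` function and the labels are hypothetical, and the row counts below just replay the Nike example (240 / 100 / 60).

```python
from collections import Counter, defaultdict


def dominant_pattern(rows):
    """rows: iterable of (company_id, pattern_label) pairs.
    Returns {company_id: (most_common_label, share_of_classified_emails)}."""
    counts = defaultdict(Counter)
    for company_id, pattern in rows:
        if pattern is not None:  # skip emails that matched no known structure
            counts[company_id][pattern] += 1
    result = {}
    for company_id, counter in counts.items():
        pattern, n = counter.most_common(1)[0]
        result[company_id] = (pattern, n / sum(counter.values()))
    return result


rows = (
    [("nike", "First+.+Last")] * 240
    + [("nike", "First[1]+Last")] * 100
    + [("nike", "First")] * 60
)
print(dominant_pattern(rows))  # {'nike': ('First+.+Last', 0.6)}
```

Keeping the share alongside the label makes it possible to skip companies where no single structure clearly dominates, if that turns out to matter.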

 

Current Input

 

Contactid | First Name | LastName | Email | Company Name | Companyid
111111 | Richard | Piper | Richard.Piper@Nike.com | Nike Inc | 001f100001InnV5AAJ
222222 | Danielle | Collins | Danielle.Collins@Nike.com | Nike Inc | 001f100001InnV5AAJ
333333 | Dane | Smith | Dane.Smith@Nike.com | Nike Inc | 001f100001InnV5AAJ
44444 | Robert | Atleryx | RAlteryx@Nike.com | Nike Inc | 001f100001InnV5AAJ
55555 | John | King | | Nike Inc | 001f100001InnV5AAJ
666666 | Chris | Dannaher | | Nike Inc | 001f100001InnV5AAJ

 

Expected Outcome

 

Contactid | First Name | LastName | Email | Predicted Email | Rationale | Company Name | Companyid
111111 | Richard | Piper | Richard.Piper@Nike.com | | | Nike Inc | 001f100001InnV5AAJ
222222 | Danielle | Collins | Danielle.Collins@Nike.com | | | Nike Inc | 001f100001InnV5AAJ
333333 | Dane | Smith | Dane.Smith@Nike.com | | | Nike Inc | 001f100001InnV5AAJ
44444 | Robert | Atleryx | RAlteryx@Nike.com | | | Nike Inc | 001f100001InnV5AAJ
55555 | John | King | | John.King@Nike.com | Common Email Pattern "First+.+Last" | Nike Inc | 001f100001InnV5AAJ
666666 | Chris | Dannaher | | Chris.Dannaher@Nike.com | Common Email Pattern "First+.+Last" | Nike Inc | 001f100001InnV5AAJ
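
For reference, here is a rough end-to-end sketch of how the expected outcome above could be produced outside of Designer. It is not an Alteryx workflow: `predict_emails`, the field names, and the three-pattern subset are assumptions made for illustration, and the domain is taken as the most frequent one among the company's known emails.

```python
from collections import Counter

# Illustrative subset of the structures; labels are hypothetical.
PATTERNS = {
    "First+.+Last": lambda f, l: f"{f}.{l}",
    "First[1]+Last": lambda f, l: f"{f[:1]}{l}",
    "First": lambda f, l: f,
}


def predict_emails(contacts):
    """contacts: list of dicts with keys first, last, email, company_id.
    Returns copies with predicted_email and rationale filled in for
    contacts that have no email and whose company has a known pattern."""
    pattern_counts, domain_counts = {}, {}
    for c in contacts:
        if not c["email"]:
            continue
        local, _, domain = c["email"].partition("@")
        cid = c["company_id"]
        domain_counts.setdefault(cid, Counter())[domain] += 1
        for name, build in PATTERNS.items():
            if build(c["first"], c["last"]).lower() == local.lower():
                pattern_counts.setdefault(cid, Counter())[name] += 1
                break  # count each email under at most one pattern
    out = []
    for c in contacts:
        row = dict(c, predicted_email="", rationale="")
        cid = c["company_id"]
        if not c["email"] and cid in pattern_counts:
            name = pattern_counts[cid].most_common(1)[0][0]
            domain = domain_counts[cid].most_common(1)[0][0]
            row["predicted_email"] = f"{PATTERNS[name](c['first'], c['last'])}@{domain}"
            row["rationale"] = f'Common Email Pattern "{name}"'
        out.append(row)
    return out


NIKE = "001f100001InnV5AAJ"
sample = [
    {"first": "Richard", "last": "Piper", "email": "Richard.Piper@Nike.com", "company_id": NIKE},
    {"first": "Danielle", "last": "Collins", "email": "Danielle.Collins@Nike.com", "company_id": NIKE},
    {"first": "Dane", "last": "Smith", "email": "Dane.Smith@Nike.com", "company_id": NIKE},
    {"first": "Robert", "last": "Atleryx", "email": "RAlteryx@Nike.com", "company_id": NIKE},
    {"first": "John", "last": "King", "email": "", "company_id": NIKE},
    {"first": "Chris", "last": "Dannaher", "email": "", "company_id": NIKE},
]
for row in predict_emails(sample):
    print(row["first"], row["last"], row["predicted_email"], row["rationale"])
```

Contacts that already have an email keep empty Predicted Email and Rationale fields, and every input row comes back exactly once, so the output row count should equal the input row count.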

 

Looking forward to your help & advice

 

Many thanks 

 

Masond3

 

20 REPLIES
Masond3
8 - Asteroid

I have been validating today and documented some use cases, which I need to vet properly tomorrow to understand the logic and where I think it's going wrong.

At the moment I should be getting more data in the output than in the input (due to the split in email domains); however, I am getting significantly less than the input.

 

So I just need to do some validating. I think the core of it is there; it just needs tweaking, changing, amending, etc.

 
