Hi All,
I have a challenge, that seems easy but has me stumped.
Aim
I want to try and guess a customer email address based on the most common patterns of email address for customers at that company.,
After analyzing email addresses, I've found out that the most common formats are:
Possible email structures | |
First | John@nike.com |
First [1 letter] + Last | JSmith@nike.com |
First+.+Last | John.Smith@nike.com |
Last | Smith@nike.com |
FirstLast | JohnSmith@nike.com |
last + first [1 letter] | SmithJ@nike.com |
first [1 letter]+.+Last | J.Smith@nike.com |
First+Last[1 letter] | JohnS@nike.com |
Last+First | SmithJohn@nike.com |
Last.First | Smith.John@nike.com |
Assumptions
Since majority of companies have standard formatted for their email address, i am hoping to find the most common email structure for a given company and then extrapolate based on it.
Example
Company : Nike.com
Number of Contacts | Percent | Email Structure | |
240 | 60% | First+.+Last | John.Smith@nike.com |
100 | 25% | First [1 letter] + Last | JSmith@nike.com |
60 | 15% | First | John@nike.com |
Given that 60% of the contacts at Nike have the email structure of "First+.+Last" i would like to then follow a similar format for those 100 Contacts which have no email addresses
Current Input
Contactid | First Name | LastName | Company Name | Companyid | |
111111 | Richard | Piper | Richard.Piper@Nike.com | Nike Inc | 001f100001InnV5AAJ |
222222 | Danielle | Collins | Danielle.Collins@Nike.com | Nike Inc | 001f100001InnV5AAJ |
333333 | Dane | Smith | Dane.Smith@Nike.com | Nike Inc | 001f100001InnV5AAJ |
44444 | Robert | Atleryx | RAlteryx@Nike.com | Nike Inc | 001f100001InnV5AAJ |
55555 | John | King | Nike Inc | 001f100001InnV5AAJ | |
666666 | Chris | Dannher | Nike Inc | 001f100001InnV5AAJ |
Expected Outcome
Contactid | First Name | LastName | Predicted Email | Rational | Company Name | Companyid | |
111111 | Richard | Piper | Richard.Piper@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
222222 | Danielle | Collins | Danielle.Collins@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
333333 | Dane | Smith | Dane.Smith@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
44444 | Robert | Atleryx | RAlteryx@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
55555 | John | King | John.King@Nike.com | Common Email Pattern " First+.+Last" | Nike Inc | 001f100001InnV5AAJ | |
666666 | Chris | Dannaher | Chris.Dannaher@Nike.com | Common Email Pattern " First+.+Last" | Nike Inc | 001f100001InnV5AAJ |
Looking forward to your help & advice
Many thanks
Masond3
@Masond3 ,
First, why invent the wheel?
https://rocketreach.co/nike-email-format_b5c61e3ff42e0c48
Then you can create a dynamic formula based upon the free info.
Cheers,
Mark
@MarqueeCrew
So I have looked at Rocket reach before however to onboard that vendor would take time, and the cost also needs to be considered.
But you’re right, conceptually I am trying to do the same thing! The key thing is speed to market, so trying to find a crude solution in alteryx
Interesting problem. This isn't the most elegant solution, but you can brute force your way through it by matching the email patterns with a Formula tool. Hope that gets you in the right direction!
@geoff_zath - Hi - So your solution is very interesting, after playing around, i have found a flaw in my thinking which in turn has affected the solution that you have provided.
What i have noticed is that the solution is taking the highest percentage email pattern and applying across all the contacts regardless of companyid
When actually, we should find the email pattern for a given companyid, and apply that email pattern for those contacts where the email is blank
Current Input
Contactid | First Name | LastName | Company Name | Companyid | |
111111 | Richard | Piper | Richard.Piper@Nike.com | Nike Inc | 001f100001InnV5AAJ |
222222 | Danielle | Collins | Danielle.Collins@Nike.com | Nike Inc | 001f100001InnV5AAJ |
333333 | Dane | Smith | Dane.Smith@Nike.com | Nike Inc | 001f100001InnV5AAJ |
44444 | Robert | Atleryx | RAlteryx@Nike.com | Nike Inc | 001f100001InnV5AAJ |
55555 | John | King | Nike Inc | 001f100001InnV5AAJ | |
666666 | Chris | Dannher | Nike Inc | 001f100001InnV5AAJ | |
AAAA | Richard | Piper | Richard@Alteryx.com | Alteryx | 11111 |
bbbb | Danielle | Collins | Daniel@Alteryx.com | Alteryx | 11111 |
cccc | Dane | Smith | Dane@Alteryx.com | Alteryx | 11111 |
dddd | john | Smith | Alteryx | 11111 |
Expected Outcome
Contactid | First Name | LastName | Predicted Email | Rational | Company Name | Companyid | |
111111 | Richard | Piper | Richard.Piper@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
222222 | Danielle | Collins | Danielle.Collins@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
333333 | Dane | Smith | Dane.Smith@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
44444 | Robert | Atleryx | RAlteryx@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
55555 | John | King | John.King@Nike.com | Common Email Pattern " First+.+Last" | Nike Inc | 001f100001InnV5AAJ | |
666666 | Chris | Dannaher | Chris.Dannaher@Nike.com | Common Email Pattern " First+.+Last" | Nike Inc | 001f100001InnV5AAJ | |
AAAA | Richard | Piper | Richard@Alteryx.com | Alteryx | 11111 | ||
BBBB | Danielle | Collins | Daniel@Alteryx.com | Alteryx | 11111 | ||
CCCC | Dane | Smith | Dane@Alteryx.com | Alteryx | 11111 | ||
DDDD | john | Smith | John@alteryx.com | Common Email Pattern " First" | Alteryx | 11111 |
@geoff_zath I will take a look at your solution tomorrow when back in the office.
Random question- how do you handle multiple contacts with different email patterns associated to same company that have different email formats
Example -
If you have three contacts, and two of those three contacts have email addresses associated to same company and both emails addresses have a different format, meaning that they both have a 50% association to them . For that one contact that has the missing email, what should the email pattern be assigned to it?
in those scenarios do you think it’s best to flag these exceptions
Just curious on the use case application of this request. @Masond3 could you share a bit more of what you intend to do with this workflow? Trying to understand it.
From the looks of it, this allows you to get the info without RocketReach, right?
@caltang so my use case is to solve two particular problems statements
1)we have a high level of bounces
2) we have a high % of contacts that have no email
the first scenario wouldn’t be covered in this flow, however the goal would be to find those contacts which have a different email pattern than the rest at a given company and try and use the recomended pattern to change the email and hopefully reduce the bounces
the second use case is we want to grow market share , so predicting an email address could be a great way to do that.
Interesting. Thanks for sharing!
Though you may hit a lot of blanks and/or wrong PICs with same names. Getting to the decision maker seems to be a better bet if you can find their LinkedIn (if any), no?