Hi All,
I have a challenge, that seems easy but has me stumped.
Aim
I want to try and guess a customer email address based on the most common patterns of email address for customers at that company.,
After analyzing email addresses, I've found out that the most common formats are:
Possible email structures | |
First | John@nike.com |
First [1 letter] + Last | JSmith@nike.com |
First+.+Last | John.Smith@nike.com |
Last | Smith@nike.com |
FirstLast | JohnSmith@nike.com |
last + first [1 letter] | SmithJ@nike.com |
first [1 letter]+.+Last | J.Smith@nike.com |
First+Last[1 letter] | JohnS@nike.com |
Last+First | SmithJohn@nike.com |
Last.First | Smith.John@nike.com |
Assumptions
Since majority of companies have standard formatted for their email address, i am hoping to find the most common email structure for a given company and then extrapolate based on it.
Example
Company : Nike.com
Number of Contacts | Percent | Email Structure | |
240 | 60% | First+.+Last | John.Smith@nike.com |
100 | 25% | First [1 letter] + Last | JSmith@nike.com |
60 | 15% | First | John@nike.com |
Given that 60% of the contacts at Nike have the email structure of "First+.+Last" i would like to then follow a similar format for those 100 Contacts which have no email addresses
Current Input
Contactid | First Name | LastName | Company Name | Companyid | |
111111 | Richard | Piper | Richard.Piper@Nike.com | Nike Inc | 001f100001InnV5AAJ |
222222 | Danielle | Collins | Danielle.Collins@Nike.com | Nike Inc | 001f100001InnV5AAJ |
333333 | Dane | Smith | Dane.Smith@Nike.com | Nike Inc | 001f100001InnV5AAJ |
44444 | Robert | Atleryx | RAlteryx@Nike.com | Nike Inc | 001f100001InnV5AAJ |
55555 | John | King | Nike Inc | 001f100001InnV5AAJ | |
666666 | Chris | Dannher | Nike Inc | 001f100001InnV5AAJ |
Expected Outcome
Contactid | First Name | LastName | Predicted Email | Rational | Company Name | Companyid | |
111111 | Richard | Piper | Richard.Piper@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
222222 | Danielle | Collins | Danielle.Collins@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
333333 | Dane | Smith | Dane.Smith@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
44444 | Robert | Atleryx | RAlteryx@Nike.com | Nike Inc | 001f100001InnV5AAJ | ||
55555 | John | King | John.King@Nike.com | Common Email Pattern " First+.+Last" | Nike Inc | 001f100001InnV5AAJ | |
666666 | Chris | Dannaher | Chris.Dannaher@Nike.com | Common Email Pattern " First+.+Last" | Nike Inc | 001f100001InnV5AAJ |
Looking forward to your help & advice
Many thanks
Masond3
@caltang so for the key decision makers at the firm we are pretty good and have the right communications.
However your point is still valid ! The holy grail would be a multiple third party vendors to cross reference , however we as a firm are a long way off that. So was hoping to use a data driven approach using existing data, do a small poc and then measure the ROI.
what is pics ? I assume (personal information communication services ?)
Person-in-charge - if the decision maker part is settled, then that's great!
Good call on using Alteryx. Your post actually gave me some ideas! Can also be applied for other purposes / cold outreach.
Thanks for asking this question!
@caltang - ooh I am glad you able to use some of my concept. Can you elaborate on some of your ideas ? A problem shared is a problem halved ! ;)
Keeping old records is tough. Some suppliers have changed emails, and the contact points have gone cold. It’s just names passing through and no proper email filing is in place. This is one application I see working well!
@Masond3 See if this solves the problem (assuming I understand it correctly). I've added another company with email formats A, B, and C. With 40% using A and B and 20% using C. In this case, it will format your email guesses in formats A and B and add a column that flags it as an even split.
@geoff_zath Cool. I will take a look when back in the office.
I have amended the flow slightly as when I am running my data approx 800k it was stalling around the “append” function.
I tried to get around this using the “ using joins” and that seemed to work better, but. let me see if I can upload my flow tomorrow so you can take a browse
@Masond3 Oh yes that would cause an issue! Not sure why I didn't use the Join tool in the first place 😅 See if this works better without the Append tool.
@Masond3 This version also takes into account different email domains per company ID. It will associate the most common ID with the predicted email structure. At least it "should"...
@geoff_zath lol. The only way I found that scenario was creating some test data, where data wasn’t perfect and tried to understand how it would be handled.
I think the common domain logic by company id needs to be before the logic where we flags it as an even split.
il be home in a couple of hours I can try and validate if not it be earliest tomorrow.
It might already be...check some of the wireless connections, it is not totally linear in how things are calculated.