Hi there
I have the following information in my data:
Name
Email
Mobile
I want to assign a unique ID to each person in my dataset. Some records share an email or a mobile number with another record but differ in the other field. Because they match on either email or mobile, they should be treated as the same person and given the same unique ID.
Can someone suggest some steps to achieve this?
Thanks in advance
You can try the Tile tool, set to unique values: choose the fields of interest, and the tile number it assigns works like a RecordID for each specific person.
If you can provide some samples, I can show you an example.
Thanks for your reply @caltang
Would the following be sufficient?
Name | Email | Mobile | Unique ID
James Smith | jamessmith@webmail.com | 1234 |
Daisy White | hellodaisy@hotmail.com | 68290 |
James Rogan Smith | jamessmith@gmail.com | 1234 |
James Smith | jamessmith@gmail.com | 123456 |
So essentially I'm looking to generate one unique ID per individual. Since rows 1, 3 and 4 are the same person, he should receive a single unique ID even though there is some inconsistency in the name, email and mobile.
Now working off your info, I gather that the email field is the most reliable comparator as it takes first and last names.
That said, my method only works if the assumption above is true. Alternatively, you'll need to use the Fuzzy Match tool under the "Join" tab, though that won't give a perfect outcome either, since it depends on your settings.
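To illustrate that caveat, here is a rough Python sketch (not an Alteryx workflow) that numbers the sample rows by email alone. The function name `ids_by_email` and the lowercasing of emails are my own assumptions for illustration only.

```python
# Sample rows from the post above: (name, email, mobile).
rows = [
    ("James Smith", "jamessmith@webmail.com", "1234"),
    ("Daisy White", "hellodaisy@hotmail.com", "68290"),
    ("James Rogan Smith", "jamessmith@gmail.com", "1234"),
    ("James Smith", "jamessmith@gmail.com", "123456"),
]


def ids_by_email(rows):
    """Assign one ID per distinct email, in order of first appearance."""
    seen = {}
    # setdefault returns the stored ID if the email was seen before,
    # otherwise stores and returns the next free number.
    return [seen.setdefault(email.lower(), len(seen) + 1) for _, email, _ in rows]


email_only_ids = ids_by_email(rows)
print(email_only_ids)
```

With the sample data, rows 3 and 4 share an ID because their emails match, but row 1 gets a different ID because it is only linked to row 3 through the mobile number.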
Thanks @flying008
Can you attach the workflow here for reference? thanks!
Here is my test. It uses a macro, in case the chain is longer than 2 links, i.e. email > mobile > email > mobile.
workflow:
macro:
copy the lowest mobile_id to the other records if the email is the same
copy the lowest mobile_id if the mobile is the same
copy the lowest email_id to the other records if the mobile is the same
copy the lowest email_id if the email is the same
repeat until nothing changes.
then use Tile on both final IDs to get the unique_id
Note: a sort is required in my testing, to ensure the lowest (or highest) ID is copied instead of a random one.
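For anyone who wants to see the idea outside Alteryx, here is a minimal Python sketch of the same "copy the lowest ID until nothing changes" loop. It is a simplified variant, not the macro itself: it keeps a single label per row instead of separate mobile_id and email_id fields, compares values as exact strings, and treats blank values as non-matching. The function name `assign_unique_ids` is hypothetical.

```python
# Sample rows from earlier in the thread: (name, email, mobile).
rows = [
    ("James Smith", "jamessmith@webmail.com", "1234"),
    ("Daisy White", "hellodaisy@hotmail.com", "68290"),
    ("James Rogan Smith", "jamessmith@gmail.com", "1234"),
    ("James Smith", "jamessmith@gmail.com", "123456"),
]


def assign_unique_ids(rows):
    """Return one ID per row; rows linked by email or mobile share an ID."""
    # Start with every row in its own group, labelled by its row index.
    labels = list(range(len(rows)))
    changed = True
    while changed:  # repeat until nothing changes
        changed = False
        for key_index in (1, 2):  # 1 = email, 2 = mobile
            # Find the lowest label currently held by each key value.
            lowest = {}
            for i, row in enumerate(rows):
                key = row[key_index]
                if key:  # assumption: blank values never link records
                    lowest[key] = min(lowest.get(key, labels[i]), labels[i])
            # Copy that lowest label to every row sharing the key value.
            for i, row in enumerate(rows):
                key = row[key_index]
                if key and labels[i] != lowest[key]:
                    labels[i] = lowest[key]
                    changed = True
    # Renumber the surviving labels 1, 2, 3, ... in order of first appearance.
    dense = {}
    return [dense.setdefault(label, len(dense) + 1) for label in labels]


unique_ids = assign_unique_ids(rows)
print(unique_ids)
```

On the sample data, rows 1, 3 and 4 end up with the same ID and Daisy White gets her own. Taking the minimum explicitly plays the role of the sort mentioned in the note above, so the lowest label always wins rather than an arbitrary one.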