I have a data set where there are duplicates for some of the IDs. I want to select the one with all the information is available like phone number.
Input | ||
ID | Name | Phone Number |
1 | Robert | |
1 | Robert | 1234567890 |
5 | Nancy | 9876545670 |
5 | Niel | 97543246790 |
7 | Riti | 9871236540 |
7 | Richerd | |
Output | ||
ID | Name | Phone Number |
1 | Robert | 1234567890 |
5 | Nancy | 9876545670 |
5 | Niel | 97543246790 |
7 | Riti | 9871236540 |
7 | Richerd |
Thanks for your kind help!
Best
Pooja
Hi @Analytics_Pooja
I have created a solution for this specific request, please refer to the workflow attached.
Please let me know if anything is unclear :)
I worked on a workflow options that cleans the data first then you get your result sorted. Hope this help give you a different perspective to apply.
Option 1 - basic summary max to eliminate duplicates and sorts by ID and puts blank phone number fields towards bottom of ID stack:
Option 2 - filter out rows one column at a time to eventually only give you rows that contain data in ALL fields.
-Note - you could take the 'True' output of each filter and union that all together to have the list of entries that contain missing data in one or more fields.