Hi all,
I am stuck on 1 of the task.
I have 2 datasets.
Both have 1 common identifier called ID column.
There are multiple other columns in both the datasets.
Main column of interest is "Name" column in each which essentially consists of names of management people. 1 name in 1 row.
So, that means 1 ID can span into multiple rows 1/2/3/4/...(I have until 7 right now for some IDs) as it has 7 people in management.
The other data has also the similar data but from different source. Thus, it's not compulsory that the ID which has 2 rows and thus 2 names in the 1 dataset, will have 2 rows and those 2 names there as well.
There is a possibility that the ID doesn't exist at all there, so that will be 1 break type. That ID doesn't exist thus no names. (I reckon simple join can do this after taking unique to remove Cartesian join)
There is another possibility that the ID do exist on both sides, but the number of names are different. On 1 side, there are 2 names and rows for each ID, on other side, there are 3 or just 1. So, that is another break type, with missing names in 1 dataset.
There is last break type. With same number of names and IDs present both sides, the names need to be same as well, if different name/s, then that's a break.
Any help appreciated, thanks.
@Isha_Gupta I would suggest joining on ID first (if they are same in both files) and then looking at left and right joins and matching on name and different fields and repeating the process. A bit hard to give more details without seeing the data set. LEt me know if this helps