Hi,
I have hit a roadblock while trying to build a logic. Hope can get some some solution to my problem.
I am working with a complex data which contains the relation between a parent customer and a child customer and vice versa. The linkage file has got all possible combinations of the customers. The ask is to group or flag all the related customers separately. Below is a mock up of the sample data and the scenario is explained.
The aim to group all the related ones under the "Synthetic Group" column.
| Parent Customer | Child Customer | Synthetic Group |
| 100 | 101 | a1 |
| 100 | 102 | a1 |
| 100 | 103 | a1 |
| 100 | 104 | a1 |
| 126 | 100 | a1 |
| 127 | 100 | a1 |
| 128 | 100 | a1 |
| 101 | 178 | a1 |
| 101 | 179 | a1 |
| 101 | 180 | a1 |
| 101 | 180 | a1 |
| 105 | 678 | a2 |
| 105 | 567 | a2 |
| 567 | 221 | a2 |
| 567 | 222 | a2 |
| 679 | 302 | a2 |
| 567 | 105 | a2 |
As we can see in the picture, on sorting the Parent Customer column, whenever a new parent customer is identified, we are looking for all its relations in the child customer column and grouping them. Again the child customers that we are getting are being looked in the parent customer and child customer columns and if any relation is found we are assigning them in the same group.
Like parent customer 100 is directly related to child customers 101,102,103,104. Now we again for these child customers in the parent customer and child customer column and find for any new connections and if found we group them. This process would go on till there are no further relations of a group.
So the ide a is club together all the parent and child customers who might be linked any which way possible and create this synthetic group.
Needless to say this is just a sample. There are around 1 million rows in the main dataset