Identifying relations

Question

Hi,

I have hit a roadblock while trying to build a logic. Hope can get some some solution to my problem.

I am working with a complex data which contains the relation between a parent customer and a child customer and vice versa. The linkage file has got all possible combinations of the customers. The ask is to group or flag all the related customers separately. Below is a mock up of the sample data and the scenario is explained.

The aim to group all the related ones under the "Synthetic Group" column.

Parent CustomerChild CustomerSynthetic Group 100101a1100102a1100103a1100104a1126100a1127100a1128100a1101178a1101179a1101180a1101180a1105678a2105567a2567221a2567222a2679302a2567105a2

As we can see in the picture, on sorting the Parent Customer column, whenever a new parent customer is identified, we are looking for all its relations in the child customer column and grouping them. Again the child customers that we are getting are being looked in the parent customer and child customer columns and if any relation is found we are assigning them in the same group.

Like parent customer 100 is directly related to child customers 101,102,103,104. Now we again for these child customers in the parent customer and child customer column and find for any new connections and if found we group them. This process would go on till there are no further relations of a group.

So the ide a is club together all the parent and child customers who might be linked any which way possible and create this synthetic group.

Needless to say this is just a sample. There are around 1 million rows in the main dataset

NicoleJohnson · Answer

You might check out the "Make Group" tool - point the 1st Key to the Parent column, and the 2nd to the Child column. This should sort your data into two columns, with "Group" containing 1 of the items that can be used as your overall group (or "synthetic group" in your case, and then the second column containing all the values from both Parent and Child that belong to that group. The format of the data is a little different, but I believe it would be fairly straightforward to use the Make Group result as the basis for a lookup table that you could then use to append the groups back to your original list, for example, to get something that looks like your original desired output.

Note - it actually looks like there are 3 groups - I think members 302 & 679 should be in their own group, rather than included in a2 as in your example (second to last row), since they don't appear in any other matching combos of parent/child?

Hope that helps!

NJ