I have a dataset where I have to match columns
prd | code | cyy | prd_1 | code_1 | cyy_1 |
123 | All | USD | 123 | ABC | USD |
123 | All | All | 123 | ABC | USD |
456 | AKU | INR | 456 | AKU | INR |
456 | All | All | 456 | AKU | INR |
456 | AKU | INR | 456 | AKU | INR |
789 | All | PND | 789 | GRR | PND |
789 | GRR | PND | 789 | GRR | PND |
900 | LNO | RUS | 900 | LNO | RUS |
Conditions:
1. We have to find the row which has the most number of matches for every prod
2. As you can see prd and prd_1 will always be the same: We have to match these columns (code to code_1 and ccy to ccy1)
3. Now if you will see 1st line, code contains All and code_1 contains ABC both did not match, but ccy contains USD and ccy_1 is also contains USD, So we have the most number of the match in the 1st line for that prd. if you will see 2nd line which has the same prd 123 but ccy does not match with ccy_1 or code matches with code to code_1
prd | code | cyy | prd_1 | code_1 | cyy_1 |
123 | All | USD | 123 | ABC | USD |
456 | AKU | INR | 456 | AKU | INR |
456 | AKU | INR | 456 | AKU | INR |
789 | GRR | PND | 789 | GRR | PND |
900 | LNO | RUS | 900 | LNO | RUS |
Here's an approach that works with the data you provided, however if in practice you have many more columns to compare there's probably a way to make this more dynamic.