Hello, I'm asking for help again. My task is to validate if data in one field is properly filled according to data in other field. Both fields contain several values, )(usually comma separated, but that's not always the case)
So for example (scenario 1) I have a list of company names in one column, and list of HQ cities in the other:
| ID | Name | City |
| 1 | Apple, McDonald | Apple: Cupertino McDonald: Chicago |
| 2 | Adidas, Lego, Microsoft | Adidas: Herzogenauras, Lego: Billund, Microsoft: Chicago |
| 3 | Apple, Lego, Bayer | Apple: Cupertino Lego: Billund |
I have reference list as well:
| Name | City |
| Apple | Cupertino |
| Microsoft | Redmond |
| Lego | Billund |
| McDonald | Chicago |
| Bayer | Leverkusen |
| Adidas | Herzogenaurach |
I want to compare the city data with reference list to find if the data matches(so. i.e. if there is McDonald listed in the "Name column", there should be "Chicago" in the City. In best case scenario, the "Chicago" should follow "McDonald:" string, but I can manage it if It will only check if the value exists in the column. As a result I should get list of Record ID's with errors and information what Company/City combination is missing (In the best case, if there is wrong value, it should be listed as well:
So the output table should look like this:
Minimum variant:
| ID | Name without match |
| 2 | Microsoft |
| 3 | Bayer |
Best case:
| ID | Name without match | City |
| 2 | Microsoft | Chicago |
| 3 | Bayer | |
I did similar job but for data in one column - the solution is to split data into rows, compare with reference list, and then list those which are not matching. But I'm a bit stuck with how to deal with two columns.
Second scenario is similar, but with several reference lists, based on the value in one column. So for example if in the column A the value is "1", I should expect X, Y or Z (or any combination of these) in column B; if column A has value of "2", then in "B" should be something from L, M, or O, etc.
This can be done with cascade filters (i.e. filter out all values containg "1" in column A, then check B with reference list for 1, then from remaing data filter out all records with "2" in column A, and check.... etc.). But if we have wide range of values possible in column A, it will be difficult to manage such workflow. Maybe there is a better way to handle that?
Sample list of values is here:
| ID | Color | Code |
| 1 | Orange | OR1, OR2, OR3 |
| 2 | Blue | BL1, BL2, BL3 |
| 3 | Red | R1, R3, R6 |
| 4 | Green | G7, G8, G6 |
Reference lists:
| Color | Codes |
| Orange | OR1 |
| Orange | OR2 |
| Orange | OR3 |
| Orange | OR4 |
| Orange | OR5 |
| Color | Codes |
| Blue | BL1 |
| Blue | BL2 |
| Blue | BL3 |
| Blue | BL4 |
| Blue | BL5 |
| Color | Codes |
| Red | R1 |
| Red | R2 |
| Red | R4 |
| Red | R5 |
| Red | R7 |
| Color | Codes |
| Green | G5 |
| Green | G6 |
| Green | G8 |
| Green | G9 |
| Green | G10 |
And the desired output:
| ID | Color | Wrong code |
| 3 | Red | R3 |
| 3 | Red | R6 |
| 4 | Green | G7 |
It doesn't matter if wrong codes will be listed in separated rows or combined in one, I just need to know what are the wrong values