Hello, I'm asking for help again. My task is to validate if data in one field is properly filled according to data in other field. Both fields contain several values, )(usually comma separated, but that's not always the case)
So for example (scenario 1) I have a list of company names in one column, and list of HQ cities in the other:
ID | Name | City |
1 | Apple, McDonald | Apple: Cupertino McDonald: Chicago |
2 | Adidas, Lego, Microsoft | Adidas: Herzogenauras, Lego: Billund, Microsoft: Chicago |
3 | Apple, Lego, Bayer | Apple: Cupertino Lego: Billund |
I have reference list as well:
Name | City |
Apple | Cupertino |
Microsoft | Redmond |
Lego | Billund |
McDonald | Chicago |
Bayer | Leverkusen |
Adidas | Herzogenaurach |
I want to compare the city data with reference list to find if the data matches(so. i.e. if there is McDonald listed in the "Name column", there should be "Chicago" in the City. In best case scenario, the "Chicago" should follow "McDonald:" string, but I can manage it if It will only check if the value exists in the column. As a result I should get list of Record ID's with errors and information what Company/City combination is missing (In the best case, if there is wrong value, it should be listed as well:
So the output table should look like this:
Minimum variant:
ID | Name without match |
2 | Microsoft |
3 | Bayer |
Best case:
ID | Name without match | City |
2 | Microsoft | Chicago |
3 | Bayer |
I did similar job but for data in one column - the solution is to split data into rows, compare with reference list, and then list those which are not matching. But I'm a bit stuck with how to deal with two columns.
Second scenario is similar, but with several reference lists, based on the value in one column. So for example if in the column A the value is "1", I should expect X, Y or Z (or any combination of these) in column B; if column A has value of "2", then in "B" should be something from L, M, or O, etc.
This can be done with cascade filters (i.e. filter out all values containg "1" in column A, then check B with reference list for 1, then from remaing data filter out all records with "2" in column A, and check.... etc.). But if we have wide range of values possible in column A, it will be difficult to manage such workflow. Maybe there is a better way to handle that?
Sample list of values is here:
ID | Color | Code |
1 | Orange | OR1, OR2, OR3 |
2 | Blue | BL1, BL2, BL3 |
3 | Red | R1, R3, R6 |
4 | Green | G7, G8, G6 |
Reference lists:
Color | Codes |
Orange | OR1 |
Orange | OR2 |
Orange | OR3 |
Orange | OR4 |
Orange | OR5 |
Color | Codes |
Blue | BL1 |
Blue | BL2 |
Blue | BL3 |
Blue | BL4 |
Blue | BL5 |
Color | Codes |
Red | R1 |
Red | R2 |
Red | R4 |
Red | R5 |
Red | R7 |
Color | Codes |
Green | G5 |
Green | G6 |
Green | G8 |
Green | G9 |
Green | G10 |
And the desired output:
ID | Color | Wrong code |
3 | Red | R3 |
3 | Red | R6 |
4 | Green | G7 |
It doesn't matter if wrong codes will be listed in separated rows or combined in one, I just need to know what are the wrong values
Solved! Go to Solution.
Let me reply to myself, as it seems that after rethinking, the "minimum" variant for scenario 1 was pretty easy. But maybe there is better approach?
I'm struggling with "best case" variant for Scenario 1 (listing the wrong value alone). I can list the correct value for the city where I couldn't find a match, and list of all cities, but I have no clue how to leave only the wrong value in the city field (or leave it empty if there is no data for specific company).
And I still have no concept on how to deal with scenario 2 (multiple reference lists depending on values in column A).
One remark: these lists are separate, so if we have i.e. "Orange" in Color, then any code for other colors is wrong.
you just need to use multiple criteria for join tool. left/right joiner will be your answer. (depend on the input.)
so the issue is how to split to right data table.
i split the name first than replace with a "+" in front (or any unusual character) than use it split to lines than split to columns.
I don't know which way I prefer better... @Christina_H The solution with multi row formula for cities is really smart :)
Thanks to you both!