Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Matching with list based on a value in other column (2 scenarios)

kwieto
8 - Asteroid

Hello, I'm asking for help again. My task is to validate if data in one field is properly filled according to data in other field. Both fields contain several values, )(usually comma separated, but that's not always the case)

 

So for example (scenario 1) I have a list of company names in one column, and list of HQ cities in the other:

 

IDNameCity
1Apple, McDonaldApple: Cupertino McDonald: Chicago
2Adidas, Lego, MicrosoftAdidas: Herzogenauras, Lego: Billund, Microsoft: Chicago
3Apple, Lego, BayerApple: Cupertino Lego: Billund


I have reference list as well:

NameCity
AppleCupertino
MicrosoftRedmond
LegoBillund
McDonaldChicago
BayerLeverkusen
AdidasHerzogenaurach


I want to compare the city data with reference list to find if the data matches(so. i.e. if there is McDonald listed in the "Name column", there should be "Chicago" in the City. In best case scenario, the "Chicago" should follow "McDonald:" string, but I can manage it if It will only check if the value exists in the column. As a result I should get list of Record ID's with errors and information what Company/City combination is missing (In the best case, if there is wrong value, it should be listed as well:

So the output table should look like this:
Minimum variant:

IDName without match
2Microsoft
3Bayer

 

Best case:

IDName without matchCity
2MicrosoftChicago
3Bayer 

 

I did similar job but for data in one column - the solution is to split data into rows, compare with reference list, and then list those which are not matching. But I'm a bit stuck with how to deal with two columns.

Second scenario is similar, but with several reference lists, based on the value in one column. So for example if in the column A the value is "1", I should expect X, Y or Z (or any combination of these) in column B; if column A has value of "2", then in "B" should be something from L, M, or O, etc.
This can be done with cascade filters (i.e. filter out all values containg "1" in column A, then check B with reference list for 1, then from remaing data filter out all records with "2" in column A, and check.... etc.). But if we have wide range of values possible in column A, it will be difficult to manage such workflow. Maybe there is a better way to handle that?

Sample list of values is here:

IDColorCode
1OrangeOR1, OR2, OR3
2BlueBL1, BL2, BL3
3RedR1, R3, R6
4GreenG7, G8, G6

 

Reference lists:

 

ColorCodes
OrangeOR1
OrangeOR2
OrangeOR3
OrangeOR4
OrangeOR5

 

ColorCodes
BlueBL1
BlueBL2
BlueBL3
BlueBL4
BlueBL5

 

ColorCodes
RedR1
RedR2
RedR4
RedR5
RedR7

 

ColorCodes
GreenG5
GreenG6
GreenG8
GreenG9
GreenG10

 

And the desired output:

IDColorWrong code
3RedR3
3RedR6
4GreenG7

 

It doesn't matter if wrong codes will be listed in separated rows or combined in one, I just need to know what are the wrong values

5 REPLIES 5
kwieto
8 - Asteroid

Let me reply to myself, as it seems that after rethinking, the "minimum" variant for scenario 1 was pretty easy. But maybe there is better approach?

kwieto_0-1681725686549.png

I'm struggling with "best case" variant for Scenario 1 (listing the wrong value alone). I can list the correct value for the city where I couldn't find a match, and list of all cities, but I have no clue how to leave only the wrong value in the city field (or leave it empty if there is no data for specific company).

And I still have no concept on how to deal with scenario 2 (multiple reference lists depending on values in column A).
One remark: these lists are separate, so if we have i.e. "Orange" in Color, then any code for other colors is wrong.

PangHC
12 - Quasar

you just need to use multiple criteria for join tool. left/right joiner will be your answer. (depend on the input.)
Pang_Hee_Choy_0-1681728508415.png
so the issue is how to split to right data table. 


i split the name first than replace with a "+" in front (or any unusual character) than use it split to lines than split to columns.

Pang_Hee_Choy_1-1681728652982.png

 

Christina_H
14 - Magnetar

Second scenario is easy enough.  I'm still working on the first!

Christina_H_0-1681731447767.png

 

Christina_H
14 - Magnetar

And here's a solution for scenario 1

Christina_H_0-1681732541570.png

 

kwieto
8 - Asteroid

I don't know which way I prefer better... @Christina_H The solution with multi row formula for cities is really smart :)

Thanks to you both!

Labels
Top Solution Authors