Alteryx Designer Desktop Discussions

rev · ‎01-13-2019

Hello everyone,

Been stuck on this problem for a while now. I am not able to think through the logic and implementation using alteryx to achieve the desired results.

There's two data sets, given below-

Data set 1

Risk ID	Rule ID	Code
FR01	00Q	FBREV
FR01	00Q	FBREV1
FR01	00Q	FBREV2
FR02	02D	FBREV
FR05	02D	FBREV12

The combination of risk and rule ID in the first data set leads to the unique list of codes.

Data set 2

Column1	Code
User1	FBREV
User1	FBREV1
User2	FBREV

Users are tagged with codes that they use.

When they use codes that are in the same Risk and Rule ID family, it is a violation.

I just want to be able to flag those users who have violated this rule.

In the above data sets, User1 has violated the rule as he used two codes (FBREV and FBREV1). Both of the codes fall under the (FR01+ 00Q) rule. Hence I need to be able to tag User1 as a rule violator.

I have been breaking my head over this for a few days now.

Any help is highly appreciated.

Thanks so much in advance.

Rev

Dynamomo · ‎01-13-2019

@rev

I would join the two datasets together on Code. then from this joined dataset, I would run a summarize on Column1, RiskID and RuleID and then count the records. Group by on Column1, RiskID and RuleID, count on Column1. Run a filter to select any records where count is >1. Those are Users which have a violation. from this list of violation records, you can join back to dataset#2 and flag the records that join with a violation.

Hope this helps. Let me know if you need any clarification or further help.

Maureen

rev · ‎01-14-2019

Hey Maureen, thanks so much for your response.

I tried what you had suggested below

Data set 1 has 162,755 records

Data set 2 has 10.8M records

and the join on Code ended up giving 717M records.

Since we just joined on Code, and there are many codes with different Risk ID and Rule ID, we end up with a many to many situation which might not really be needed.

Thank you,

Rev

Dynamomo · ‎01-14-2019

The best way to eliminate unneeded joins is to run a summarize (or unique) on both datasets before the join. Dataset1 - group by riskid, ruleid and code. Dataset2 - group by on Col1 and code. The resulting joins are expected.

Hope that makes sense unless I am misunderstanding the rules around the data.

Let me know if you still have a question.
Maureen

rev · ‎01-15-2019

Hey Maureen, thanks so much!! that worked. it was able to identify everything correctly. thanks!!!!!!

Alteryx Designer Desktop Discussions

Compare two datasets and flag rule violation