Hello everyone,
Been stuck on this problem for a while now. I am not able to think through the logic and implementation using alteryx to achieve the desired results.
There's two data sets, given below-
Data set 1
Risk ID | Rule ID | Code |
FR01 | 00Q | FBREV |
FR01 | 00Q | FBREV1 |
FR01 | 00Q | FBREV2 |
FR02 | 02D | FBREV |
FR05 | 02D | FBREV12 |
The combination of risk and rule ID in the first data set leads to the unique list of codes.
Data set 2
Column1 | Code |
User1 | FBREV |
User1 | FBREV1 |
User2 | FBREV |
Users are tagged with codes that they use.
When they use codes that are in the same Risk and Rule ID family, it is a violation.
I just want to be able to flag those users who have violated this rule.
In the above data sets, User1 has violated the rule as he used two codes (FBREV and FBREV1). Both of the codes fall under the (FR01+ 00Q) rule. Hence I need to be able to tag User1 as a rule violator.
I have been breaking my head over this for a few days now.
Any help is highly appreciated.
Thanks so much in advance.
Rev
Solved! Go to Solution.
I would join the two datasets together on Code. then from this joined dataset, I would run a summarize on Column1, RiskID and RuleID and then count the records. Group by on Column1, RiskID and RuleID, count on Column1. Run a filter to select any records where count is >1. Those are Users which have a violation. from this list of violation records, you can join back to dataset#2 and flag the records that join with a violation.
Hope this helps. Let me know if you need any clarification or further help.
Maureen
Hey Maureen, thanks so much for your response.
I tried what you had suggested below
Data set 1 has 162,755 records
Data set 2 has 10.8M records
and the join on Code ended up giving 717M records.
Since we just joined on Code, and there are many codes with different Risk ID and Rule ID, we end up with a many to many situation which might not really be needed.
Thank you,
Rev
The best way to eliminate unneeded joins is to run a summarize (or unique) on both datasets before the join. Dataset1 - group by riskid, ruleid and code. Dataset2 - group by on Col1 and code. The resulting joins are expected.
Hope that makes sense unless I am misunderstanding the rules around the data.
Let me know if you still have a question.
Maureen
Hey Maureen, thanks so much!! that worked. it was able to identify everything correctly. thanks!!!!!!