Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Compare two datasets and flag rule violation

Highlighted
7 - Meteor

Hello everyone, 

Been stuck on this problem for a while now. I am not able to think through the logic and implementation using alteryx to achieve the desired results. 

 

There's two data sets, given below-

Data  set 1

 

Risk IDRule IDCode
FR0100QFBREV
FR0100QFBREV1
FR0100QFBREV2
FR0202DFBREV
FR0502DFBREV12

The combination of risk and rule ID in the first data set leads to the unique list of codes. 

 

Data set 2

Column1Code
User1FBREV
User1FBREV1
User2FBREV

Users are tagged with codes that they use.

 

When they use codes that are in the same Risk and Rule ID family, it is a violation. 

I just want to be able to flag those users who have violated this rule. 

 

In the above data sets, User1 has violated the rule as he used two codes (FBREV and FBREV1). Both of the codes fall under the (FR01+ 00Q) rule. Hence I need to be able to tag User1 as a rule violator. 

 

I have been breaking my head over this for a few days now.

Any help is highly appreciated.  

 

Thanks so much in advance. 

Rev

 

Highlighted
Alteryx Certified Partner

@rev

I would join the two datasets together on Code.  then from this joined dataset, I would run a summarize on Column1, RiskID and RuleID and then count the records.  Group by on Column1, RiskID and RuleID, count on Column1.  Run a filter to select any records where count is >1.  Those are Users which have a violation.  from this list of violation records, you can join back to dataset#2 and flag the records that join with a violation.

Hope this helps.  Let me know if you need any clarification or further help.

Maureen

Highlighted
7 - Meteor

Hey Maureen, thanks so much for your response. 

I tried what you had suggested below

 

Data set 1 has 162,755 records

Data set 2 has 10.8M records

 

and the join on Code ended up giving 717M records. 

 

Since we just joined on Code, and there are many codes with different Risk ID and Rule ID, we end up with a many to many situation which might not really be needed. 

 

Thank you,

Rev

 

Highlighted
Alteryx Certified Partner

The best way to eliminate unneeded joins is to run a summarize (or unique) on both datasets before the join.  Dataset1 - group by riskid, ruleid and code.  Dataset2 - group by on Col1 and code.  The resulting joins are expected.

Hope that makes sense unless I am misunderstanding the rules around the data.

Let me know if you still have a question.
Maureen

Highlighted
7 - Meteor

Hey Maureen, thanks so much!! that worked. it was able to identify everything correctly. thanks!!!!!!

Labels