Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Skip records on Random % tool Sampling

win_flags
5 - Atom

Hi - Thank you for looking into this. 

 

I am trying to sample a dataset using the Random % tool by 10%, which is straight forward. 

 

But i need the random tool to skip records flagged as 'inactive', and only output the sampled records which are flagged as 'active'. 

 

I don't want to filter out the 'inactive' records before the random tool because then i would not get the desired 10% of the whole population. 

 

For example,  if dataset has 1000 records - 600 active and 400 inactive. I need result to be 100 active sampled records. Not 10% of 600 active. 

 

Can any one help me with this?

 

Thanks

 

win_flags_0-1621919965722.png

 

7 REPLIES 7
atcodedog05
22 - Nova
22 - Nova

Hi @win_flags 

 

Since this a custom scenario normal random sample cant be used. Here is how you can achieve your requirement.

 

atcodedog05_0-1621921537469.png

 

Upper

1. Using RandInd() function in formula tool i am generating random id for each row (this will change on each run)

2. Sorting it by RandID hence order is randomized

3. Filtering to keep only active

4. Adding order (record id) using record id tool. Order will be later used for filtering

 

Lower

1. Using summarize to get total count

2. Using formula tool to set random %(10%) and calculate sample count(1000*0.1=100)

 

3. Using append tool to join sample count with the data (upper branch)

4.using filter tool to keep only first n% (10%) which will give first 100 records which were randomly ordered.

 

You can test it to check whether it satisfies the condition

 

Hope this helps 🙂

win_flags
5 - Atom

@atcodedog05 - Thank you. This makes sense. 

 

I just have one more condition to this.

 

So, I have another column in source data, lets call it 'Group' with values as A, B, C across 1000 records. I need 10% active employees from within each group. 

 

Example - 

 

Group    Type       Records

A           Active      200

A           Inactive   100 

 

10% active records from group A - 30 

 

B          Active       300

B          Inactive    100

 

10% active records from group B - 40 

 

C          Active       100

C          inactive     200

 

10% active records from group C - 30 

 

I want to group them then do the random sampling in each group, resulting in only active records maintaining the total numbers of records. 

 

Thanks in advance!

atcodedog05
22 - Nova
22 - Nova

Hi @win_flags 

 

You can do something like this.

atcodedog05_0-1621925391049.png

 

1. In summarize tool you can take groupby on group

2. Instead of order (record id tool) i am using multi-row formula tool to create group order (using groupby on group)

3. Instead of append tool i am using join tool to get each groups sample count.

4. Filter where order <= sample count.

 

Edit : filter formula for wrong fixed it.

 

Hope this helps 🙂

Qiu
20 - Arcturus
20 - Arcturus

@win_flags 

I was able to implement the Random % Sample into a batch macro.

But the Npercent of Ramdom % Sample tool will only  take integer so there are cases that not exactly 10% generated.

Capture3A.PNGCapture3B.PNG

win_flags
5 - Atom

@atcodedog05 Thank you - This works. 

 

Really appreciate your help. 

 

 

win_flags
5 - Atom

@qiu i will give this a try too. As i was keen to use random % tool at first. And i created a batch macro for it but wasn't able to skip records in the tool. 

 

The other solution without the random % tools works for now too. 

 

But if i perfect the batch macro using your approach i will post that too here. 

KWIZATZ
6 - Meteoroid

Could you please share me the configuration of random sample active-r1.yxmc, it doesn't download.

Labels