Hi - Thank you for looking into this.
I am trying to sample a dataset using the Random % tool by 10%, which is straight forward.
But i need the random tool to skip records flagged as 'inactive', and only output the sampled records which are flagged as 'active'.
I don't want to filter out the 'inactive' records before the random tool because then i would not get the desired 10% of the whole population.
For example, if dataset has 1000 records - 600 active and 400 inactive. I need result to be 100 active sampled records. Not 10% of 600 active.
Can any one help me with this?
Thanks
Solved! Go to Solution.
Hi @win_flags
Since this a custom scenario normal random sample cant be used. Here is how you can achieve your requirement.
Upper
1. Using RandInd() function in formula tool i am generating random id for each row (this will change on each run)
2. Sorting it by RandID hence order is randomized
3. Filtering to keep only active
4. Adding order (record id) using record id tool. Order will be later used for filtering
Lower
1. Using summarize to get total count
2. Using formula tool to set random %(10%) and calculate sample count(1000*0.1=100)
3. Using append tool to join sample count with the data (upper branch)
4.using filter tool to keep only first n% (10%) which will give first 100 records which were randomly ordered.
You can test it to check whether it satisfies the condition
Hope this helps 🙂
@atcodedog05 - Thank you. This makes sense.
I just have one more condition to this.
So, I have another column in source data, lets call it 'Group' with values as A, B, C across 1000 records. I need 10% active employees from within each group.
Example -
Group Type Records
A Active 200
A Inactive 100
10% active records from group A - 30
B Active 300
B Inactive 100
10% active records from group B - 40
C Active 100
C inactive 200
10% active records from group C - 30
I want to group them then do the random sampling in each group, resulting in only active records maintaining the total numbers of records.
Thanks in advance!
Hi @win_flags
You can do something like this.
1. In summarize tool you can take groupby on group
2. Instead of order (record id tool) i am using multi-row formula tool to create group order (using groupby on group)
3. Instead of append tool i am using join tool to get each groups sample count.
4. Filter where order <= sample count.
Edit : filter formula for wrong fixed it.
Hope this helps 🙂
I was able to implement the Random % Sample into a batch macro.
But the Npercent of Ramdom % Sample tool will only take integer so there are cases that not exactly 10% generated.
@atcodedog05 Thank you - This works.
Really appreciate your help.
@qiu i will give this a try too. As i was keen to use random % tool at first. And i created a batch macro for it but wasn't able to skip records in the tool.
The other solution without the random % tools works for now too.
But if i perfect the batch macro using your approach i will post that too here.
Could you please share me the configuration of random sample active-r1.yxmc, it doesn't download.