Not Completely Random Sampling

agimpel
7 - Meteor

Hello! I have a sampling question that isn't totally random sampling. I have a large population of items, and each item has 3 separate tests. Each test for an item (the different fruits in my attached example) is assigned to a different person. My example includes only one record per person, but my real population will have thousands of records. My sampling plan is to take one of the 3 tests (Test 1, 2 or 3) for each item, with the caveat that, for each person, I want to test 30% of each test type (Test 1, 2 and 3). Is there a workflow I could use to create this sampling plan? I have started by breaking each person out into their own subset of the workflow, but I still need to get the even 30% mix across the different tests. Any suggestions? TIA!

CoG
14 - Magnetar

Not sure exactly what you are trying to do, but if you want 30% of people assigned to take Test 1 for a given item, 30% to take Test 2, and 30% to take Test 3, leaving the remaining 10% unassigned, then you can do the following:

Variant 1 (Truly Random):

Use RandomInt(9) to give each record a random index between 0 and 9 (grouped by item), then assign:

Test 1 where 1 <= [Index] <= 3

Test 2 where 4 <= [Index] <= 6

Test 3 where 7 <= [Index] <= 9

and leave Index = 0 (the final 10%) untouched.

With a large enough dataset you will approach the 30-30-30-10 split that you desire.
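For anyone who wants to see the Variant 1 logic outside Alteryx, here is a minimal Python sketch. It assumes RandomInt(9) returns an integer from 0 to 9 inclusive, which is mimicked with random.randint(0, 9); the function names bucket_to_test and assign_random are hypothetical and not part of any Alteryx tool.

```python
import random
from collections import Counter
from typing import List, Optional


def bucket_to_test(index: int) -> Optional[int]:
    """Map a random index 0..9 to Test 1, 2 or 3; index 0 stays unassigned."""
    if 1 <= index <= 3:
        return 1
    if 4 <= index <= 6:
        return 2
    if 7 <= index <= 9:
        return 3
    return None  # index 0: the untouched 10%


def assign_random(n_records: int, seed: Optional[int] = None) -> List[Optional[int]]:
    """Draw an independent index per record, like a Formula tool using RandomInt(9)."""
    rng = random.Random(seed)
    # random.randint is inclusive on both ends, so this yields 0..9 (assumed RandomInt behavior).
    return [bucket_to_test(rng.randint(0, 9)) for _ in range(n_records)]


# Usage: with many records each test lands near 30%, but not exactly.
counts = Counter(assign_random(100_000, seed=42))
print({test: round(n / 100_000, 3) for test, n in counts.items()})
```

Because every record is drawn independently, the split only approaches 30-30-30-10 as the record count grows; small groups can drift noticeably.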


Variant 2 (Fixed Random):

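Alternatively, to get that exact split every time, regardless of dataset size (up to rounding when the record count isn't a multiple of 10), you can use the UuidCreate() function to generate a "random" key for every item group, sort on the UUID column, and then assign the first 30% to Test 1, the next 30% to Test 2, the next 30% to Test 3, and leave the last 10% unassigned.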
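A minimal Python sketch of the Variant 2 idea: give each record a random UUID key, sort on it, then cut the sorted list into 30% blocks. It uses uuid.uuid4() as a stand-in for UuidCreate() and assumes record keys are unique; assign_fixed_split is a hypothetical name, not an Alteryx function.

```python
import uuid
from collections import Counter
from typing import Dict, List, Optional


def assign_fixed_split(records: List[str], percent: int = 30) -> Dict[str, Optional[int]]:
    """Shuffle by UUID, then give consecutive `percent`% blocks to Tests 1, 2 and 3.

    Assumes every entry in `records` is unique (e.g. an Item/Test/Owner key).
    """
    # Attach a fresh random UUID4 to each record and sort on it
    # (stand-in for "UuidCreate() then Sort"; assumed analogue).
    shuffled = sorted(records, key=lambda _rec: str(uuid.uuid4()))
    # Integer arithmetic avoids float rounding; block sizes round down.
    per_test = len(shuffled) * percent // 100
    assignment: Dict[str, Optional[int]] = {}
    for position, record in enumerate(shuffled):
        # Positions beyond the three blocks (the remaining ~10%) stay unassigned.
        assignment[record] = position // per_test + 1 if position < 3 * per_test else None
    return assignment


# Usage: 100 records always split 30 / 30 / 30, with 10 unassigned.
items = [f"item-{i}" for i in range(100)]
print(Counter(assign_fixed_split(items).values()))
```

Unlike Variant 1, the block sizes here are fixed by the record count, so only which records fall into each block is random.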


I may have misunderstood your exact sampling needs, but both methods above can easily be reconfigured to achieve other outcomes.

Happy Solving!

agimpel
7 - Meteor

Sorry if that was confusing! I guess ignore the piece about splitting it by person; really it should be a 30-30-30 split by test type, so I'm thinking that's your second solution? And which tool would I use?

CoG
14 - Magnetar

I don't think explaining the logic again will be of much help, so here is a sample workflow that you can apply to your use case:

[Screenshot: sample workflow]

I will note that the randomization comes from the middle branch, which uses the UuidCreate() function followed by sorting. I assumed that Test and Owner form a unique combination.
