
Not Completely Random Sampling

agimpel

Hello! I have a sampling question that isn't completely random sampling. I have a large population of items, and each item has 3 separate tests. Each test within an item (the different fruits in my attached example) is broken out to be tested by a different person. My attached example includes only one record per person, but my true population will have thousands of records. My sampling plan is to take one of the 3 tests (Test 1, 2, or 3) for each item, with the caveat that, for each person, I want to test 30% of each test (Test 1, 2, and 3). Is there a workflow I could use to create this sampling plan? I have started by breaking each person into their own subset of my workflow, but I still need to get an even 30% mix across the different tests. Any suggestions? TIA!

CoG

I'm not sure exactly what you are trying to do, but if you want to ensure that exactly 30% of people are assigned Test 1 for a given item, 30% Test 2, and 30% Test 3, leaving exactly 10% unassigned, you can do the following:

Variant 1 (Truly Random):

Use RandomInt(9), which returns a random integer from 0 to 9 inclusive, to index your data (grouped by item). Then:

people take Test 1 for items where 1 <= [index] <= 3,

Test 2 where 4 <= [index] <= 6,

Test 3 where 7 <= [index] <= 9,

leaving index = 0 (the final 10%) untouched.
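For anyone who wants to check the Variant 1 logic outside Alteryx, here is a rough Python sketch. It is only an illustration under assumptions: Python's random.randint(0, 9) stands in for RandomInt(9) (both include 0 and 9), one index is drawn per row, and the function name assign_test_variant1 is made up for this example.

```python
import random


def assign_test_variant1(n_rows, rng=None):
    """Hypothetical helper: assign each row to Test 1/2/3 or leave it unassigned."""
    rng = rng or random.Random()
    assignments = []
    for _ in range(n_rows):
        # Stand-in for Alteryx RandomInt(9): uniform integer in 0..9 inclusive.
        idx = rng.randint(0, 9)
        if 1 <= idx <= 3:
            assignments.append("Test 1")   # ~30% of rows
        elif 4 <= idx <= 6:
            assignments.append("Test 2")   # ~30% of rows
        elif 7 <= idx <= 9:
            assignments.append("Test 3")   # ~30% of rows
        else:
            assignments.append(None)       # index 0: ~10% left unassigned
    return assignments


# Example: proportions approach 30-30-30-10 as the row count grows.
labels = assign_test_variant1(100000, random.Random(42))
for name in ("Test 1", "Test 2", "Test 3", None):
    print(name, labels.count(name) / len(labels))
```

Because every row is drawn independently, the shares are only approximately 30-30-30-10; they are not guaranteed to be exact.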

With a large enough dataset, the split will approach the 30-30-30-10 you want, but it will rarely be exact.

Variant 2 (Fixed Random):

Alternatively, to guarantee that exact split every time and for any size of dataset (up to rounding when the group size is not a multiple of 10), use the UuidCreate() function to generate a "random" key for every row in each item group, sort on the UUID column, and then assign the first 30% to Test 1, the next 30% to Test 2, and so on.
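As a rough illustration of Variant 2 outside Alteryx, the Python sketch below attaches a random UUID to each row (Python's uuid.uuid4() standing in for UuidCreate()), sorts on it, and cuts the sorted list at fixed positions. The function name assign_test_variant2 and the shares parameter are made up for this example, and cut points are rounded, so the split is exact only when the row count times 0.3 is a whole number.

```python
import uuid


def assign_test_variant2(rows, shares=(0.3, 0.3, 0.3)):
    """Hypothetical helper: exact 30-30-30 split via a random sort key."""
    # Sort on a fresh random UUID per row (stand-in for UuidCreate() + Sort).
    # Hex strings of equal length sort the same way as the underlying numbers.
    shuffled = sorted(rows, key=lambda _row: uuid.uuid4().hex)

    n = len(shuffled)
    cut1 = round(n * shares[0])
    cut2 = cut1 + round(n * shares[1])
    cut3 = cut2 + round(n * shares[2])

    result = []
    for i, row in enumerate(shuffled):
        if i < cut1:
            test = "Test 1"
        elif i < cut2:
            test = "Test 2"
        elif i < cut3:
            test = "Test 3"
        else:
            test = None  # remaining ~10% left unassigned
        result.append((row, test))
    return result


# Example: 1000 rows always give exactly 300 / 300 / 300 / 100.
assigned = assign_test_variant2(list(range(1000)))
print(sum(1 for _row, t in assigned if t == "Test 1"))
```

Unlike Variant 1, the randomness only decides *which* rows land in each bucket; the bucket sizes are fixed by the cut points.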

I may have misunderstood your exact sampling needs, but both methods above can easily be reconfigured to achieve other outcomes.

Happy Solving!

agimpel

Sorry if that was confusing! I guess ignore the piece about splitting it by person; it really should be a 30-30-30 split by test type, so I'm thinking that's your second solution? And which tool would I use?

CoG

I don't think explaining the logic again will be of much help, so here is a sample workflow that you can apply to your use case:

[Screenshot: sample workflow]

Note that the randomization comes from the middle branch, which uses the UuidCreate() function followed by sorting. I assumed that each Test & Owner combination is unique.
