SOLVED

Sampling question

tww
8 - Asteroid

Hi all,

 

I am new to Alteryx.

I am trying to achieve the following:

1) Read in the input data

2) Read in the sample size (13)

3) Create a sample table

4) Pick samples from each group based on the number listed in the sample table

 

Input data

 

Record ID   Group
1           22
2           22
3           23
4           23
5           23
6           23
7           24
8           27
9           27
10          27
11          28
12          28
13          28
14          28
15          32
16          32
17          32
18          32
19          32
20          32
21          32
22          32
23          32
24          32
25          32
26          32
27          32
28          32
29          32
30          32
31          32
32          32
33          37
34          37
35          37

 

Sample size: 13

 

To create a sample table

 

1) Read the sample size = 13 (manual input)

2) Count the number of records in each group

3) Calculate each group's share of the whole population (the higher the share, the more samples are selected from that group)

4) Calculate the number of samples to pick from each group. Each group must have at least one sample picked, whatever its percentage.

5) Pull additional samples from the groups with the highest %, to reach the required total sample size (in this case 13).

 

Group   Count   % population   Calculated sample size   Adjusted sample size
22      2       0.057142857    1                        1
23      4       0.114285714    1                        1
27      3       0.085714286    1                        1
24      1       0.028571429    1                        1
28      4       0.114285714    1                        1
32      18      0.514285714    6                        7
37      3       0.085714286    1                        1
Total   35                     12                       13
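
For anyone who wants to check the arithmetic outside Alteryx, here is a minimal Python sketch of the sample table steps above. It is only an illustration, and it rests on two assumptions: the calculated size is the proportional share rounded down (which matches the 6 shown for group 32) with a minimum of one per group, and the extra samples are handed out one at a time to the largest groups first. The function name `build_sample_table` is made up for this example.

```python
from collections import Counter


def build_sample_table(groups, sample_size):
    """Return (counts, calculated, adjusted) dicts keyed by group.

    Assumption: "calculated" is the proportional share rounded down,
    with a minimum of one sample per group.
    """
    counts = Counter(groups)              # step 2: records per group
    total = sum(counts.values())

    # Steps 3-4: floor(sample_size * count / total), at least 1 per group.
    calculated = {g: max(1, sample_size * n // total) for g, n in counts.items()}

    adjusted = dict(calculated)
    shortfall = sample_size - sum(adjusted.values())
    if shortfall < 0:
        raise ValueError("sample size is too small to give every group one sample")

    # Step 5: give out the remaining samples one at a time, largest groups
    # first, never asking a group for more records than it has.
    largest_first = sorted(counts, key=counts.get, reverse=True)
    while shortfall > 0:
        progressed = False
        for g in largest_first:
            if shortfall == 0:
                break
            if adjusted[g] < counts[g]:
                adjusted[g] += 1
                shortfall -= 1
                progressed = True
        if not progressed:
            raise ValueError("sample size is larger than the population")
    return dict(counts), calculated, adjusted


# Group column of the 35 input records from the post.
groups = [22] * 2 + [23] * 4 + [24] + [27] * 3 + [28] * 4 + [32] * 18 + [37] * 3

counts, calculated, adjusted = build_sample_table(groups, 13)
for g in counts:
    print(g, counts[g], calculated[g], adjusted[g])
```

With the data above, the calculated sizes add up to 12 and the single extra sample goes to group 32, which gives the adjusted total of 13.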

 

Sample output (random sampling; the number of samples to pick from each group is based on the "Adjusted sample size" column in the sample table above):

 

Record ID   Group
1           22
4           23
9           27
7           24
13          28
21          32
25          32
22          32
24          32
29          32
30          32
31          32
35          37

 

It is a very Excel-driven idea, as I am not familiar with Alteryx. If I can bypass the sample table creation step and get straight to the final sample output, that would be great.

 

Any ideas or suggestions would be greatly appreciated.

 

 

2 REPLIES
danilang
19 - Altair

Hi @tww 

 

Here's one way to do it.

 

[Image: screenshot of the workflow, danilang_0-1679751170970.png]

The bottom branch calculates the group totals, the grand total, and the number of records to select from each group. This is summarized into the Current Sample. The portion in the container allocates one additional row to each of the largest groups until the total number sampled equals your target. In the top branch, a random number is added to each row, and the rows are sorted by group and random number to shuffle the items within each group. Then the top records are selected from each group according to the calculated group sample size.
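
As a rough cross-check outside Alteryx, the top branch described above (random number, sort by group and random number, take the top N per group) can be sketched in Python. This is not the workflow itself, just the same idea under the assumption that the per-group sizes are already known; here they are copied from the "Adjusted sample size" column in the question. The function name `sample_by_group` and the `seed` parameter are made up for this example.

```python
import random


def sample_by_group(records, sizes, seed=None):
    """Pick sizes[group] random records from each group.

    records: list of (record_id, group) tuples
    sizes:   dict mapping group -> number of records to pick
    """
    rng = random.Random(seed)

    # Attach a random number to every row, then sort by group and by that
    # random number so the rows inside each group are shuffled.
    keyed = sorted((group, rng.random(), rec_id) for rec_id, group in records)

    picked, taken = [], {}
    for group, _, rec_id in keyed:
        # Keep the first sizes[group] rows of each shuffled group.
        if taken.get(group, 0) < sizes.get(group, 0):
            picked.append((rec_id, group))
            taken[group] = taken.get(group, 0) + 1
    return picked


# The 35 input records from the question, as (record_id, group).
groups = [22] * 2 + [23] * 4 + [24] + [27] * 3 + [28] * 4 + [32] * 18 + [37] * 3
records = list(enumerate(groups, start=1))

# Per-group sizes taken from the "Adjusted sample size" column.
sizes = {22: 1, 23: 1, 24: 1, 27: 1, 28: 1, 32: 7, 37: 1}

for rec_id, group in sample_by_group(records, sizes):
    print(rec_id, group)
```

Each run picks different records unless a `seed` is passed, but the number picked per group always follows `sizes`.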

 

This method works with your sample data, but you might need to adjust it when applying it to real data. You'll need to modify the portion in the container if you want to distribute your additional records by another method, such as one biased by group size.
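
To illustrate what a different distribution rule could look like, here is one possible reading of "biased by group size", sketched in Python rather than Alteryx. It is an assumption, not the method in the workflow: each extra sample goes to the group whose current sampling rate (picked divided by group count) is lowest, with ties going to the larger group and then the lower group number. The function name `allocate_by_sampling_rate` is made up for this example.

```python
from collections import Counter
from fractions import Fraction


def allocate_by_sampling_rate(groups, sample_size):
    """Return {group: sample size}, handing each extra sample to the group
    that is currently sampled at the lowest rate (an assumed rule)."""
    counts = Counter(groups)
    total = sum(counts.values())

    # Start from the proportional share rounded down, at least 1 per group.
    sizes = {g: max(1, sample_size * n // total) for g, n in counts.items()}

    shortfall = sample_size - sum(sizes.values())
    if shortfall < 0:
        raise ValueError("sample size is too small to give every group one sample")

    for _ in range(shortfall):
        # Only groups that still have unpicked records can take an extra.
        candidates = [g for g in counts if sizes[g] < counts[g]]
        if not candidates:
            raise ValueError("sample size is larger than the population")
        # Exact fractions avoid floating-point ties being broken arbitrarily.
        target = min(
            candidates,
            key=lambda g: (Fraction(sizes[g], counts[g]), -counts[g], g),
        )
        sizes[target] += 1
    return sizes


groups = [22] * 2 + [23] * 4 + [24] + [27] * 3 + [28] * 4 + [32] * 18 + [37] * 3
print(allocate_by_sampling_rate(groups, 13))
```

Note that on the sample data this rule gives the thirteenth sample to group 23 rather than group 32, so it does not reproduce the table in the question. Which rule is right depends on what you want the extra samples to achieve.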

 

Dan

tww
8 - Asteroid

Thank you so much Dan! This works great.
