Alteryx Designer Desktop Discussions

tww · ‎03-24-2023

Hi all,

I am new to Alteryx.

I am trying to achieve the following:

1) To read in input data

2) To read in sample size = 13

3) To create sample table

4) To pick samples from each group based on the number listed in the sample table

Input data

Record ID	Group
1	22
2	22
3	23
4	23
5	23
6	23
7	24
8	27
9	27
10	27
11	28
12	28
13	28
14	28
15	32
16	32
17	32
18	32
19	32
20	32
21	32
22	32
23	32
24	32
25	32
26	32
27	32
28	32
29	32
30	32
31	32
32	32
33	37
34	37
35	37

Sample size: 13

To create a sample table

1) To read the sample size = 13 (manual input)

2) To count the number of records per each group

3) To calculate the % of no of records per group over the whole population (higher the percentage, more sample would be selected from that group)

4) To calculate the number of samples to be picked for each group. Each group must have at least one picked, no matter what the percentage is.

5) To pull addition samples from the ones with the highest %, to get to the required total sample size (in this case 13).

	Group	Count	% population	Calculated sample size	Adjusted sample size
	22	2	0.057142857	1	1
	23	4	0.114285714	1	1
	27	3	0.085714286	1	1
	24	1	0.028571429	1	1
	28	4	0.114285714	1	1
	32	18	0.514285714	6	7
	37	3	0.085714286	1	1
Total		35		12	13

Sample output (random sampling, number of samples to be picked is based on "Adjusted sample size" column in sample table above.

Record ID	Group
1	22
4	23
9	27
7	24
13	28
21	32
25	32
22	32
24	32
29	32
30	32
31	32
35	37

It is a very Excel driven idea as I am not familiar with Alteryx. If I can by-pass the sample table creation step and get to final sample output that would be great.

Any idea/suggestions would be greatly appreciated.

danilang · ‎03-25-2023

Hi @tww

Here's one way to do it.

The bottom branch calculates the group totals, the grand total and the number of records selected from each group. This is summarized into the Current Sample. The portion in the container allocates one additional row to each of the largest groups until the total number sampled is equal to the your target. In the top branch a random number is added to each row and the rows are sorted by group and rand to scramble the item in the group. Then the top records are selected from each group according to the calculated group sample size.

This method works with your sample data, but you might need to adjust it when applied to real data. You'll need to modify the portion in the container if you want to distribute your additional records by another method, i.e. biased by group size.

Dan

tww · ‎03-27-2023

Thank you so much Dan! This works great.

Alteryx Designer Desktop Discussions

Sampling question

Re: How to select columns dynamically using number...

Re: Issue when using Block Until Done and Power BI...

Example workflow for setting up a custom list to u...

Re: Firm names parse

Re: Help with Multi-Row formula