Trying to do a random sample selection in Alteryx that is proportional based on each region or state. For example, I have 7 stores across the US (California, Montana, Wisconsin, Florida, Illinois, Rhode Island, and New York), but I have more sales volume in New York than Rhode Island. I want the sample selection to be weighted appropriately based on population of sales. Also need every location to have at least one sample selected. Please see example below for how I would select samples manually. Is there a way to do this in Alteryx?
We would select 25 samples across the 7 locations below.
Population
- California - 205 sales
- Montana - 101 sales
- Wisconsin - 49 sales
- Florida - 326 sales
- Illinois - 22 sales
- Rhode Island - 89 sales
- New York - 444 sales
Total sales = 1,236
Sample Selection (total of 25 samples with each state at least getting one sale selected):
- California - 4 sales based on 17 % of total population
- Montana - 2 sales based on 8 % of total population
- Wisconsin - 1 sale based on 4% of total population
- Florida - 6 sales based on 26% of total population
- Illinois - 1 sale based on 2% of total population
- Rhode Island - 2 sales based on 7% of total population
- New York - 9 sales based on 36% of total population
Hey @eurban
Built you a quick macro to help with this.
You can pass the data in the "I" macro input and the percentage of random sampling into the macro.
Hope this helps!
@gautiergodard
Thank you for your response! We are trying to use the macro and we were able to get how many samples we want per division, but it is not coming back with how many records we have specified in our table below. For example we have 6 samples we need for Austin, but it is coming back with 59. It should be a total of 50 samples, but is coming back with 364. Is there something specific we need to have for the control parameter and data field within the macro?
Hi @eurban ,
Here is one potential solution.
Summary process
1. Generate rows for each state (one row for one sales)
2. Pick one row for each state
3. For the rest of rows, generate random number to pick up the rest of rows (25 - 7 = 18)
4. Union the rows
Workflow
Output
Count by State
I hope this matches with your requirement.