
Alteryx Designer Discussions


How to do a stratified sample in Alteryx?




I am trying to do a stratified sample based on a column. For instance, out of my population of 1 million records, I want it to look like this after sampling:


Red - 100

Blue - 200

Green - 50


The color is a column in a larger dataset. I would also like the counts in each stratum to be random. Any ideas?


Hi, Joanna. Interesting problem. I'm not sure this satisfies the strictest definition of randomness, but here is a solution that might work for you: it uses two random numbers and compares them to decide whether to include each record in the sample. This method gives a different random count for each category.
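The reply doesn't spell out exactly how the two random numbers are compared, so the following Python sketch is only one possible reading of it, not the workflow itself. Assumption: one random number per category serves as that category's inclusion rate, and a second random number per record decides whether the record is kept. The function name `random_rate_sample` and the `Color` field are hypothetical. With this approach the counts vary from run to run and scale with the size of each category, so it does not target fixed counts like "Red - 100".

```python
import random
from collections import Counter

def random_rate_sample(records, key, seed=None):
    """Return (sample, rates), where each category's inclusion rate is random.

    Hypothetical interpretation of the two-random-number idea: draw one
    random number per category as its inclusion rate, then one random
    number per record; keep the record when its number is below the rate.
    """
    rng = random.Random(seed)
    rates = {}    # category -> random inclusion rate in [0, 1)
    sample = []
    for rec in records:
        cat = rec[key]
        if cat not in rates:
            rates[cat] = rng.random()   # first random number: per category
        if rng.random() < rates[cat]:   # second random number: per record
            sample.append(rec)
    return sample, rates

# Usage on a small synthetic dataset (hypothetical "Color" column).
data_rng = random.Random(0)
records = [{"DataRow": i, "Color": data_rng.choice(["Red", "Blue", "Green"])}
           for i in range(10_000)]
sample, rates = random_rate_sample(records, "Color", seed=42)
counts = Counter(rec["Color"] for rec in sample)
print(rates)
print(counts)
```

Passing a fixed `seed` makes a run repeatable, which can help when comparing results.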




Hi @joanna1 


Based on my interpretation of your requirements, here's one possible way to go about this:




The two tools in the Generate Data Set container simply generate 1M records with random values, each assigned at random to one of 3 groups. The DataRow column is the unique key in this list. I generated 1M records to confirm that this method runs in a reasonable amount of time at your data size.
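For anyone who wants to reproduce similar test data outside Designer, here is a rough Python sketch of the same idea. The category labels (taken from the question), the `Value` field, and the function name `generate_dataset` are assumptions for illustration. The usage below builds 100,000 rows to keep memory modest; the row count can be raised to 1,000,000 to match the workflow.

```python
import random

CATEGORIES = ["Red", "Blue", "Green"]  # assumed group labels, from the question

def generate_dataset(n_rows, seed=None):
    """Build n_rows records, each with a unique DataRow key (1..n_rows),
    a random Value in [0, 1), and a Category chosen uniformly at random."""
    rng = random.Random(seed)
    return [
        {"DataRow": i, "Value": rng.random(), "Category": rng.choice(CATEGORIES)}
        for i in range(1, n_rows + 1)
    ]

data = generate_dataset(100_000, seed=7)
print(len(data), data[0])
```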


The real work starts after this. The Stratified Quantities input contains the number of records you want from each category in the final output. You can, of course, increase these quantities; I kept them small so that the results fit in one image.




After joining this with the main data on Category, the Random SortKey Formula tool generates a random number for each data row. The data is then sorted by Category and SortKey, giving a list grouped by category and randomized within each category. The Multi-Row Formula tool then numbers the rows within each category (1, 2, 3, and so on, restarting for each category) in an ExtractID field. Finally, the Filter tool keeps only the rows where ExtractID is less than or equal to the required quantity for that category.
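The join / random key / sort / number / filter steps above can be sketched in plain Python like this. It is a rough equivalent under stated assumptions, not an export of the workflow: the function name `stratified_sample` is hypothetical, rows are dicts, and the Stratified Quantities input is assumed to be a simple `{category: count}` mapping.

```python
import random
from collections import Counter

def stratified_sample(data, quantities, seed=None):
    """Pick quantities[c] random rows from each category c (all rows if fewer)."""
    rng = random.Random(seed)
    # Join on Category (inner join): categories missing from `quantities` drop out.
    joined = [dict(row, Quantity=quantities[row["Category"]])
              for row in data if row["Category"] in quantities]
    # Random SortKey for every row.
    for row in joined:
        row["SortKey"] = rng.random()
    # Sort by Category, then SortKey: grouped by category, shuffled within it.
    joined.sort(key=lambda r: (r["Category"], r["SortKey"]))
    # Multi-Row style numbering: ExtractID restarts at 1 when Category changes.
    prev_cat, extract_id = None, 0
    for row in joined:
        extract_id = extract_id + 1 if row["Category"] == prev_cat else 1
        row["ExtractID"] = extract_id
        prev_cat = row["Category"]
    # Filter: keep rows whose ExtractID is within the required quantity.
    return [row for row in joined if row["ExtractID"] <= row["Quantity"]]

# Usage with hypothetical data and the quantities from the question.
data_rng = random.Random(0)
data = [{"DataRow": i, "Category": data_rng.choice(["Red", "Blue", "Green"])}
        for i in range(1, 100_001)]
quantities = {"Red": 100, "Blue": 200, "Green": 50}
sample = stratified_sample(data, quantities, seed=1)
print(Counter(row["Category"] for row in sample))
```

Sorting the whole dataset costs O(n log n). Drawing `random.sample` from each category separately would avoid the full sort, but the sort-and-number version mirrors the tool sequence described above.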


After running for about 3 seconds on my machine, the workflow produces the following results:




As you can see, each category gets exactly the quantity required by the Stratified Quantities input, with data rows pulled at random from that category.
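To check a sample like this programmatically rather than by eye, a small Python helper along these lines could help. This is a hypothetical sketch: the name `check_stratified_sample` and the default field names `Category` and `DataRow` are assumptions. It confirms each category has exactly the requested count, that no unexpected category appears, and that no row was picked twice.

```python
from collections import Counter

def check_stratified_sample(sample, quantities, key="Category", id_field="DataRow"):
    """Return a list of problems found; an empty list means the sample matches."""
    problems = []
    counts = Counter(row[key] for row in sample)
    # Each requested category must appear exactly the requested number of times.
    for cat, wanted in quantities.items():
        if counts[cat] != wanted:
            problems.append(f"{cat}: expected {wanted}, got {counts[cat]}")
    # No category outside the quantities table should appear.
    for cat in counts.keys() - quantities.keys():
        problems.append(f"{cat}: not in quantities but present in sample")
    # Each row should be selected at most once.
    ids = [row[id_field] for row in sample]
    if len(ids) != len(set(ids)):
        problems.append("duplicate rows in sample")
    return problems

# Tiny hand-made example (hypothetical rows).
quantities = {"Red": 2, "Blue": 1}
good = [{"DataRow": 1, "Category": "Red"}, {"DataRow": 5, "Category": "Red"},
        {"DataRow": 9, "Category": "Blue"}]
bad = good + [{"DataRow": 9, "Category": "Blue"}]
print(check_stratified_sample(good, quantities))  # []
print(check_stratified_sample(bad, quantities))
```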


If this isn't what you're looking for, leave a note with a clarification and I'll see what I can do.