Hi all,
I am trying to use the Random % Tool in Alteryx to run a random sample. These samples have to be re-performable, so I will use the deterministic output configuration, which works well. The only issue I've noticed is that the tool outputs results in the records' chronological order, not in generation order. Ideally, I would like a way to know the exact generation order. This would come in handy if I needed to make replacements from my population, because I would know that I am picking the next statistically random record.
I've attached a workflow as an example.
Thanks for your reply, @Felipe_Ribeir0. I think I am looking for something more dynamic. Please see the example below of true generation order. I'm not sure if this would be some sort of macro solution, but I do not have much experience with macros.
(I've deleted the old workflow and attached the new one to the original post for clarity).
Hi @Felipe_Ribeir0, I altered my original post a bit. I'm looking for a more dynamic solution. Thanks!
Hi @jkanzler
Try the Simulation Sampling tool from the Prescriptive palette. Configure it to Sample from data.
I am curious to know why the sample order is important, though.
Dan
Hi @danilang, this is exactly what I was looking for, thanks! I had one follow-up question about the configuration of the tool. There is a "Chunk Size" parameter, and I'm not totally sure what it does. I looked on the Alteryx website and saw the following description:
Is this saying the chunk size can potentially omit records from the random sample if the chunk size is set too low? It also seems like this may just be for stratified sampling; I only intend to use this tool for simple/random sampling, not stratified sampling. Please let me know if you have any more insight into the best configuration for the Chunk Size parameter.
Sample order is important for us because sometimes we need a random sample of 25 but anticipate needing to replace some of those records for miscellaneous reasons. Because we are unsure how many records we will ultimately have to replace, we draw a larger random sample of 50 and then go down the list, replacing as needed. If the sample were output in chronological order, we would be unrepresentatively favoring records that appear earlier in that order.
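To make the replacement logic concrete, here is a minimal sketch in Python of what I mean by "going down the list" in generation order. It runs outside Alteryx and is not how the Simulation Sampling tool works internally. The function names, the seed value, and the population of record IDs 1 to 1000 are all made up for illustration, and a Python seed will not reproduce a sample drawn in Alteryx.

```python
import random


def sample_in_generation_order(record_ids, n, seed):
    """Draw n record IDs and return them in the order they were drawn.

    Hypothetical helper: a fixed seed makes the draw re-performable.
    """
    rng = random.Random(seed)
    # random.sample returns items in selection order, so any leading
    # slice of the result is itself a valid random sample.
    return rng.sample(list(record_ids), n)


def select_with_replacements(ordered_draw, target, rejected):
    """Walk the draw in generation order, skipping rejected IDs,
    and keep the first `target` records that remain."""
    kept = [r for r in ordered_draw if r not in rejected]
    return kept[:target]


# Assumed population: record IDs 1..1000 in chronological order.
population = range(1, 1001)

# Oversample 50 so that replacements are available for a target of 25.
draw = sample_in_generation_order(population, 50, seed=12345)

# Suppose the 4th and 11th records drawn turn out to be unusable.
rejected = {draw[3], draw[10]}
final = select_with_replacements(draw, 25, rejected)
```

Because the list is kept in draw order, the replacements are simply the 26th and 27th records generated, rather than whichever records happen to come next chronologically.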
Thanks!