Alteryx Designer Desktop Discussions

SOLVED

Random % Tool - Output in generation order

jkanzler
6 - Meteoroid

Hi all,

I am trying to use the Random % Tool in Alteryx to run a random sample. These samples have to be re-performable, so I use the deterministic output configuration, which works well. The only thing I've noticed is that the tool outputs results in the records' original (chronological) order, not in the order they were generated. Ideally, I would like a way to know the exact generation order. This would come in handy if I needed to make replacements from my population, so I know I am picking the next statistically random record.
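Outside Alteryx, the difference between generation order and chronological order can be illustrated in Python. A seeded draw returns the same sample on every run, and keeping the list in the order it was drawn preserves generation order; sorting it back into source order discards that information. This is a minimal sketch and is not how the Random % Tool works internally; the function name `sample_in_generation_order` and the record IDs are hypothetical.

```python
import random

def sample_in_generation_order(records, k, seed):
    """Draw k records without replacement using a fixed seed.

    random.Random(seed).sample returns items in the order they were selected,
    so the returned list preserves generation order.
    """
    rng = random.Random(seed)  # seeded generator: same seed -> same draw
    return rng.sample(records, k)

# Hypothetical record IDs standing in for the population, in source order.
population = [f"REC-{i:03d}" for i in range(1, 101)]

generation_order = sample_in_generation_order(population, 10, seed=42)
# Re-sorting by position in the population mimics an output that keeps the
# records' original (chronological) order and loses the draw order.
chronological_order = sorted(generation_order, key=population.index)

print(generation_order)
print(chronological_order)
```

Both lists contain the same ten records; only the first one tells you which record was drawn first, second, and so on.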

I've attached a workflow as an example.

Felipe_Ribeir0
16 - Nebula

Hi @jkanzler 

Please see if it works for you.

[Screenshot attached]

jkanzler
6 - Meteoroid

Thanks for your reply, @Felipe_Ribeir0. I think I am looking for something more dynamic. Please see the example below of true generation order. I'm not sure if this would be some sort of macro solution, but I do not have much experience with macros.

(I've deleted the old workflow and attached the new one to the original post for clarity).

jkanzler
6 - Meteoroid

Hi @Felipe_Ribeir0, I altered my original post a bit. I'm looking for a more dynamic solution. Thanks!

danilang
19 - Altair

Hi @jkanzler 

Try the Simulation Sampling tool from the Prescriptive palette. Configure it to sample from data.

[Screenshot attached]

I am curious to know why the sample order is important, though.

Dan

jkanzler
6 - Meteoroid

Hi @danilang, this is exactly what I was looking for, thanks! I have one follow-up question about configuring the tool. There is a "Chunk Size" parameter, and I'm not sure what it does. I looked on the Alteryx website and found the following description:

  • Select sampling mechanism: Monte Carlo / Simple Sampling or Latin HyperCube / Stratified Sampling. For stratified sampling from data, the maximum strata size is determined by the choice of chunk size.
  • Chunk size: The maximal size of data to evaluate at a time. This can be used to avoid R's in-memory processing limitation. For stratified sampling from data, this is also the maximal size of the strata.

Is this saying that setting the chunk size too low could omit records from the random sample? It also seems this may apply only to stratified sampling; I only intend to use this tool for simple random sampling, not stratified sampling. Please let me know if you have any more insight into the best configuration for the Chunk Size parameter.

Sample order is important for us because sometimes we need a random sample of 25, but we anticipate needing to replace some of those records for miscellaneous reasons. Since we don't know in advance how many records we will have to replace, we run a larger random sample of 50 and then work down the list, replacing as needed. If the random sample were output in chronological order, working down the list would favor records that appear earlier chronologically, so the replacements would not be representative.
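The replacement procedure described above can be sketched in Python, assuming each drawn record is simply either eligible or excluded. The sample sizes (25 needed, 50 drawn) come from the post; the function name `draw_with_replacements`, the record IDs, and the exclusion set are hypothetical. Because the walk follows generation order, each replacement is the next randomly drawn record rather than the next one chronologically.

```python
import random

def draw_with_replacements(records, needed, oversample, seed, is_excluded):
    """Draw `oversample` records in generation order, then keep the first
    `needed` records that are not excluded, walking down the list in order."""
    rng = random.Random(seed)  # fixed seed keeps the draw re-performable
    ranked = rng.sample(records, oversample)  # list is in generation order
    kept = []
    for rec in ranked:
        if is_excluded(rec):
            continue  # replaced by the next record in generation order
        kept.append(rec)
        if len(kept) == needed:
            return kept
    # Every drawn record was checked and there still are not enough.
    raise ValueError("oversample too small: not enough eligible records")

# Hypothetical population and records that must be replaced.
population = list(range(1, 501))
excluded = {17, 250, 311}

final = draw_with_replacements(
    population, needed=25, oversample=50, seed=7,
    is_excluded=lambda r: r in excluded,
)
print(final)
```

Walking the list this way is equivalent to taking the first 25 eligible records from the 50 drawn, so the final set stays reproducible for a given seed and exclusion list.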

Thanks!
