Alteryx Designer Desktop Discussions

SOLVED

Monte Carlo simulation: population adjusted

jonathanogrady
8 - Asteroid

Hello fellow Alteryx fans!

I'm trying to get to grips with the new simulation tool and I'm stumped.

I have a relatively simple problem and believe that the tool should be perfect, but I need the help of a kind soul to point me in the right direction as to how to set it up.

I have attached a dummy dataset of 467 records. Each record is classified as 1, 2 or 3, and has one of five results, A through E.

I would like to analyse the frequency of the results with respect to each class, but the class proportions in my sample don't match those of the population.

In my data, classes 1 through 3 are 25%, 44% and 31% respectively. In the population, I know that these should be 10%, 44% and 46% respectively.
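
One way to see what the simulation is aiming for is post-stratification reweighting. Writing π_c for the population share of class c, s_c for its sample share, n_rc for the number of sample records in class c with result r, and n = 467 (these symbols are introduced here for illustration), each class gets a weight w_c = π_c / s_c, and the population-adjusted frequency of a result is the population-weighted mix of the per-class frequencies:

```latex
w_c = \frac{\pi_c}{s_c}:\qquad
w_1 = \frac{0.10}{0.25} = 0.40,\quad
w_2 = \frac{0.44}{0.44} = 1.00,\quad
w_3 = \frac{0.46}{0.31} \approx 1.48

\hat{P}(r) = \sum_{c=1}^{3} \pi_c \,\frac{n_{rc}}{n_c}
           = \sum_{c=1}^{3} w_c \,\frac{n_{rc}}{n},
\qquad s_c = \frac{n_c}{n}
```

In other words, class 1 records count for less than they do in the raw sample, class 3 records for more, and class 2 is unchanged.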

I would like to create a Monte Carlo simulation in which the classes are drawn in their population proportions, i.e. 10%, 44% and 46%.

I would then like to randomly sample results to estimate their frequency in the population, with a confidence interval.
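
For the "+/- level of confidence" part, one option (my suggestion, not something from this thread) is a stratified bootstrap with a percentile interval: resample each class at its observed size, mix the per-class result shares using the population weights, repeat many times, and read off the 2.5th and 97.5th percentiles. Resampling at the observed class sizes matters, because the spread between repeated large fixed-size runs (e.g. 10,000 rows each) mostly reflects simulation noise rather than the uncertainty from having only 467 records. A minimal Python sketch, assuming the data is a list of (class, result) pairs; the toy data, class sizes 205/146 and function names are hypothetical:

```python
import random
import statistics
from collections import Counter

RESULTS = "ABCDE"
POPULATION_SHARE = {1: 0.10, 2: 0.44, 3: 0.46}  # population split from the post


def make_toy_sample(rng):
    # Hypothetical stand-in for the attached 467 records: class sizes roughly
    # follow the 25%/44%/31% split; the results are random letters.
    return [(cls, rng.choice(RESULTS))
            for cls, n in ((1, 116), (2, 205), (3, 146))
            for _ in range(n)]


def weighted_shares(by_class, rng):
    """One bootstrap replicate: resample each class at its observed size
    (with replacement), then mix per-class result shares by population weight."""
    mix = Counter()
    for cls, weight in POPULATION_SHARE.items():
        results = by_class[cls]
        draw = Counter(rng.choices(results, k=len(results)))
        for r in RESULTS:
            mix[r] += weight * draw[r] / len(results)
    return mix


def stratified_bootstrap(sample, n_reps=2000, seed=1):
    rng = random.Random(seed)  # seeded so the interval is repeatable
    by_class = {}
    for cls, result in sample:
        by_class.setdefault(cls, []).append(result)
    reps = [weighted_shares(by_class, rng) for _ in range(n_reps)]
    summary = {}
    for r in RESULTS:
        values = [rep[r] for rep in reps]
        cuts = statistics.quantiles(values, n=40)  # cuts[0]=2.5%, cuts[-1]=97.5%
        summary[r] = (statistics.mean(values), cuts[0], cuts[-1])
    return summary


if __name__ == "__main__":
    sample = make_toy_sample(random.Random(0))
    for r, (mean, lo, hi) in stratified_bootstrap(sample).items():
        print(f"{r}: {mean:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

The percentile interval is the simplest bootstrap interval; with small classes (class 1 has only ~116 records here) it is worth checking that the interval is stable as `n_reps` grows.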

I was hoping that this would not be overly involved!

Can anyone point me in the right direction?

Best wishes,

Jonathan

patrick_digan
17 - Castor

@jonathanogrady Here is what I would do for the simulations part if I'm understanding your ask correctly:

[Screenshot of the workflow]

My Text Input is just the classes and desired percentages (.1, .44, .46). Then I use the Formula tool to determine how many of the total simulations should go to each class. The total number of simulations can be adjusted in the config panel; I chose 10,000 arbitrarily.
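
The Formula tool step amounts to multiplying each class's share by the total number of simulations. A minimal Python sketch of that arithmetic, assuming plain rounding (the actual Formula tool expression isn't shown in the post, and `sims_per_class` is a hypothetical name):

```python
def sims_per_class(shares, total):
    """Split `total` simulations across classes by population share.
    Plain rounding is assumed; for shares that don't divide `total` evenly,
    the rounded counts may not add up to exactly `total`."""
    return {cls: round(share * total) for cls, share in shares.items()}


counts = sims_per_class({1: 0.10, 2: 0.44, 3: 0.46}, 10_000)
print(counts)  # {1: 1000, 2: 4400, 3: 4600}
```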

Then I use the Generate Rows tool to create all of the simulated rows, so I have 10,000 rows in my example. Next I join the number of records per class in the sample data, and randomly assign each simulation to 1 of the items for that class. For example, each of the 1,000 simulations for class 1 is randomly assigned to 1 of the 116 class 1 sample points in the sample data. Then I join the result and recordID for that particular item. The final result is 10,000 random simulations with the correct class split.
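
The same steps can be sketched outside Alteryx to show the logic. This is a rough Python translation, not the attached workflow: it assumes the sample is a list of (record_id, cls, result) tuples, and the toy data (class 2/3 sizes and all results) is made up.

```python
import random
from collections import Counter


def simulate(sample, counts, rng):
    """Sketch of the steps described above (not the attached workflow itself).

    sample -- list of (record_id, cls, result) tuples; this layout is assumed
    counts -- simulations per class, e.g. {1: 1000, 2: 4400, 3: 4600}
    Returns one (sim_id, cls, record_id, result) tuple per simulation.
    """
    # Group sample records by class; len(items) plays the role of the
    # "number of records in the sample data, by class" join.
    by_class = {}
    for record_id, cls, result in sample:
        by_class.setdefault(cls, []).append((record_id, result))

    rows = []
    sim_id = 0
    # Generate Rows equivalent: counts[cls] simulated rows for each class.
    for cls, n in counts.items():
        items = by_class[cls]
        for _ in range(n):
            sim_id += 1
            # Random position within the class (uniform, with replacement),
            # then "join" back to that item's record_id and result.
            record_id, result = items[rng.randrange(len(items))]
            rows.append((sim_id, cls, record_id, result))
    return rows


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up stand-in for the attached data: 116 class 1 records as in the
    # post; the class 2/3 sizes and all results are hypothetical.
    sample = []
    record_id = 0
    for cls, size in {1: 116, 2: 205, 3: 146}.items():
        for _ in range(size):
            record_id += 1
            sample.append((record_id, cls, rng.choice("ABCDE")))

    rows = simulate(sample, {1: 1000, 2: 4400, 3: 4600}, rng)
    print(len(rows))  # 10000
    print(dict(sorted(Counter(cls for _, cls, _, _ in rows).items())))
    # {1: 1000, 2: 4400, 3: 4600}
    print(dict(sorted(Counter(result for *_, result in rows).items())))
```

Because each class is drawn a fixed number of times, the class split is exact by construction; only the choice of record within a class is random.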

Hope that helps! For what it's worth, I generally use @jdunkerley79's Alteryx Abacus formula add-ins for random numbers. His latest files include a function called random(), which you can seed using randomseed(seed) to get repeatable random numbers when needed. The attached solution uses Alteryx's out-of-the-box rand() function, so its results will differ each time it runs.
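
The idea behind seeding is the same in any language: a generator started from the same seed produces the same sequence of draws. A small Python illustration of that idea using the standard `random` module (this is not the Abacus add-in's API; `draw` is a hypothetical helper, and 116 is just the class 1 size from above):

```python
import random


def draw(seed, k=5):
    """Return k random positions in 0..115, reproducible for a given seed."""
    rng = random.Random(seed)  # independent generator, seeded up front
    return [rng.randrange(116) for _ in range(k)]


print(draw(2020) == draw(2020))  # True: same seed, same draws
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the seeded sequence from being disturbed by other code that also draws random numbers.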

jonathanogrady
8 - Asteroid

Patrick,

First of all, thank you very much for your comprehensive reply.

I see that you don't use the simulation tool at all… Before it existed, I would have approached the problem your way, though I probably wouldn't have arrived at such an elegant solution.

Out of interest, does the simulation tool not handle problems like this?

Many thanks once again for giving this your time. Very much appreciated.

Best wishes,

Jonathan

patrick_digan
17 - Castor

@jonathanogrady For me it's about speed and control.

Speed: Since the simulation tool is R-based, there is a slow handoff back and forth between Alteryx and R. I just sampled 1,000,000 values from 4 letters: the R-based simulation tool took 17 seconds, while Generate Rows and a Join took 4 seconds. The R code itself is pretty fast; it's handing the data back and forth to Alteryx that is the killer.

Control: Since it's a built-in macro, the tool has limited flexibility. I don't believe a single simulation tool can handle your use case, where you want to maintain the desired split between classes. Regular Alteryx tools seem to give you more control to tweak the process to your specific use case.

Labels