Join the Alteryx Community’s Maveryx Summer Cup event! Compete, network with others, and earn your gold through a series of challenges from July 24th to August 11th. Learn more about the event here.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Monte Carlo simulation sampling

jonathanogrady
8 - Asteroid

Dear fellow Alteryx fans,

I would like to create a Monte Carlo simulation using the simulation sampling tool and I'm a little confused how to set it up. I have anonymized my data so that we can discuss the issue, and hope that by working this through on the forum it will be of help to others.

The example is a little contrived, so please bear with me!

I have three customer profiles. Assume my website serves each of these profile groups in proportion. I have a dataset containing 250,000 rows. I have analysed these customers coming from four different channels (A, B, C and D) and looked at the relationship between the number of page visits versus the total revenue. Specifically, I wish to examine the percentage of page views by channel versus the percentage of revenue by channel.

Please find attached a simple tableau representation of the full data, this details:

Page visits by channel in terms of the three profile groups
revenue by channel in terms of the three profile groups
performance, i.e. the percentage of total revenue divided by the percentage of total page visits

We can see that for the entire dataset, in channel A customer profile group 1 outperforms by 11%, but group 2 underperforms by 7%.

I would like to use the Monte Carlo simulation to construct levels of confidence around these percentages. I'm thinking that one way we can do this is by taking a sample of data and making the same calculation, and then repeating this exercise again and again. I would expect the mean outperformance in channel A for customer profile group 1 to be 11%, but what's the standard deviation around this? This is where the simulation tool would be useful if I knew how to use it!

Hopefully somebody out there will find this interesting!

Very grateful for your help.

Best wishes,

Jonathan

Monte Carlo simulation.JPG



6 REPLIES 6
jdunkerley79
ACE Emeritus
ACE Emeritus

Would something like this work for you:

2018-12-07_18-42-17.png

Basically, this uses generate rows to make a set of simulations. Within each simulation, it creates a set of trials and then picks random data points from the input data.

Finally, it creates percentages for each simulation and then averages and std dev over the simulation set

 

Sample attached for you to look at

jonathanogrady
8 - Asteroid

Aurora,

Super kind of you to take the time to reply to my question.

It looks like a very comprehensive answer, though avoids the simulation tool… No problem there, just need to get the job done!

I will go through your answer in more detail and revert if I have any questions. In the meantime, thank you kindly.

Best wishes,

Jonathan

acpark
5 - Atom

Thank you both for the interesting problem and example.

 

I have one question - In the Formula tool, is there any particular reason you used:

 

RandInt([RowCount]-1) + 1

 

Instead of picking a random number between 1 and 242,095 for RecordID?

 

The latter seems to make more sense than picking a RecordID between 0 and 1 for the first trial, between 0 and 100 for the 100th trial, between 0 and 10,000 for the 10,000th trial, etc.

 

Does the approach in the screenshot bias the selection towards the first 10,000 data points in the sample dataset, vs. considering the full set of 242,095 sample data points?

 

Thank you

jdunkerley79
ACE Emeritus
ACE Emeritus

Hi,

 

The `[RowCount]` is the count of rows from the input data side and is a constant 242,095. It should be uniformly distributed over the set of input rows.

 

James

 

 

flying008
14 - Magnetar

Hi, @jdunkerley79 

 

Dear master, I send a private message to your inbox before 3 months ago, so could you please see that if you have free time ?

acpark
5 - Atom

Thanks for the reply!

 

I think it may be a small typo in the workflow - the "Count" column is the constant you refer to; "RowCount" is the column identifying each trial within a simulation

 

Either way, thank you for clarifying that I was thinking about it the right way!

Labels