Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Challenge: Monte Carlo simulation

Ned
Alteryx Alumni (Retired)

The last challenge was so popular I figured I should do another.  If you aren't a pretty solid Alteryx user, this one might be difficult.

 

There is a well known problem called the birthday problem.  It gives the slightly counter intuitive result that for any group of 23 people (about a soccer team) the chances are about 50/50 that 2 of them will have the same birthday.  Wikipedia has a good writeup of the problem: https://en.wikipedia.org/wiki/Birthday_problem  The way they arrive at the solution is correct, but it is slightly wrong because they are presuming there are 365 possible birthdays (there are 366.)  The attached module has the proper synthetic calculation.

 

Of course in the real world, not all birthdays have an even likelyhood of happenning.  (see http://www.nytimes.com/2006/12/19/business/20leonhardt-table.html) Attached is a Alteryx module that has the percent likelyhood of every birthday.  The challenge is to generate 1,000,000 or more random teams of 23 people and show what percentage of the teams have 2 people with the same birthday.  Hint - the simulated answer is slightly higher than the synthetic answer because some birthdays are more common.

 

This is an Alteryx challenge, not a programming challenge, so no fair using R.  I'll post my solution in a day or two.

 

 

11 REPLIES 11
chris_love
12 - Quasar

My philosophy with Alteryx is "find a way that works". It was a pretty hard slog getting there though in this case - I'm not sure my answer is the one Ned had in mind though.

 

For those tackling this problem who are struggling then I'd suggest you start posting answers for 1000 records, this is a bit easier and finding an answer that doesn't scale is perhaps a first step. I will post my starting attempts in a few days once people have had a go.

 

Anyway, I have an answer that runs for 1 million iterations in a reasonable (3 minute) timeframe, and my module is attached. The chart below shows (on a logarthmic scale) how my answer approaches the average over the first 50k iterations.

 

2015-10-13_08-09-04.jpg

AdamR_AYX
Alteryx Alumni (Retired)

Good fun!  My first attempt used a batch macro, which worked nicely but was going to take a little while to run 1 million teams (It did 10,000 in a minute).

 

I might have just it it running in the background, but given @chris_love had already posted that his ran in 3 minutes something I felt there must be a different approach.

 

So re-imagining the problem from a different angle I came up with the attached which runs the full 1 million teams in about 30 seconds on my machine. Man Happy

Adam Riley
https://www.linkedin.com/in/adriley/
chris_love
12 - Quasar

I gave in and looked - neat solution Adam.

MarqueeCrew
20 - Arcturus
20 - Arcturus

Ned,

 

I generated physical birthdates in the proportions set by the table.  Then I multiplied them by 24 million and then randomized them and assigned them to one million teams of 23.  Those with less than 23 distinct birthdates were then used for the duplicate birthday calculation.  The result was 50.66%.

 

BD 2.png

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
RodL
Alteryx Alumni (Retired)

So I'm assuming that you are looking for just the 'synthetic' solution and not a 'real world adjustment' for the randomization. Based on that, here's my solution.

I have been so tempted to look and Adam's for some hints, but didn't...now that I've posted, I will download and see where I could improve.  Smiley Happy

Thanks, Ned, for the challenge!

RodL
Alteryx Alumni (Retired)

And now that I've looked at Adam's solution, I didn't do TOO bad...but could not figure out getting the 'proportionate distribution' of birthdays for the 'real world' example. I forget that the Generate Rows tools will create the 'in between' records...very nice, @AdamR_AYX!

KaneG
Alteryx Alumni (Retired)

I better post my solution then, so that I can look at everyone elses.. I used Multi-row formulas to turn the birthday percentages into a Cumulative Distribution Function, and 2 Generate Rows tools to make the groups.

 

Image 001 - 20151014 - 081038.png

 

At first I used the advanced Join Macro to join the ranges, but it was taking a fair while for even 10000 samples, so I used a batch macro instead. With 50 batches (can be changed in the workflow constants), I got the workflow down to about a minute. I put the number of people in the sample in as a workflow constant as well, to play with that.

 

Very fun challenge...

SteveA
Alteryx
Alteryx

I'm late to the party but I finally had a go at the puzzle on the plane to/from the Colorado office this week.  I'm not convinced it's correct, but the exercise was Most Excellent Fun and brought back memories of struggling in my Combinatorics class twenty years ago...

 

<snip>

Ned points out that my solution is bogus and I have to agree...

 

Atabarezz
13 - Pulsar

Well a critical thing in montecarlo simulation is to draw multiple random numbers

considering correlations or covariances... Any ideas how to do that...

Labels