Challenge: Monte Carlo simulation
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
The last challenge was so popular I figured I should do another. If you aren't a pretty solid Alteryx user, this one might be difficult.
There is a well known problem called the birthday problem. It gives the slightly counter intuitive result that for any group of 23 people (about a soccer team) the chances are about 50/50 that 2 of them will have the same birthday. Wikipedia has a good writeup of the problem: https://en.wikipedia.org/wiki/Birthday_problem The way they arrive at the solution is correct, but it is slightly wrong because they are presuming there are 365 possible birthdays (there are 366.) The attached module has the proper synthetic calculation.
Of course in the real world, not all birthdays have an even likelyhood of happenning. (see http://www.nytimes.com/2006/12/19/business/20leonhardt-table.html) Attached is a Alteryx module that has the percent likelyhood of every birthday. The challenge is to generate 1,000,000 or more random teams of 23 people and show what percentage of the teams have 2 people with the same birthday. Hint - the simulated answer is slightly higher than the synthetic answer because some birthdays are more common.
This is an Alteryx challenge, not a programming challenge, so no fair using R. I'll post my solution in a day or two.
Solved! Go to Solution.
- Labels:
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
My philosophy with Alteryx is "find a way that works". It was a pretty hard slog getting there though in this case - I'm not sure my answer is the one Ned had in mind though.
For those tackling this problem who are struggling then I'd suggest you start posting answers for 1000 records, this is a bit easier and finding an answer that doesn't scale is perhaps a first step. I will post my starting attempts in a few days once people have had a go.
Anyway, I have an answer that runs for 1 million iterations in a reasonable (3 minute) timeframe, and my module is attached. The chart below shows (on a logarthmic scale) how my answer approaches the average over the first 50k iterations.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Good fun! My first attempt used a batch macro, which worked nicely but was going to take a little while to run 1 million teams (It did 10,000 in a minute).
I might have just it it running in the background, but given @chris_love had already posted that his ran in 3 minutes something I felt there must be a different approach.
So re-imagining the problem from a different angle I came up with the attached which runs the full 1 million teams in about 30 seconds on my machine.
https://www.linkedin.com/in/adriley/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I gave in and looked - neat solution Adam.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Ned,
I generated physical birthdates in the proportions set by the table. Then I multiplied them by 24 million and then randomized them and assigned them to one million teams of 23. Those with less than 23 distinct birthdates were then used for the duplicate birthday calculation. The result was 50.66%.
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
So I'm assuming that you are looking for just the 'synthetic' solution and not a 'real world adjustment' for the randomization. Based on that, here's my solution.
I have been so tempted to look and Adam's for some hints, but didn't...now that I've posted, I will download and see where I could improve.
Thanks, Ned, for the challenge!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
And now that I've looked at Adam's solution, I didn't do TOO bad...but could not figure out getting the 'proportionate distribution' of birthdays for the 'real world' example. I forget that the Generate Rows tools will create the 'in between' records...very nice, @AdamR_AYX!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I better post my solution then, so that I can look at everyone elses.. I used Multi-row formulas to turn the birthday percentages into a Cumulative Distribution Function, and 2 Generate Rows tools to make the groups.
At first I used the advanced Join Macro to join the ranges, but it was taking a fair while for even 10000 samples, so I used a batch macro instead. With 50 batches (can be changed in the workflow constants), I got the workflow down to about a minute. I put the number of people in the sample in as a workflow constant as well, to play with that.
Very fun challenge...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I'm late to the party but I finally had a go at the puzzle on the plane to/from the Colorado office this week. I'm not convinced it's correct, but the exercise was Most Excellent Fun and brought back memories of struggling in my Combinatorics class twenty years ago...
<snip>
Ned points out that my solution is bogus and I have to agree...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Well a critical thing in montecarlo simulation is to draw multiple random numbers
considering correlations or covariances... Any ideas how to do that...
