#SANTALYTICS Continues! Are we having fun yet? Be sure to share your thoughts with your social networks and use the #Santalytics hash tag .
The Elf thanks you all for participating in Part 1. In fact we are so excited over the level of participation, that we are upping the ante. Stay tuned on that. For now we are onto part 2 and it's going to get tricky.
With nice kids scattered across the globe, Santa can't be wasting any time this Holiday season! Use the Create Points Tool, of course, to identify where all our presents need to make it this year. We'll have to call on the elves to distribute them to each house, but let's see if we can't keep Santa from making any extra trips.
Determine the least number of trade areas we can distribute bunches of presents to while making sure that no two points in a distribution hub are more than 500 miles apart - remember, we only need to worry about including the nice kids who will be getting presents delivered this year. Once your distribution hubs are assigned, what's the minimum weight that we can use for every one of the hubs while making sure each kid gets a present from the classification of present that they earned? Santa will worry about how many reindeer to hook to the sleigh, but we need to let him know the minimum towage to account for!
Goal of Part 2:
Find a list of delivery "hubs" that include every nice kid - with no two kids in a hub being more than 500 miles apart or 250 miles from the central recipient (hub) location
Identify the minimum weight that be used to deliver presents (with respect to each present class in that hub) to every hub, excluding presents of 0 or null weight
Make sure you are using the data from our Part 1 Solution Blog post!
To clarify, Santa can use the Recipient locations (Address Database.xlsx from week 1 – attached to this post) as delivery hubs. See below for the approach the elves have been tinkering with this year:
While the elves’ approach will definitely make the trip a shorter one, Santa is always open to alternative approaches that will get him to the milk and cookies sooner!
To recap, the goal of Part 2 is to:
Find a list of delivery "hubs" that include every nice kid - with no two kids in a hub being more than 500 miles apart or 250 miles from the central recipient (hub) location
Identify the minimum weight that be used to deliver presents (with respect to each present class in that hub) to every hub, excluding presents of 0 or null weight
The elves have been working on an algorithm to find the minimum number of hubs to deliver all the presents. It is important that the team members assigned to a particular delivery hub don’t stray too far away from each other in order to maintain communication. (At least that is the official story, but everyone knows that the helpers don’t want to walk to much and therefore set the distance restraint within the hubs.)
The clustering method is called Constraint Hierarchical Present Clustering (CHPC) and however still in the development phase, the results look promising. The proposed list of 294 hubs is included.
CHPC method
In order to make this year’s strict deadline, a two stage clustering approach was use. In the first stage, all delivery locations within an 80 miles (10th distance percentile) radius are clustered together. After this, the second stage iteratively merges clusters together using a hierarchical clustering approach. At each iteration the distance for each cluster with respect to the other clusters is assessed and the cluster pair with the smallest distance is combined. Two different linkage methods were explored; single-linkage and complete-linkage. The first assesses the best next merge based on the minimum distance between elements of each cluster, whereas the second considers the maximum distance between clusters. The former resulted in a slight smaller number of delivery hubs (~275), but tends to make stretched out delivery regions. The complete-linkage approach was selected to be favorable, because it generated more smooth out hub regions. Other linkage functions might be considered in the future, but this will require additional investigation.
The maximum distance constraint within the hubs is kept by only merging those clusters of delivery locations that do not violate this restriction.
For a more in depth overview of the technique the R script used is attached ('CHPC.txt'; didn't accept a .R extension file).
Note: after checking the naughty/nice list twice, it seems that children with the same score could end up with different nice ratings. It would therefore be advised to prevent splitting children with the same score. The number of children per rating will probably not be equal anymore, but it will be a more fair.
Ho, Ho, Ho
I came up with 332 delivery hub rings (250 mile radius each), selected from a pool of 12000 rings (1 trade area around each "nice child" location). I used an iterative macro that selected the rings in order from highest concentration to lowest. During each iteration, I saved the current "top" delivery ring result and then removed the "nice" child locations that fell into that ring so they would not be used in the next iteration.
I then joined the selected delivery hub rings with the unique "nice child" locations contained within, then joinied and summed with the Present info to get the minimum weight calcualtions.
I shamelessly implemented @cor's awesome solution, coming up with 271 hubs. I take no credit other than being handy with copy/paste, and writing a sanity checker to make sure I was comfortable with the results.