Gwilym and Ben both work as Consulting Analysts for Alteryx partner The Information Lab. Earlier this year, Gwilym decided he wanted to visit all the locations on the London Monopoly board in a single day. In this blog, we look at how Gwilym could have used Alteryx to optimise his route planning using a few different optimisation strategies. These methods can be applied to all kinds of route planning use cases, such as a retailer delivering packages or a doctor planning patient home visits. All Alteryx content shown throughout this post can be downloaded here.
The canonical route on the London Monopoly board takes you on a journey from Old Kent Road to Mayfair via 24 other major London streets or destinations. If you plotted this route on a map, from Old Kent Road to Whitechapel Road to King’s Cross Station and so on, it comes out at roughly 58km. That’s a lot of walking; it would take around 12 and a half hours, which would be good going for most of us mortals!
However, if you actually look at the route, you’ll instantly see that this isn’t the most optimal route - there’s a lot of criss-crossing and back and forth, especially when moving from streets to stations.
In this blog we’ll outline a number of methods that you can use to try (optimisation is never 100%!) and find the shortest walking route between these locations.
Before we can try to solve this problem, we need some data. When trying to solve a route optimisation problem like this, you need to know the distances between all of the different locations.
Fortunately, with Alteryx, assuming you have a list of these locations, this is a fairly simple process.
In the image below you will see how you can use the Append Fields tool in order to create a data set which contains all from/to combinations of the different locations. You’ll also see a filter tool right afterwards because we are never going to want to go from and to the same location, so these rows can be removed.
Once we have created our data set with all of the different combinations of locations, we can then generate the actual walking or driving distances between them locations. There are a number of ways to do this. One is to use the TomTom data, which can be purchased as an add-on with Alteryx Designer; alternatively you online resources, such as the Google Maps API or TravelTime Platform, which we used in our case.
The workflow we have used to generate our distance matrix is included in the workflow package shared with you. If you want to run this workflow you must download the TravelTime Platform macros for Alteryx which we can be found here. You’ll also need to register for an API key.
Brute Force Search
This is probably the most intuitive approach - why not just calculate all of the different possibilities, and then from this identify the best route? Maths and computational power is the reason why.
The number of possible routes to try, via brute force search, is the factorial of the number of locations in your data set. So in our case, it’s 26!, or 26 x 25 x 24 x 23 x … and so on. That means there are 403,291,461,126,606,000,000,000,000 different possible routes around the London Monopoly board... or to put it in words, four hundred and three septillion, two hundred and ninety-one sextillion, four hundred and sixty-one quintillion, one hundred and twenty-six quadrillion, six hundred and six trillion.
Basically, once you go beyond a very small number of locations, brute force search is increasingly impractical. So in this case, we’re not even going to attempt this method, because our laptops would probably melt.
Compass Directions
One way in which we could look to optimise our route is by travelling through our locations in order, either horizontally (west to east) or vertically (south to north).
If we go from our most southerly point (Old Kent Road) and travel north with each stop, we create a route that is ~54km (worth noting at this point the route given on the monopoly board is ~56km, so we are already starting to optimise!).
Better still, if we were to start at our most westerly point, Marylebone station, and go from west to east, we’d get a total distance of 40.5km, or a little under nine hours of walking.
Nearest Neighbour
The nearest neighbour method is an approximation algorithm that allows us to generate an optimised solution quickly.
The process involves highlighting any location to start at, before then moving to the next closest location.
You would then move from that location to the next closest location (that you haven’t yet visited.
...and that keeps going until you’ve hit all locations.
When we tried this method with all possible starting locations, the best route came out to ~26.5km and the workflow generated that approximate solution in 70 seconds - not bad considering that it’s over 30km shorter than the original Monopoly board route and that the number of possible routes makes brute force search impossible!
As the nearest neighbour method is an example of an iterative problem, which means we continually repeat the same steps with a changing input value. In Alteryx, this is done with an iterative macro.
In order to test all of our different starting locations, we can wrap our iterative macro within a batch macro to simply repeat our nearest neighbour process for each location.
At this point, we will apologise that we can’t dive into the details of these workflows, otherwise this would end up as a pretty long blog, but again remember you can access and investigate all the Alteryx assets via the link shared at the top of this blog. You can also ask questions and post comments at the bottom of this blog!
Genetic Algorithm
Another optimisation strategy is running a genetic algorithm, which is a method based on evolution and mutation. You start out with a population of chromosomes made up of genes, and then you select the best chromosomes, apply crossover, and apply mutations over multiple generations until the fittest chromosome survives. In our example, each chromosome is a possible route made up of 26 different genes, which are our locations. The algorithm will generate a population of possible routes at random, select the routes with the shortest distances between the locations, and then take those into the next generation to apply crossover (taking bits of one route and splicing it with another route to get a new route) and mutation (switching a couple of locations around within the same route), then do the same process all over again. And again. And again, for however many generations we choose.
As a quick example, let’s say we’ve got six locations on a map, and we’ve measured the distances between them.
Each location is a gene. The genes can be organised into chromosomes of six genes, representing a possible route between the locations. If we randomly draw up a lot of different routes, this is our population of chromosomes. We’ve got a population of three chromosomes here, but in an actual genetic algorithm, you’d take set a starting population of hundreds, potentially thousands, of chromosomes.
In each generation, the algorithm calculates the best chromosomes, and the fittest survive to the next generation. Here, the best chromosomes are the routes with the lowest total distances, so we keep a proportion of the short distances and discard the longer distances.
The chromosomes that survived get bred and their genes are crossed over. We take a subset of genes from one parent and cross it with the remaining genes from the other parent to create a child.
The new children are passed into the second generation, as well as some other chromosomes being passed down with small mutations, and the process continues for hundreds or thousands of generations.
In our case, we used the Python tool and adapted the code from this really helpful blog by Eric Stoltz to work for our data. Playing around with the population size, selection, mutation, and generation settings will change the results; the higher the initial population and the more generations they go through, the more optimised the solution will be, but the longer the process will take. It also means that the solution may well be different each time you run the script. This means that genetic algorithms aren’t necessarily the most consistent way of getting results, but they’re a useful tool for getting an approximate idea.
Here’s a small picture of the end of the Python tool from an instance when it returned 27.6km. There’s a lot of code in there! Please check out the whole thing by downloading it from this link.
The best result from the genetic algorithm that we found was 25.9km, shaving another 600m off the nearest neighbour method.
Genetic Algorithm + Manual Tweak
Finally, in a satisfying illustration that there’s still a place for the human eye alongside complicated algorithms, the best route we found overall was by taking one of the genetic algorithm suggestions, plotting it on a map, and noticing that the route went out of its way to visit a couple of locations. We simply swapped those locations around, leaving everything else the same, and ended up with a 25.3km route that would take just under five and a half hours to walk:
But as optimisation, by definition, is a way of finding the best possible route within your constraints, it’s possible that there are even shorter routes between the Monopoly board locations that we haven’t found yet. Let us know if you find one!
Thanks for reading!
Ben and Gwilym
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.