Engine Works

ShaanM · 07-23-2018

Now that the dust has settled on one of the most exciting and interesting World Cups, we thought we'd wrap up with a final post.

Well done to England, an amazing achievement getting to the semi-finals, and we very nearly brought football home – I suppose technically we didn’t say when it’s coming home 😉 Congratulations to France for winning and also to Croatia for reaching the final – did anyone predict that?

For our model, we made two passes of the data: one before the World Cup and one after the group stage. Our original prediction was for Germany to lift the trophy, and our second pass picked Brazil – unfortunately, we were wrong on this occasion, but we were not the only ones. This World Cup has been full of surprises. For a neutral spectator it had it all, from penalties and VAR to underdogs beating the favourites – what more could you ask for?

Nick and I always set out to bring predictions to non-data scientists, essentially by building a workflow model with rules and weights without any coding or scripting. The group stage worked quite well, taking historical data and applying it to our model. The knockout stage was a trickier call. When we built the second phase of the model we tried to adjust for how the teams were performing in the current tournament – maybe we got over-enthusiastic with the live weightings. In hindsight, to get a more accurate prediction we should have re-run the model after each knockout round or match, always using the most up-to-date data. This is key for any data challenge in any business: the end result is only as good as the data that goes in. Out-of-date data gives an out-of-date result.

Earlier this week I played around with different scenarios and re-ran each round of results. The model fared slightly better – interestingly, it changed the outcome to give France a greater probability of reaching the final. But with Croatia not conceding many in the group stage, winning every time it went to penalties and holding their nerve under pressure against England in the semi-final, momentum had shifted, and this became apparent when running the model with current data. The data was crucial, but we were always working with historical data to begin with. In an ideal world, to get the most accurate predictions we would want even more up-to-date data on the players – maybe their temperament, fitness and wellbeing, which could be crucial factors. We created a model of weightings and averages for matches played, winning from behind, goals above/below the opposition, winning on penalties at different stages and of course Pythagorean expectation. What else could we have missed?
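For anyone who does want to see the Pythagorean piece in code, here is a minimal sketch in Python. It is not how our workflow model was built (that used no scripting at all); it only illustrates the general formula, which estimates a team's expected win share from goals scored and goals conceded. The function name, the exponent of 2 (the classic baseball form; football analysts often tune it lower) and the sample goal counts are all assumptions for illustration.

```python
def pythagorean_expectation(goals_for: float, goals_against: float,
                            exponent: float = 2.0) -> float:
    """Estimate expected win share from goals scored and conceded.

    Formula: GF^x / (GF^x + GA^x).
    The default exponent of 2.0 is an assumption (the classic form);
    it is a tunable parameter, not a value taken from the CavMis model.
    """
    if goals_for < 0 or goals_against < 0:
        raise ValueError("goal counts must be non-negative")
    if goals_for == 0 and goals_against == 0:
        # No goals either way: treat the team as evenly matched.
        return 0.5
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Illustrative numbers only: a team scoring 7 and conceding 1.
strong_defence = pythagorean_expectation(7, 1)

# A team that scores as many as it concedes sits at exactly 0.5.
balanced = pythagorean_expectation(3, 3)

print(f"7 for, 1 against: {strong_defence:.2f}")
print(f"3 for, 3 against: {balanced:.2f}")
```

A value like this would be just one input alongside the other weightings and averages, and it shows why a team that barely concedes can rate highly even without scoring heavily.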

World Cup winners’ curse? Maybe. Looking at the trend, it is hard to ignore:

France - Champions 1998, Group stage exit 2002
Italy - Champions 2006, Group stage exit 2010
Spain - Champions 2010, Group stage exit 2014
Germany - Champions 2014, Group stage exit 2018

Only Brazil in recent history broke the curse, winning in 2002 and making it past the group stage in 2006. As we look to the next World Cup, maybe this is another factor to build into a new and improved model – so look out for a V3 CavMis model!

Some of our colleagues used data science techniques for their predictions; we went down a different path. The reality is that predicting individual matches is tough. As with anything, practice makes perfect – Nick and I had great fun working on this, and it has been great to hear from others about their own attempts at predicting the winner. I suppose now we can finally give our eyes a rest and enjoy the sunshine. Thanks, all.

- Nick Cavey and Shaan Mistry


The CavMis Model Retrospective