DrDan
Alteryx Alumni (Retired)

While it may not be much of "a thing" this year in the United States (given the US national team's failure to qualify), much of the rest of the world is eagerly awaiting today's start of the 2018 FIFA World Cup. In preparation, I've been working with Oliver Wahner (an Alteryx sales engineer based in our Munich, Germany office) to develop predictive models of the probability of a win, loss, or draw in international association football (a.k.a. soccer) matches. In subsequent blog posts I'll go into the details of how we developed these models (yes, there are several) using Alteryx, and how we simulated the World Cup matches 100,000 times using a combination of Alteryx and R. Along the way, I'll introduce you to the new Partial Dependency tool, which was just rolled out as part of the recently launched Laboratory District on the public Alteryx Analytics Gallery.

However, with the recently concluded Inspire US conference and all, I'm running a bit behind schedule, so I wanted to post our model-based predictions of which countries will advance from the Group Round (the round-robin matches) to the knockout rounds before (yes, just before) the first match of the 2018 World Cup begins. We also point out several match-ups within some of the groups that our models predict will be closely fought. Finally, attached to this post is a YXDB file containing the predicted probability that each qualifying team will place first through fourth in its group, and the predicted probability that it will advance to the knockout rounds.
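
The post doesn't show how the placement and advancement probabilities were produced, beyond noting that the matches were simulated 100,000 times with Alteryx and R. As a rough illustration only, here is a minimal Monte Carlo sketch in Python of how per-match win/draw/loss probabilities can be turned into first-through-fourth placement probabilities and an advancement probability. The function name `simulate_group`, its input layout, and the placeholder probabilities in the usage section are hypothetical. Because the sketch doesn't simulate goals, teams level on points are ordered at random, whereas FIFA's actual tiebreakers use goal difference, goals scored, and further criteria.

```python
import itertools
import random
from collections import Counter


def simulate_group(teams, match_probs, n_sims=100_000, seed=0):
    """Estimate each team's probability of finishing 1st..4th in a round-robin group.

    match_probs maps (team_a, team_b) -> (p_a_wins, p_draw, p_b_wins).
    Simplifying assumption: teams level on points are ordered at random,
    since goals (and hence goal-difference tiebreakers) are not simulated.
    """
    rng = random.Random(seed)
    place_counts = {team: Counter() for team in teams}
    for _ in range(n_sims):
        points = dict.fromkeys(teams, 0)
        for (a, b), (p_a, p_draw, _p_b) in match_probs.items():
            u = rng.random()  # one uniform draw decides the match outcome
            if u < p_a:
                points[a] += 3
            elif u < p_a + p_draw:
                points[a] += 1
                points[b] += 1
            else:
                points[b] += 3
        # Rank by points; a random secondary key breaks ties
        tiebreak = {team: rng.random() for team in teams}
        ranking = sorted(teams, key=lambda t: (points[t], tiebreak[t]), reverse=True)
        for place, team in enumerate(ranking, start=1):
            place_counts[team][place] += 1
    place_probs = {
        team: {place: place_counts[team][place] / n_sims for place in range(1, len(teams) + 1)}
        for team in teams
    }
    # The top two teams in each group advance to the knockout rounds
    advance_probs = {team: place_probs[team][1] + place_probs[team][2] for team in teams}
    return place_probs, advance_probs


if __name__ == "__main__":
    teams = ["Team A", "Team B", "Team C", "Team D"]
    # Hypothetical placeholder probabilities, NOT the models' actual predictions
    probs = {pair: (0.40, 0.25, 0.35) for pair in itertools.combinations(teams, 2)}
    place_probs, advance_probs = simulate_group(teams, probs, n_sims=10_000)
    for team, p in advance_probs.items():
        print(f"{team}: {p:.1%} chance of advancing")
```

Because exactly two teams advance in every simulated run, the advancement probabilities within a group always sum to 2, which makes a handy sanity check on any such simulation.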

The Teams Predicted Most Likely to Advance to the Knockout Rounds by Group

Group A:

  • Uruguay (81.6% predicted chance of advancing)
  • Russia (70.9% predicted chance of advancing)

Group B:

  • Spain (86.1% predicted chance of advancing)
  • Portugal (75.4% predicted chance of advancing)

Group C:

  • France (77.1% predicted chance of advancing)
  • Peru (54.1% predicted chance of advancing)

Group D:

  • Argentina (79.1% predicted chance of advancing)
  • Croatia (57.7% predicted chance of advancing)

Group E:

  • Brazil (82.7% predicted chance of advancing)
  • Switzerland (58.2% predicted chance of advancing)

Group F:

  • Germany (90.4% predicted chance of advancing)
  • Sweden (41.7% predicted chance of advancing)

Group G:

  • Belgium (83.8% predicted chance of advancing)
  • England (83.1% predicted chance of advancing)

Group H:

  • Colombia (79.3% predicted chance of advancing)
  • Poland (64.1% predicted chance of advancing)

Looking over these predictions, several things stand out. First, the team that should have the easiest time in the group round is Germany, followed by Spain. Second, Brazil is predicted to have a somewhat harder time than one might expect given its world ranking, because Group E appears to be more balanced from top to bottom than most other groups (the same is true of Group H). Third, and in contrast to the last observation, Groups B (Spain and Portugal) and G (Belgium and England) look "top heavy," with two teams expected to dominate the other two (woe to Iran and Morocco in Group B, and to Panama and Tunisia in Group G).

Potentially Closely Fought Battles

With the exception of Belgium and England in Group G (which we predict will be a close battle for first place), the most hotly contested battles will be between second and third place in Groups C, F, and to a lesser extent D. To advance from the group round to the knockout rounds, a team must place either first or second in their group, so the difference between second and third place is the difference between “going on, or going home”. In Group C, Peru has a 54.1% predicted probability of advancing, while Denmark has a 53.6% predicted probability of advancing. In the head-to-head match between Peru and Denmark, we predict the probability that Denmark will win is 37.0%, that Peru will win is 34.7%, and the predicted probability of a draw is 28.3%. All in all, this should be a very interesting game.
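
One way to see how evenly matched Peru and Denmark are is to turn the three head-to-head probabilities into expected points from that single match (3 for a win, 1 for a draw, 0 for a loss). Below is a minimal Python sketch using the percentages quoted above; the helper name `expected_points` is illustrative, and the calculation deliberately ignores both teams' other group matches.

```python
def expected_points(p_win, p_draw):
    """Expected group points from one match: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * p_win + 1 * p_draw


# Predicted Peru vs. Denmark probabilities quoted in the post
p_denmark_win, p_peru_win, p_draw = 0.370, 0.347, 0.283
# The three outcomes are exhaustive, so their probabilities should sum to 1
assert abs(p_denmark_win + p_peru_win + p_draw - 1.0) < 1e-9

print(f"Denmark: {expected_points(p_denmark_win, p_draw):.3f} expected points")
print(f"Peru:    {expected_points(p_peru_win, p_draw):.3f} expected points")
```

This works out to about 1.39 expected points for Denmark versus about 1.32 for Peru from their meeting, a gap of well under a tenth of a point.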

Group F has an even closer predicted battle between Sweden (with a 41.7% predicted probability of advancing) and Mexico (with a 41.5% predicted probability of advancing). In the head-to-head match between the two teams, our models predict that the probability Sweden will win is 38.8%, that Mexico will win is 33.5%, and that the match will end in a draw is 27.7%. To make things more interesting, the team with the lowest predicted probability of advancing in Group F, South Korea (26.5%), is the strongest of the eight "fourth place" teams; the next best is Japan in Group H, with a 23.0% predicted chance of advancing. As a result, the real, and very close, competition in Group F will be for second place.
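
Differences this small are worth setting against the simulation's own noise. Assuming (our assumption; the post doesn't say exactly how the percentages were tallied) that each advancement percentage is the share of 100,000 independent simulated tournaments in which a team advanced, the binomial standard error sqrt(p(1 - p) / n) gives the Monte Carlo uncertainty of each estimate. A minimal sketch in Python; `mc_standard_error` is a hypothetical helper name.

```python
import math


def mc_standard_error(p, n_sims):
    """Binomial standard error of a probability estimated from n_sims independent runs."""
    return math.sqrt(p * (1 - p) / n_sims)


# Assumption: each percentage is a share of 100,000 simulated tournaments
N_SIMS = 100_000
for team, p in [("Sweden", 0.417), ("Mexico", 0.415)]:
    se = mc_standard_error(p, N_SIMS)
    print(f"{team}: {p:.1%} advancing, standard error {se:.2%}")
```

Under that assumption each estimate carries a standard error of roughly 0.16 percentage points, so the 0.2-point gap between Sweden and Mexico is about the size of the simulation noise itself; the two teams are effectively level.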

Group D features Iceland, which is the smallest country on a population basis (with an estimated population of just over 350,000 people) to qualify for the World Cup. Our models predict that Iceland’s probability of advancing is 47.4%, so they are positioned to potentially give Croatia a real run for second place in the group. In terms of the head-to-head matchup, our models predict that the probability that Croatia will win is 38.0%, that Iceland will win is 32.0%, and that the match will end in a draw is 30.0%.

What Follows?

The next blog post in this series will cover the creation of the match-level win/loss/draw probability models, starting with a discussion of selecting predictor variables and moving through model comparison and assessment.

Dan Putler
Chief Scientist

Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia's Sauder School of Business and Purdue University’s Krannert School of Management.

Comments
NeilR
Alteryx Alumni (Retired)

My money is on Russia

wade12
8 - Asteroid

thanks for sharing this, Dan, much appreciated.

my Alteryx model gives a far more accurate prediction -> probability that Ireland will win the World Cup = 0.

looking forward to your next blog post.

yjd
7 - Meteor

Hi Dan!

This is a very interesting post! I'm doing something VERY SIMILAR at my workplace, so I'm following this series of blog posts closely! Thank you for sharing!

OliverW
Alteryx Alumni (Retired)

Tough hit yesterday on our predictions, with some of the favorite teams failing to deliver: Argentina's draw against Iceland, Brazil's draw against Switzerland, and the hit to "my" Germans with their loss to Mexico.

For these games, we gave Germany an approx. 12% chance of losing to Mexico, so roughly 1 game in 8; I guess yesterday was just that game. We gave a 17% chance of a draw between Brazil and Switzerland. Interestingly, for Argentina's game against Iceland we were actually on a good call for the draw, giving it an approx. 30% chance.

With these more unexpected results, the outcomes of these groups, and therefore the possible round-of-16 match-ups, have really been shaken up. Think, for example, about Argentina not finishing first in its group: it could then easily face powerhouse France in the round of 16. Or think about Germany (already well off the pace for first place after that loss) or Brazil not winning its group: if one of them wins its group and the other finishes second, those two major favorites will clash in the round of 16 (if they even make it out of their groups now). The "funniest" thing would be if both of them ended up finishing second in their groups; then they would stay away from each other in the round of 16, just not in the way they expected :)
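
Oliver's reasoning follows from how the 2018 round of 16 was drawn: each group winner met the runner-up of a paired group (A with B, C with D, E with F, G with H). A small lookup sketch in Python; the names `R16_PAIRINGS` and `r16_opponent` are illustrative, not part of the prediction models.

```python
# 2018 FIFA World Cup round-of-16 pairings: group winner ("1X") vs. runner-up ("2Y")
R16_PAIRINGS = [("1A", "2B"), ("1B", "2A"), ("1C", "2D"), ("1D", "2C"),
                ("1E", "2F"), ("1F", "2E"), ("1G", "2H"), ("1H", "2G")]


def r16_opponent(slot):
    """Return the bracket slot that the given slot (e.g. '2D') meets in the round of 16."""
    for a, b in R16_PAIRINGS:
        if slot == a:
            return b
        if slot == b:
            return a
    raise ValueError(f"unknown slot: {slot}")


# Argentina (Group D) finishing second meets the Group C winner
print(r16_opponent("2D"))
# Brazil (E) and Germany (F) meet only if one wins its group and the other is runner-up
print(r16_opponent("1E"), r16_opponent("1F"))
```

If both Brazil and Germany finish second, "2E" meets "1F" and "2F" meets "1E", so the two avoid each other, which is the "funniest" case above.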

SimonH
Alteryx Alumni (Retired)

Good thread; I'll be interested to see how the group round ends in comparison. At a high level, it seems that quite a few matches are much closer than expected. How do you factor "heart" (Iceland) and "overconfidence" (Germany) into the model?