
While it may not be much of "a thing" this year in the United States (given the US national team's failure to qualify), much of the rest of the world is eagerly awaiting today’s start of the 2018 FIFA World Cup. In preparation for the World Cup, I've been working with Oliver Wahner (an Alteryx sales engineer based in our Munich, Germany office) to develop models that predict the probability of a win, loss, or draw in international association football (a.k.a. soccer) matches. In subsequent blog posts I’ll go into the details of how we developed our predictive models (yes, there are several) using Alteryx, along with how we simulated World Cup matches 100,000 times using a combination of Alteryx and R. Along the way, I’ll introduce you to the new Partial Dependency tool, which was just rolled out as part of the recently launched Laboratory District on the public Alteryx Analytics Gallery.

However, with the recently concluded Inspire US conference and all, I’m running a bit behind schedule, so I wanted to post our model-based predictions of which countries will advance from the Group Round (the round-robin matches) to the knockout rounds before (yes, just before) the first match of the 2018 World Cup begins. We also want to point out several match-ups within some of the groups that our models predict will be closely fought. Finally, attached to this post is a YXDB file containing the predicted probability that each World Cup team will place first through fourth in its group, and the predicted probability that it will advance to the knockout rounds.

 

The Teams Predicted Most Likely to Advance to the Knockout Rounds by Group

Group A:

  • Uruguay (81.6% predicted chance of advancing)
  • Russia (70.9% predicted chance of advancing)

Group B:

  • Spain (86.1% predicted chance of advancing)
  • Portugal (75.4% predicted chance of advancing)

Group C:

  • France (77.1% predicted chance of advancing)
  • Peru (54.1% predicted chance of advancing)

Group D:

  • Argentina (79.1% predicted chance of advancing)
  • Croatia (57.7% predicted chance of advancing)

Group E:

  • Brazil (82.7% predicted chance of advancing)
  • Switzerland (58.2% predicted chance of advancing)

Group F:

  • Germany (90.4% predicted chance of advancing)
  • Sweden (41.7% predicted chance of advancing)

Group G:

  • Belgium (83.8% predicted chance of advancing)
  • England (83.1% predicted chance of advancing)

Group H:

  • Colombia (79.3% predicted chance of advancing)
  • Poland (64.1% predicted chance of advancing)

Looking over these predictions, several things can be observed. First, the team that should have the easiest time in its group round is Germany, followed by Spain. Second, Brazil is predicted to have a somewhat harder time than one might expect given its world ranking, but this is because Group E appears to be comparatively more balanced than most other groups from top to bottom (the same is true of Group H). Third, and in contrast to the last observation, Groups B (Spain and Portugal) and G (Belgium and England) seem to be “top heavy”, with two teams expected to dominate the other two teams in the group (woe to Iran and Morocco in Group B and Panama and Tunisia in Group G).

 

Potentially Closely Fought Battles

With the exception of Belgium and England in Group G (which we predict will be a close battle for first place), the most hotly contested battles will be between second and third place in Groups C, F, and to a lesser extent D. To advance from the group round to the knockout rounds, a team must place either first or second in their group, so the difference between second and third place is the difference between “going on, or going home”. In Group C, Peru has a 54.1% predicted probability of advancing, while Denmark has a 53.6% predicted probability of advancing. In the head-to-head match between Peru and Denmark, we predict the probability that Denmark will win is 37.0%, that Peru will win is 34.7%, and the predicted probability of a draw is 28.3%. All in all, this should be a very interesting game.

 

Group F has an even closer predicted battle between Sweden (with a 41.7% predicted probability of advancing) and Mexico (with a 41.5% predicted probability of advancing). In the head-to-head matchup between the two teams, our models predict that the probability Sweden will win is 38.8%, that Mexico will win is 33.5%, and that the match will end in a draw is 27.7%. To make things more interesting, this group also has the strongest "fourth place" team of any group: South Korea has the lowest predicted probability of advancing in Group F, at 26.5%, yet that is higher than the figure for the least-favored team in any other group (the next best is Japan in Group H, with a 23.0% predicted chance of advancing). Germany, meanwhile, is the heavy favorite to advance. As a result, the real, and very close, competition in Group F will be for second place.
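
As a quick sanity check on the numbers quoted so far: exactly two teams advance from each group, so the four advancement probabilities within a group should sum to roughly 200%, and the win/draw/win probabilities for a single match should sum to roughly 100% (small gaps come from rounding to one decimal place). The short Python sketch below runs that arithmetic on the Group F figures from this post; the variable names are just illustrative.

```python
# Predicted chance of advancing (in percent) for Group F, as quoted in this post.
group_f_advance = {
    "Germany": 90.4,
    "Sweden": 41.7,
    "Mexico": 41.5,
    "South Korea": 26.5,
}

# Two of the four teams advance, so these should total about 200%.
group_total = sum(group_f_advance.values())
print(f"Group F advancement total: {group_total:.1f}%")

# Sweden v Mexico head-to-head: (Sweden win, draw, Mexico win), in percent.
sweden_v_mexico = (38.8, 27.7, 33.5)

# The three outcomes are exhaustive, so these should total about 100%.
match_total = sum(sweden_v_mexico)
print(f"Sweden v Mexico outcome total: {match_total:.1f}%")
```

The Group F total comes to 200.1% and the match total to 100.0%, which is what one would expect from probabilities rounded to a tenth of a percent.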

 

Group D features Iceland, which is the smallest country on a population basis (with an estimated population of just over 350,000 people) to qualify for the World Cup. Our models predict that Iceland’s probability of advancing is 47.4%, so they are positioned to potentially give Croatia a real run for second place in the group. In terms of the head-to-head matchup, our models predict that the probability that Croatia will win is 38.0%, that Iceland will win is 32.0%, and that the match will end in a draw is 30.0%.
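
To give a feel for how match-level win/draw/loss probabilities turn into a probability of advancing, here is a minimal Monte Carlo sketch in Python. It is not the Alteryx and R workflow we actually used (that will be covered in later posts); it is only an illustration under some loud assumptions. Only the Peru v Denmark probabilities come from this post. The other five Group C match probabilities are made-up placeholders, the function name `simulate_group` is hypothetical, and ties on points are broken at random because this sketch does not model goals (the real tournament uses goal difference and other tiebreakers).

```python
import itertools
import random
from collections import Counter


def simulate_group(teams, match_probs, n_sims=100_000, seed=0):
    """Estimate each team's probability of finishing in the top two.

    match_probs maps (team_a, team_b) -> (p_a_wins, p_draw, p_b_wins).
    """
    rng = random.Random(seed)
    advanced = Counter()
    fixtures = list(itertools.combinations(teams, 2))  # round robin: 6 matches
    for _ in range(n_sims):
        points = {team: 0 for team in teams}
        for a, b in fixtures:
            p_a, p_draw, p_b = match_probs[(a, b)]
            outcome = rng.choices(("a", "draw", "b"), weights=(p_a, p_draw, p_b))[0]
            if outcome == "a":
                points[a] += 3  # win = 3 points
            elif outcome == "b":
                points[b] += 3
            else:
                points[a] += 1  # draw = 1 point each
                points[b] += 1
        # ASSUMPTION: ties on points are broken at random (no goals are modeled).
        ranked = sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)
        for team in ranked[:2]:  # first and second place advance
            advanced[team] += 1
    return {team: advanced[team] / n_sims for team in teams}


group_c = ["France", "Australia", "Peru", "Denmark"]
group_c_probs = {
    # PLACEHOLDER values, invented for illustration only:
    ("France", "Australia"): (0.60, 0.24, 0.16),
    ("France", "Peru"): (0.48, 0.28, 0.24),
    ("France", "Denmark"): (0.47, 0.28, 0.25),
    ("Australia", "Peru"): (0.22, 0.27, 0.51),
    ("Australia", "Denmark"): (0.21, 0.27, 0.52),
    # From this post: Peru wins 34.7%, draw 28.3%, Denmark wins 37.0%.
    ("Peru", "Denmark"): (0.347, 0.283, 0.370),
}

# Fewer runs than the default so the example finishes quickly.
advance = simulate_group(group_c, group_c_probs, n_sims=20_000)
for team, p in sorted(advance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {p:.1%}")
```

Because five of the six inputs are placeholders, the printed numbers will not match the predictions in this post. The point is only the mechanics: draw an outcome for every match, add up the points, rank the teams, and count how often each team lands in the top two.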

 

What Follows?

The next blog post in this series will cover the creation of the match level win/lose/draw probability models, starting with a discussion of selecting predictor variables, and moving through model comparison and assessment.

Dan Putler
Chief Scientist

Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia's Sauder School of Business and Purdue University’s Krannert School of Management.
