In the third post of this series, I pointed out how badly Germany underperformed in this year’s World Cup relative to our models’ predictions. I also noted that Germany’s performance was consistent with an emerging trend of defending World Cup champions failing to make it out of the Group Round in the subsequent World Cup (this has been the case in four of the last five World Cup tournaments, including the last three), a phenomenon that has been dubbed the "World Cup curse".
In this post, we examine whether the “curse” can be chalked up to random chance. Doing this is an exercise in statistical inference, as opposed to purely predictive analytics. The two activities are closely related, and some of the first modeling methods used for predictive analytics (such as linear and logistic regression) were primarily developed for the purpose of statistical inference. However, the two are not the same, since many predictive analytics methods do not lend themselves to statistical inference.
The predictive analytics tools in Alteryx are oriented to just that: developing models for making predictions, as opposed to doing statistical inference. If a draw were not possible in association football, we could use the Logistic Regression tool in Alteryx to implement the statistical analysis needed to assess whether the World Cup curse can be attributed to random chance. Since matches can (and do) end in a draw (resulting in three possible outcomes, rather than two), logistic regression cannot be used in this case. To address this multi-class case, the appropriate modeling method is a multinomial log-linear model (which is often called a multinomial logit model, but this can cause confusion since another, related, method is also frequently called multinomial logit). Since the primary purpose of a multinomial log-linear model is statistical inference, we currently do not have a tool to estimate this model type. As a result, we make use of the R tool in Alteryx to estimate the needed models.
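To make the three-outcome setup concrete, here is a minimal Python sketch of how a multinomial log-linear model turns linear predictors into win, draw, and loss probabilities. The function name, the choice of a loss as the baseline outcome, and the example values are assumptions for illustration; they are not taken from the R code used in this post.

```python
import math


def outcome_probabilities(eta_win, eta_draw):
    """Convert two linear predictors into three outcome probabilities.

    Assumption: a loss is the baseline category, so its linear predictor is
    fixed at 0 and only the win and draw equations have coefficients.
    """
    denom = 1.0 + math.exp(eta_win) + math.exp(eta_draw)
    return {
        "win": math.exp(eta_win) / denom,
        "draw": math.exp(eta_draw) / denom,
        "loss": 1.0 / denom,
    }


# With both linear predictors at zero, the three outcomes are equally likely.
even_match = outcome_probabilities(eta_win=0.0, eta_draw=0.0)

# Raising only the win predictor shifts probability away from draw and loss.
favored = outcome_probabilities(eta_win=1.0, eta_draw=0.0)
print(even_match)
print(favored)
```

Because there are two equations (win versus loss and draw versus loss), every predictor added to the model contributes two coefficients rather than the single coefficient it would have in a binary logistic regression.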
Since our focus is the ability of the defending World Cup champion to advance out of the Group Round, we limit our analysis to Group Round play in the 1998, 2002, 2006, 2010, 2014, and 2018 World Cups (all the tournaments played to date where a win is awarded 3 points, a draw 1 point, and a loss 0 points). In these six World Cup tournaments, 288 Group Round matches were played, with 18 of those matches involving the champion from the prior World Cup. The 18 matches involving the defending champions allow for a test of the curse with an acceptable level of statistical power. Limiting the sample in this way made the tournament round and major tournament variables unnecessary for the analysis, since all games are Group Round World Cup matches.
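The match counts above follow directly from the tournament format. The short Python sketch below reproduces them, assuming the 32-team format of eight round-robin groups of four that these six tournaments used; the variable names are illustrative.

```python
TOURNAMENTS = [1998, 2002, 2006, 2010, 2014, 2018]
GROUPS_PER_TOURNAMENT = 8  # 32 teams split into 8 groups
TEAMS_PER_GROUP = 4

# Each group is a single round-robin: every pair of teams meets once.
matches_per_group = TEAMS_PER_GROUP * (TEAMS_PER_GROUP - 1) // 2

total_group_matches = len(TOURNAMENTS) * GROUPS_PER_TOURNAMENT * matches_per_group

# The defending champion plays each of the other three teams in its group.
champion_matches = len(TOURNAMENTS) * (TEAMS_PER_GROUP - 1)

print(total_group_matches)  # 288
print(champion_matches)     # 18
print(champion_matches / total_group_matches)  # 0.0625
```

Only 6.25% of the matches in the sample involve a defending champion, which is why the statistical power of the test is worth keeping in mind.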
To test the hypothesis that the World Cup curse can be attributed to random chance, two different multinomial log-linear models need to be estimated. The first includes a categorical variable that indicates whether the focal team is the defending World Cup champion (all defending champions are made the focal team in the analysis), along with the Elo-rating-based measures and the home country/continent variable we used to develop our earlier models. The second model is identical to the first, except that the defending champion indicator variable is omitted. Since the multinomial log-linear models are estimated via maximum likelihood, and the second model is nested within the first, testing the hypothesis that the World Cup curse can be attributed to random chance involves what is known as a nested likelihood ratio test.
Figure 1 shows the workflow used to test the hypothesis that the World Cup curse can be explained by random chance, with the Configuration window showing a portion of the R code used to conduct the test. The models we created earlier gave us a strong sense of the nature of the relationships between the predictor variables. Consistent with this learning, we included the three Elo-related measures (the difference in Elo ratings between the two teams, the Elo rating of the focal team, and the Elo rating of its opponent), two-way interactions between the Elo measures, the square of the Elo rating difference between the two teams, and the categorical indicator of relative home advantage for the focal team. The full model also contains the defending World Cup champion indicator variable.
The difference in the deviance of the two models is 9.7667, and represents the test statistic for the hypothesis. Under the hypothesis that being the defending champion has no effect, this test statistic follows a chi-squared distribution with two degrees of freedom (the full model adds one defending champion coefficient to each of the model's two outcome equations). The p-value for the test is 0.0076, which allows us to reject the hypothesis that the World Cup curse is due to random chance using conventional statistical significance levels. Put another way, if the curse were purely due to random chance, there would be only a 0.0076 probability of seeing a test statistic at least this large in the data from the past six World Cup tournaments.
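The reported p-value can be checked by hand. For a chi-squared distribution with exactly two degrees of freedom, the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The Python sketch below uses that shortcut; the function name is illustrative, and this is a check on the arithmetic rather than the R code used in the workflow.

```python
import math


def chi2_upper_tail_2df(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom.

    This closed form holds only for 2 degrees of freedom.
    """
    return math.exp(-x / 2.0)


# Difference in deviance between the reduced and full models, as reported above.
deviance_difference = 9.7667

p_value = chi2_upper_tail_2df(deviance_difference)
print(round(p_value, 4))  # 0.0076
```

The difference in deviance equals twice the difference in the maximized log-likelihoods of the two models, which is why it serves as the likelihood ratio test statistic.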
At this point, we have seen strong evidence that there is a World Cup curse at play. What we don’t know is the magnitude of the curse's effect on match-level win/loss/draw probabilities. To get a sense of this, we can make use of the partial dependence plot from the full model, shown in Figure 2, which gives the average probability of a win, loss, or draw for the focal team depending on whether it is the defending World Cup champion. The figure reveals that being the defending World Cup champion nearly doubles the probability of a match-level loss, and decreases the probability of a win or draw by more than one-half, which are very large effects.
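The idea behind a partial dependence calculation can be sketched in a few lines of Python: set the defending champion indicator to the same value for every match in the data, predict the outcome probabilities, and average them. Everything in the sketch below is hypothetical, including the coefficients, the single Elo predictor, and the five example matches; none of it comes from the fitted model or the data in this post, so the numbers it produces say nothing about the actual size of the curse.

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted values.
# A loss is assumed to be the baseline outcome.
COEFS = {
    "win": {"intercept": 0.2, "elo_diff": 0.004, "champion": -1.0},
    "draw": {"intercept": -0.1, "elo_diff": 0.001, "champion": -0.8},
}


def predict(row):
    """Return win/draw/loss probabilities for one match under the toy model."""
    eta = {
        outcome: c["intercept"]
        + c["elo_diff"] * row["elo_diff"]
        + c["champion"] * row["champion"]
        for outcome, c in COEFS.items()
    }
    denom = 1.0 + sum(math.exp(v) for v in eta.values())
    probs = {outcome: math.exp(v) / denom for outcome, v in eta.items()}
    probs["loss"] = 1.0 / denom
    return probs


def partial_dependence(rows, feature, value):
    """Average predicted probabilities with `feature` forced to `value`."""
    totals = {"win": 0.0, "draw": 0.0, "loss": 0.0}
    for row in rows:
        modified = dict(row)
        modified[feature] = value
        for outcome, p in predict(modified).items():
            totals[outcome] += p
    return {outcome: total / len(rows) for outcome, total in totals.items()}


# Made-up matches that differ only in the Elo rating difference.
rows = [{"elo_diff": d, "champion": 0} for d in (-200, -50, 0, 100, 250)]

not_champion = partial_dependence(rows, "champion", 0)
champion = partial_dependence(rows, "champion", 1)
print(not_champion)
print(champion)
```

Comparing the two averaged sets of probabilities is what the bars in a partial dependence plot for a two-level variable represent.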
The upshot of this analysis is that the World Cup curse is very unlikely to be due to random chance, and its effect on match-level outcomes appears to be sizable. Hopefully, this will provide some solace to Germany’s fans (like my colleague Oliver Wahner). However, what we don’t know at the moment is the underlying cause of the curse. There are a number of possibilities, but exploring them will need to wait until another time.
Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia's Sauder School of Business and Purdue University’s Krannert School of Management.