With the Rugby World Cup upon us, I took up the challenge of predicting the results of all the matches. We got our hands on over 10 years' worth of international test match data from our friends over at Opta and set about building a predictive model. Following the methodology that is nicely outlined in the How to Become a Citizen Data Scientist series, a simple workflow built around a linear regression model was born.
For those curious, these are the general steps I took (if you’re looking for the exact features we used for the predictions, you’ll have to wait until we see if we’re right):
Take nearly 1000 rugby match XML files and read them in using a wildcard in the file path.
Parse these out using the XML Parse tool and split each match into two rows, one for each team involved.
Prep the data to create relevant fields around points difference and match events (e.g. tries and penalties scored).
Infer home, away or neutral ground, based on the match location.
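For readers who prefer code to a visual workflow, here is a rough Python sketch of the data preparation steps above. It is not the workflow we built, and it does not reflect the real Opta file layout: the XML element and attribute names (`match`, `team`, `venue_country`, `points`, `tries`, `penalties`), the sample match, and the function names are all assumptions made up for illustration.

```python
import glob
import xml.etree.ElementTree as ET

# Made-up sample match in a hypothetical schema (not the real Opta format).
SAMPLE = """
<match venue_country="Redland">
  <team name="Reds" country="Redland" points="35" tries="4" penalties="3"/>
  <team name="Blues" country="Blueland" points="11" tries="1" penalties="2"/>
</match>
"""


def ground_for(team_country, opponent_country, venue_country):
    """Infer home, away, or neutral ground from the match location."""
    if venue_country == team_country:
        return "home"
    if venue_country == opponent_country:
        return "away"
    return "neutral"


def match_to_rows(xml_text):
    """Split one match into two rows, one for each team involved."""
    match = ET.fromstring(xml_text)
    venue = match.get("venue_country")
    teams = match.findall("team")
    if len(teams) != 2:
        raise ValueError("expected exactly two teams per match")
    rows = []
    # First pass: (team 1, team 2); second pass: (team 2, team 1).
    for team, opponent in (teams, teams[::-1]):
        points = int(team.get("points"))
        opponent_points = int(opponent.get("points"))
        rows.append({
            "team": team.get("name"),
            "opponent": opponent.get("name"),
            "points_diff": points - opponent_points,
            "tries": int(team.get("tries")),
            "penalties": int(team.get("penalties")),
            "ground": ground_for(
                team.get("country"), opponent.get("country"), venue
            ),
        })
    return rows


def load_matches(pattern):
    """Read every file matching a wildcard pattern, e.g. 'matches/*.xml'."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            rows.extend(match_to_rows(handle.read()))
    return rows
```

Calling `match_to_rows(SAMPLE)` gives one row per team, with the points difference mirrored between the two rows. `load_matches` does the same for every file a wildcard pattern picks up.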