With the Rugby World Cup upon us, I took on the challenge of predicting the results of every match. We got our hands on more than 10 years’ worth of international test match data from our friends at Opta and set about building a predictive model. Following the methodology outlined in the How to Become a Citizen Data Scientist series, we built a simple workflow around a linear regression model.
For those curious, these are the general steps I took (if you’re looking for the exact features we used for the predictions, you’ll have to wait until we see if we’re right):
Take nearly 1,000 rugby match XML files and read them all in at once using a wildcard in the file path.
Parse these out using the XML Parse tool and split each match into two rows, one for each team involved.
Prep the data to create relevant fields around points difference and match events (e.g., tries and penalties scored).
Infer home, away or neutral ground, based on the match location.
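For readers who think in code rather than workflow tools, here is a rough Python sketch of the steps above. It is not the workflow we built, and it makes several assumptions: the XML layout, the attribute names (venue_country, country, points, tries, penalties), and the helper names are all hypothetical, since the real Opta schema isn't shown here. The sample match is invented purely to show the shape of the output.

```python
import glob
import xml.etree.ElementTree as ET

# Invented sample in a hypothetical layout; the real Opta schema will differ.
SAMPLE_XML = """<match venue_country="Country A">
  <team name="Home XV" country="Country A" points="25" tries="3" penalties="2"/>
  <team name="Visitors XV" country="Country B" points="11" tries="1" penalties="2"/>
</match>"""


def infer_ground(team_country, opponent_country, venue_country):
    """Label a team's match as home, away, or neutral from the venue's country."""
    if venue_country == team_country:
        return "home"
    if venue_country == opponent_country:
        return "away"
    return "neutral"


def match_to_rows(xml_text):
    """Split one match into two rows, one per team, with derived fields."""
    match = ET.fromstring(xml_text)
    venue = match.get("venue_country")
    first, second = match.findall("team")  # assumes exactly two teams per match
    rows = []
    for team, opponent in ((first, second), (second, first)):
        rows.append({
            "team": team.get("name"),
            "opponent": opponent.get("name"),
            # Points difference from this team's point of view.
            "points_diff": int(team.get("points")) - int(opponent.get("points")),
            "tries": int(team.get("tries")),
            "penalties": int(team.get("penalties")),
            "ground": infer_ground(team.get("country"), opponent.get("country"), venue),
        })
    return rows


def load_matches(pattern):
    """Read every file matching a wildcard pattern, e.g. 'matches/*.xml'."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            rows.extend(match_to_rows(handle.read()))
    return rows


rows = match_to_rows(SAMPLE_XML)
for row in rows:
    print(row["team"], row["ground"], row["points_diff"])
```

Keeping one row per team, rather than one row per match, means every derived field is expressed from a single team's point of view, which is a convenient shape for a regression on points difference.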