Hello and welcome to the first blog post to follow on from my new video series, Will's Weekly Wonders, where I look to take data from my own personal life and interests and explore what is possible to do with it in Alteryx. I am a big football/soccer fan and five weeks ago I took on the challenge to see if I could predict whether the team I support, Reading FC, would make it into the playoffs to hopefully ease my nerves for the remaining league games.
I was devastated, the model successfully predicted that they would finish outside the playoffs in 7th position and end the season early for me and all other Reading fans. For those who are not familiar, Reading currently play in the EFL Championship which is the division beneath the Premier League. In this league, the top 2 teams will go up automatically to the Premier League, whilst those that finish 3rd – 6th go into the playoffs to compete for the 3rd spot in promotion. At the time of creating the model, Reading sat in 6th place and then proceeded to fall out.
The purpose of this was really to see what was possible to achieve in Alteryx when I had half a day spare, a hypothesis and a real passion that I could quantify in data. Overall, the model was a success, it correctly predicted the teams who would finish 1st and 2nd and get promoted automatically, as well as the 4 teams who are now going to compete in the playoffs over the coming weeks. That being said, when diving into all the predicted results, the accuracy does start to fall somewhat. However, the big success for me was how quickly I was able to build a Data Science framework within Alteryx and a platform to help me progress in any future sporting prediction endeavors.
So, onto the workflow itself. As you can see from the screenshot beneath, this was split up into 6 sections/macros, with each macro representing one of the teams pushing for the playoff positions and the trophy at the end. The data was sourced from a GitHub repository which contained all the results for the current season.
The first part of the workflow involved cleaning up the original data set with all information stored in one field with comma separated data. The key part to this was to get each result on to 2 separate lines, so that you could analyse the performance of each team individually when this would be fed into the model.
Tools used: Text to Columns, Dynamic Rename, Select, Record ID, Transpose, Filter
Cleaned output data.
Once the dataset had been cleaned and prepared it was time to start the process of feature engineering which is creating new variables from the raw data, based on my (self-professed) footballing knowledge.
The features that were created:
As you can see, not an extensive list and there is definitely scope to add more features in! The dataset was also further prepared, by providing information on the result of each fixture across each individual line item.
Tools Used: Formulas, Multi-Row Formula, Select
Output from Feature Engineering and further Cleansing.
I now wanted to supplement my original dataset with some additional data which I would be able to add in as features into the model. In recent years, Expected Goals (xG) has become an increasingly popular metric with which to analyse football games, so I have sourced this information from a dedicated website to append on.
The next step was to join the dataset to itself, so that for each fixture you could compare the main team’s performance, with that of the opposition. So that for instance, we could model how much a factor your own Form was vs the Opposition Form for each game.
I then also split the data into Results (the games that had already been played) and Fixtures (those which I wanted to predict), so that I had my sampled datasets ready to feed into the model and those which to feed into the prediction.
Tools Used: Input Data, Select, Find Replace, Join, Filter
Output from Data Blending and Sample Creation
The next part was to feed this into the Assisted Modeling tool and let the Machine Learning perform its magic. For clarification here, I have no Data Scientist background and have learnt everything I know since being at Alteryx and using the software in the last 2 years. So, for me, the Assisted Modeling comes in very handy!
For those who are new to the Intelligence Suite and Assisted Modeling, you go through 7 steps, which I will outline below...
I didn’t spend as long as I would traditionally like in the preparation steps here and could certainly have benefitted from further prototyping and iterating to progress to a more accurate model, however the focus was on producing a framework and generating some results for the latest fixtures!
Tools Used: Assisted Modeling, Transformation, Classification, Fit
I was now able to feed into the prediction the next fixtures in order to get a % likelihood of the 3 possible outcomes: Win, Draw or Loss. The model would then automatically select the value that had the highest % likelihood.
Tools Used: Predict, Select, Sort, Summarize, Transpose, Join, Filter
Output of the results that had now been predicted.
Blending the historical data with the predicted values then enabled me to produce a final table, with all predicted games taken into account alongside a view of the next round of fixtures for each team.
I used the Reporting tools within Alteryx in order to produce a final PDF table, to provide a much more visual representation of the outcomes in the model itself and ultimately produce something that is nice and easy to interpret for the final users. The ability to combine multiple reporting elements into the PDF report is crucial for this.
Tools Used: Formula, Multi-Row Formula, Join, Union, Summarize, Sort, Record ID, Filter, Table, Image, Chart, Layout, Render
This was a really fun project for me that allowed me to explore completely new datasets, build a model and test hypotheses in only a short amount of time! Whilst the model hasn’t proved to be 100% accurate, I never anticipated this to be the case and wanted to use it as a means to creating a framework for anyone else out there who would like to make predictions on sporting events.
I’m sure many people reading this could go on and create/develop this work substantially more and who knows, maybe for next season we might have the perfect model to predict championship football matches!
I’ve attached the workflow alongside each of the individual macros so that you can see the workings behind the scenes at every step of the way. Of course, the macros themselves are there purely for show in this instance and to create a nice visual workflow (this did all start as one long and messy workflow!). Note that you'll need Alteryx Intelligence Suite installed in order to run the workflow.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.