Weekly Challenges

Challenge #103: Just another game?

ChristineB
Alteryx Alumni (Retired)

A solution to last week's Challenge has been posted HERE!

 

I learned the worst thing to say to a bunch of football fans: "The Super Bowl? It's just another football game." The looks of horror on my colleagues' faces when I said that are burned into my memory forever. And let me warn you: don't say that when you have anything important to do, because you will be subject to an hour-long debate about player stats, offense/defense match-ups, the importance of turnovers (and not just the apple-flavored ones), and the proper way to make chili.

 

Anyway, after this debate, I did what any data nerd would do: I took to the internet in search of datasets and fired up my Alteryx Designer to answer this question: Is the Super Bowl just another game? I decided that I'd do a little experiment. Using the Predictive tools and data from the New England Patriots' 2016 and 2017 seasons, I wanted to see how a linear regression model developed on regular season games (including post-season) performed when used to predict the number of points the Patriots would score during the Super Bowl.

 

First, I downloaded data (source: here) for the New England Patriots for the 2016 and 2017 seasons (provided as inputs in the Start File), which required a bit of parsing to prepare for later use. Then, I set out on some data investigation to begin my linear regression model development. My approach (which may not be the same one you use) was to choose the four (4) variables from the "Score", "Offense" and "Defense" data categories with the most significant relationship to the variable "TM", which indicates the number of points the Patriots scored. With my variables selected, I began the model creation. My approach (which may differ from yours): develop the model on all values except for one pair of regular season games and the Super Bowl games.
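For anyone who wants to see the variable-selection idea outside of Designer, here is a minimal Python sketch. It assumes "most significant relationship" can be approximated by the absolute Pearson correlation with TM, which may not be the measure used in the original workflow. The column names and numbers are made up for illustration and are not the Patriots data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def top_k_by_correlation(columns, target, k=4):
    """Return the k column names with the largest absolute correlation to target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]

# Made-up numbers for illustration only; these are NOT the Patriots data.
# Column names are hypothetical stand-ins for the parsed fields.
tm = [10, 20, 30, 40, 50]  # points scored per game
candidates = {
    "first_downs": [12, 18, 25, 30, 38],
    "total_yards": [200, 310, 390, 480, 610],
    "pass_yards": [150, 140, 260, 250, 400],
    "turnovers": [3, 3, 1, 2, 0],   # falls as points rise
    "penalties": [5, 2, 6, 3, 5],   # roughly unrelated to points
}

selected = top_k_by_correlation(candidates, tm, k=4)
print(selected)
```

Ranking on the absolute value keeps strongly negative relationships (such as the made-up turnovers column) in contention, which a plain sort on the raw correlation would push to the bottom.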

 

What's the difference between your predicted values and actual values for your regular season games and Super Bowl games?  Is the Super Bowl just another game? 
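As a rough illustration of the hold-out comparison, the sketch below fits a line on the training games and reports actual minus predicted for the held-out rows. It assumes a single predictor to stay short (the challenge itself uses four variables in the Linear Regression tool), and every label and number here is made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

# Made-up games for illustration only; NOT the real Patriots data.
# Each row: (label, total_yards, points_scored). Labels are hypothetical.
games = [
    ("week_1", 300, 20),
    ("week_2", 350, 25),
    ("week_3", 400, 30),
    ("week_4", 450, 35),
    ("held_out_regular_a", 380, 27),
    ("held_out_regular_b", 320, 23),
    ("super_bowl", 500, 33),
]
held_out = {"held_out_regular_a", "held_out_regular_b", "super_bowl"}

# Develop the model on everything except the held-out games.
train = [g for g in games if g[0] not in held_out]
slope, intercept = fit_line([g[1] for g in train], [g[2] for g in train])

# Difference = actual minus predicted, for each held-out game.
differences = {
    label: points - (slope * yards + intercept)
    for label, yards, points in games
    if label in held_out
}
print(differences)
```

A Super Bowl difference much larger than the regular-season differences would hint that the game does not behave like the others, though with so few held-out games that is a suggestion rather than a test.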

 

Extra Challenge: How'd you do on this Jeopardy Category?  Admittedly, I was in good company with these contestants!       

patrick_digan
17 - Castor

Here is my attempt! I leveraged some of what I learned from Challenge 18 (specifically @samjohnson)

 

Spoiler
Capture.PNG

I dynamically parsed the fields to columns. Based on correlation, I ended up using Offensive 1st Downs, Total Yards, Passing Yards, Defensive Turnovers, Expected Offensive Points, and Expected Defensive Points. I used PCA. I tried Linear Regression, Neural Network, Forest Model, and Gamma Regression. Neural Network had the lowest Root Mean Squared Error. 
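For readers comparing several models the same way, here is a small sketch of picking the one with the lowest Root Mean Squared Error. The model names and predictions are made-up placeholders, not output from the Alteryx tools mentioned above.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values for illustration only; not real model output.
actual = [20, 30, 40]
predictions = {
    "model_a": [22, 27, 44],  # hypothetical model name
    "model_b": [21, 30, 39],  # hypothetical model name
}

scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)
print(best_model, scores)
```

Because RMSE squares each miss before averaging, one badly predicted game counts for more than several small misses, which is worth keeping in mind when a single game such as the Super Bowl is in the evaluation set.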

 

 

nick_ceneviva
11 - Bolide

Solution is attached. Played around with a couple of different variables. Because we are predicting points scored, I figured it would be best to look solely at the offensive stats as opposed to any defensive stats. Something like Time of Possession would have been interesting.

 

Also E-A-G-L-E-S. EAGLES!  Super Bowl Champs!

shanakag
6 - Meteoroid

 

 

Natasha
9 - Comet

Once again this week I find myself working with data and a topic I know nothing about, so I stuck to developing my model based purely on p-values.

 

Spoiler
I used only 3 variables, Offense_PassY, Offense_TotYd and Offense_1stD, because only they were statistically significant with a p-value < 0.05. I think that makes sense now that I've read @nick_ceneviva's comment that points scored should be influenced by offence stats rather than defence.

The prediction for week 13 is way better than for the Super Bowl game.

Screen Shot 2018-02-06 at 22.45.17.png


WLL
7 - Meteor

A pretty terrible but repeatable bit of data prep. Combining the first 2 rows for the column headers wasn't necessary, but I thought it would be good to have for future or other uses!

 

Completed

AndyBate
8 - Asteroid

First Predictive attempt, really enjoyed this challenge.

 

 

jamielaird
14 - Magnetar

Here's my solution.

 

Spoiler
challenge_103.png
ggruccio
ACE Emeritus

This was a lot of fun to put together!

NicoleJohnson
ACE Emeritus

I may have gone a little overboard. Particularly since I was rooting for the Eagles. (For no other reason than I'm a Seahawks fan, and the Eagle is also a bird.)

 

Spoiler
I chose to solve this with a macro so that I could see which week of regular season play was best suited to the predictions made from the remaining weeks, as well as which combination of weeks was the best predictor of Superbowl performance (by excluding a particular week of anomalous performance). My iterative macro went through a linear regression model using all weeks except Superbowl + each progressive week based on iteration number, through the entire season... and then I found the minimum absolute value point differential for both the regular season performance as well as Superbowl performance to find my best predictive groupings. 

Based on results it appears that Week 15 was the most consistent with overall performance from 2016-2017, and the Superbowl was best predicted by excluding the results from Week 4.
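The iteration described above can be sketched in Python roughly as follows. This is not the macro itself: it assumes a single predictor and made-up numbers, and it only mirrors the loop of leaving out the Super Bowl plus one week at a time, then taking the minimum absolute point differential.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

# Made-up (total_yards, points) per week; NOT the real Patriots data.
# Week 4 is deliberately anomalous in this toy data set.
weeks = {1: (300, 20), 2: (350, 25), 3: (400, 30), 4: (450, 10), 5: (500, 40)}
super_bowl = (420, 32)

regular_error = {}     # abs error on the excluded week itself
super_bowl_error = {}  # abs error on the Super Bowl when that week is excluded

for excluded_week, (week_x, week_y) in weeks.items():
    # Train on every week except the excluded one (the Super Bowl is never in training).
    train = [row for week, row in weeks.items() if week != excluded_week]
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

    regular_error[excluded_week] = abs(week_y - (slope * week_x + intercept))
    super_bowl_error[excluded_week] = abs(
        super_bowl[1] - (slope * super_bowl[0] + intercept)
    )

# Week best predicted by the remaining weeks, and the exclusion that best predicts the Super Bowl.
best_regular_week = min(regular_error, key=regular_error.get)
best_excluded_for_super_bowl = min(super_bowl_error, key=super_bowl_error.get)
print(best_regular_week, best_excluded_for_super_bowl)
```

In this toy data the anomalous week is the one whose exclusion helps the Super Bowl prediction most, which is the same kind of pattern the macro was built to surface.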

Also, this workflow (with the addition of a couple unnecessary Select tools) lent itself quite nicely to a goalpost layout. Sooooo... I went for the 2-point conversion. :)

WeeklyChallenge103.JPG

Cheers!

 

NJ