Challenge #103: Just another game?

ipeng
8 - Asteroid

Used the Association Analysis tool to check the correlations. Most of the variables have p-values that are not low enough to be significant. Below is what I got...

Challenge 103 SuperBowl Prediction.PNG

How do we decide whether the Super Bowl is (or isn't) just another game? The predictions for the Super Bowl are similar to those for week 13.

JoshuaGostick
11 - Bolide

My solution :)

Spoiler
challenge_103.PNG
RichoBsJ
11 - Bolide

Hi! Here's my challenge :)

 

Spoiler
challenge_103.PNG
cplewis90
13 - Pulsar

I didn't quite meet the predictions of your model, but I came to a similar end result. It is not quite "just another game" in comparison to the week I chose. Anyway...Go Buccaneers!!!! 

buccaneers.png

Spoiler
Challenge 103.PNG
Kenda
16 - Nebula
Spoiler
Capture.PNG
RolandSchubert
16 - Nebula

My solution. I decided to use only three variables considering the p-values.

Spoiler
21-06-_2019_23-09-36.png
TimothyManning
8 - Asteroid
Spoiler
103. Predictive.PNG
103. Predictive 2.PNG
103. Predictive 3.PNG


Could have parsed it in a cleaner way, but it did the job! Then I learned about the Association Analysis tool from @Natasha's workflow and how to use the chosen variables for regression analysis. In the end, I predicted scores using only one variable (Offense - PassY), just to see how close you could get with one variable.
ZenonH
8 - Asteroid

Looks like defensive stats might be better at predicting Super Bowl scores than regular season scores. Hopefully this reinforces the idea that a strong defense is a strong offense, and shows that giving an inch can indeed result in losing a mile.

Spoiler
ZH WF.PNG
kelly_gilbert
13 - Pulsar

I've been sitting on this one for a looong time, because I got sidetracked trying to figure out how to use the R tool to generate residual plots (we were on an older version at the time, that didn't appear to include them). I finally figured out how to do it, even though they're part of the standard linear regression tool output now! I took the Intro to Advanced Analytics training at Inspire 2019, and we asked about residual plots. The instructor advised us to calculate the residuals and create the plots ourselves, so I'm guessing this is a pretty recent addition.
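For anyone who wants to follow the instructor's advice and calculate the residuals by hand, here is a minimal sketch in plain Python. It assumes a one-variable linear model; the function names and the numbers are made up for illustration and are not the challenge data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Return (fitted, residual) pairs, where residual = observed - fitted."""
    fitted = [intercept + slope * x for x in xs]
    return [(f, y - f) for f, y in zip(fitted, ys)]

# Hypothetical illustrative values, not data from the challenge.
pass_yards = [180.0, 220.0, 260.0, 300.0, 340.0]
points = [14.0, 20.0, 21.0, 30.0, 31.0]

intercept, slope = fit_line(pass_yards, points)
pairs = residuals(pass_yards, points, intercept, slope)
for fitted, resid in pairs:
    print(f"fitted={fitted:6.2f}  residual={resid:+6.2f}")
```

Plotting the residual against the fitted value for each pair gives the usual residual plot; with an intercept in the model, the residuals should sum to roughly zero.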

 

Workflow:

Spoiler
Parse and prep data:

challenge_103_01_parse and prep.png


Hold out the super bowl weeks, plus one random week from each year:
(I need to find a better way to randomly sample within groups - this works, but isn't reproducible)
challenge_103_02_ sample games.PNG
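One possible way to make the "one random week per year" hold-out reproducible is to draw from a seeded random generator. The sketch below is plain Python rather than an Alteryx workflow; the function name, the seed value, and the sample years and weeks are all assumptions for illustration.

```python
import random
from collections import defaultdict

def sample_week_per_year(games, seed=103):
    """Pick one week per year, reproducibly.

    games: iterable of (year, week) tuples (hypothetical input shape).
    The seed is arbitrary; any fixed value gives repeatable draws.
    """
    rng = random.Random(seed)
    weeks_by_year = defaultdict(set)
    for year, week in games:
        weeks_by_year[year].add(week)
    # Sorting years and weeks makes the result independent of input order.
    return {
        year: rng.choice(sorted(weeks_by_year[year]))
        for year in sorted(weeks_by_year)
    }

# Hypothetical seasons: weeks 1-17 for three years.
games = [(year, week) for year in (2015, 2016, 2017) for week in range(1, 18)]
held_out = sample_week_per_year(games)
print(held_out)
```

Running it twice with the same seed returns the same weeks, and changing the seed gives a different but still repeatable sample.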


Data investigation:
challenge_103_03_investigation.PNG


Build the model and score the sample weeks:
(I used stepwise regression to select the model)

In practice, I'd do a little more investigation, as I'm not satisfied with this model. I'd also want to educate myself more on the "business context," since I know very little about football! 
challenge_103_04_prediction.PNG

Results:

Spoiler
Selected model:
challenge_103_04b_model_output.PNG


Predictions:
challenge_103_05_output.PNG 

 

RWvanLeeuwen
11 - Bolide

 

Spoiler
With so many correlated features, I should have considered extracting principal components, but I couldn't interpret them because I don't know the sport. I manually removed fields from the equation (which doesn't even bother with interactions), starting with the field with the highest p-value (and thus the lowest absolute t-value). I reran the workflow until all the significance values were below .1, so I stuck with that.
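The manual loop described above (drop the weakest field, refit, repeat) can be sketched roughly as follows. This is not the workflow from the post: it uses NumPy, synthetic data, and hypothetical field names, and it stops on an absolute t-value cutoff of 1.645, which corresponds to a two-sided p-value of about .1 only under a large-sample normal approximation.

```python
import numpy as np

def backward_eliminate(X, y, names, t_min=1.645):
    """Drop the predictor with the smallest |t| until all |t| >= t_min."""
    keep = list(range(X.shape[1]))
    while keep:
        # Design matrix with an intercept column plus the remaining fields.
        A = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        dof = len(y) - A.shape[1]
        sigma2 = resid @ resid / dof
        cov = sigma2 * np.linalg.inv(A.T @ A)
        t_abs = np.abs(beta / np.sqrt(np.diag(cov)))[1:]  # skip the intercept
        worst = int(np.argmin(t_abs))
        if t_abs[worst] >= t_min:
            break  # every remaining field clears the cutoff
        del keep[worst]
    return [names[i] for i in keep]

# Synthetic data: x0 and x1 drive y, x2 is pure noise (all hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
names = ["x0", "x1", "x2"]

selected = backward_eliminate(X, y, names)
print(selected)
```

Removing the field with the smallest absolute t-value is the same as removing the one with the highest p-value, since all the t-values in a given fit share the same degrees of freedom.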