Challenge #18: Predicting Baseball Wins

danicahui
8 - Asteroid
[Attached solution image: Challenge 18 2019-11-25.jpg]
Chantelb
9 - Comet
[Attached solution image]
mbogusz
9 - Comet
[Attached solution image]
Stevej028
7 - Meteor
[Attached solution image]
SueDonim
8 - Asteroid

Just beginning to learn predictive tools, so this one took a bit of time. Fortunately, we didn't need to know what all those fields actually mean.

Process:
- Association Analysis tool to determine the correlation of each variable with Wins. I copied the output to a spreadsheet and sorted it to find the top 10
- Regress those 10 variables against Wins
- Join initial stats to list of 6 teams of interest
- Input the data for the 6 teams into the regression model from above using the Score tool
- Round Projected Wins (the output variable)
- Projected Losses = 162 - Projected Wins
- Sort by Projected Wins (desc) then Projected Losses (asc) then Team
- Calculate Games Back temp as the difference between this record's Projected Wins and the prior record's Projected Wins
- Calculate Games Back as the running total of Games Back temp
- Clean-up
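A minimal Python sketch of the last few steps above (round, losses, sort, Games Back temp, running total), assuming the projected wins have already come out of the Score tool. The `standings` function, the team names, and the win values are hypothetical placeholders for illustration, not results from the challenge data.

```python
from itertools import accumulate


def standings(projected_wins, games=162):
    """Turn projected wins into (team, wins, losses, games_back) rows.

    Note: Python's round() rounds exact halves to the nearest even number,
    which may differ from other tools' rounding on values like 88.5.
    """
    rows = []
    for team, w in projected_wins.items():
        wins = round(w)                          # Round Projected Wins
        rows.append((team, wins, games - wins))  # Projected Losses = 162 - wins
    # Sort by wins (desc), then losses (asc), then team name
    rows.sort(key=lambda r: (-r[1], r[2], r[0]))
    # "Games Back temp": prior record's wins minus this record's wins (0 for the leader)
    gaps = [0] + [prev[1] - cur[1] for prev, cur in zip(rows, rows[1:])]
    # Games Back = running total of the gaps
    return [row + (gb,) for row, gb in zip(rows, accumulate(gaps))]


if __name__ == "__main__":
    # Hypothetical placeholder values, not challenge results
    example = {"Team A": 97.4, "Team B": 91.6, "Team C": 91.2, "Team D": 84.9}
    for row in standings(example):
        print(row)
```

With these placeholder values the first row printed is `('Team A', 97, 65, 0)` and the last is `('Team D', 85, 77, 12)`. Because the rows are sorted by wins first, the running total of consecutive gaps always equals the leader's wins minus each team's wins.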

[Attached solution image: MySolution.PNG]
justindavis
10 - Fireball

This was definitely helpful, as I've struggled a few times to get Alteryx to run and score the model successfully (I had tried this one a couple of times a while back). I realized that by filtering down to the teams before generating the linear model, there were not enough degrees of freedom (df) for the model to be fitted. Filtering had to be done after the model was created. I definitely learned something on this one!

[Attached solution image: challenge_18_solution_justindavis.PNG]
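The degrees-of-freedom problem comes down to simple arithmetic. An ordinary least squares fit with p predictors and an intercept estimates k = p + 1 coefficients, leaving n - k residual degrees of freedom for n rows. Assuming the filtered data has one row per team (6 teams) and the model uses 10 predictors, that is 6 - 11 = -5, so the coefficients cannot be uniquely estimated. The helpers below are a hypothetical sketch, not part of Alteryx.

```python
def residual_df(n_rows, n_predictors, intercept=True):
    """Residual degrees of freedom for an OLS fit: rows minus estimated coefficients."""
    n_coefficients = n_predictors + (1 if intercept else 0)
    return n_rows - n_coefficients


def can_fit(n_rows, n_predictors, intercept=True):
    """True if there is at least one residual degree of freedom.

    With df < 0 the coefficients are not uniquely determined; with df == 0 the
    model fits the data exactly and no residual variance can be estimated.
    """
    return residual_df(n_rows, n_predictors, intercept) > 0


if __name__ == "__main__":
    # Assumed scenario: 6 teams, one row each, 10 predictors plus an intercept
    print(residual_df(6, 10))  # -5
    print(can_fit(6, 10))      # False
```

Fitting on the full data set first and filtering to the six teams only at the Score step keeps n large while the team-level predictions still come out of the same model.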
rmassambane
10 - Fireball
 
AlexC2
8 - Asteroid

Solution attached.

Hjardine
8 - Asteroid
[Attached solution image]

Used a correlation analysis tool to find the top 10 variables, used those in the Linear Regression tool, used the Score tool to apply the model to the filtered data for the specified teams, subtracted projected wins from 162 to get losses, used Summarize to find the maximum wins, appended that maximum to every row, and subtracted each team's wins from it to find games back.
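The max-minus-wins approach described above gives the same answer as the usual baseball definition of games back, GB = ((W_leader - W) + (L - L_leader)) / 2, whenever every team plays the same number of games, because then L - L_leader = W_leader - W. A small Python check, assuming a 162-game season; the function names and win totals are hypothetical.

```python
def games_back_from_max(wins):
    """Max-minus-wins: leader's wins (Summarize max, then Append) minus each team's wins."""
    leader_wins = max(wins.values())
    return {team: leader_wins - w for team, w in wins.items()}


def games_back_standard(wins, games=162):
    """Standard GB = ((W_leader - W) + (L - L_leader)) / 2, assuming every team plays `games`."""
    leader = max(wins, key=wins.get)
    w_lead = wins[leader]
    l_lead = games - w_lead
    return {team: ((w_lead - w) + ((games - w) - l_lead)) / 2 for team, w in wins.items()}


if __name__ == "__main__":
    # Hypothetical win totals for illustration only
    wins = {"Team A": 97, "Team B": 92, "Team C": 85}
    print(games_back_from_max(wins))  # {'Team A': 0, 'Team B': 5, 'Team C': 12}
    print(games_back_standard(wins))  # {'Team A': 0.0, 'Team B': 5.0, 'Team C': 12.0}
```

If teams had played different numbers of games, the two functions would diverge and the standard formula would be the one to use.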

michalklofac
7 - Meteor

My approach is a little different in that I "automated" the selection of the top 10 predictor variables. First, I identify the 10 with the highest correlation (Sort by correlation in descending order, then Sample the first 10 records), and then Join this list to the original data set, which has been Transposed so that each team/variable combination is a single row. This lets me simply "select all" in the Linear Regression tool (and just deselect the two irrelevant variables).

 

[Attached solution image: Screenshot (62).png]
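A rough Python version of the automated selection, assuming Pearson correlation (the post does not say which measure was used) and made-up toy columns: compute each variable's correlation with Wins, sort in descending order, keep the first k, and then keep only the matching rows of the transposed (long) data. `pearson`, `top_k_predictors`, and `keep_top_rows` are hypothetical helpers, not Alteryx functions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def top_k_predictors(columns, target="Wins", k=10, exclude=()):
    """Correlate every other column with the target, sort descending, keep the first k."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target and name not in exclude
    }
    ranked = sorted(scores, key=scores.get, reverse=True)  # like Sort (descending)
    return ranked[:k]                                      # like Sample (first k records)


def keep_top_rows(long_rows, top):
    """Inner-join transposed (team, variable, value) rows to the top-k variable list."""
    keep = set(top)
    return [row for row in long_rows if row[1] in keep]


if __name__ == "__main__":
    # Toy columns, one value per team; not the challenge data
    columns = {"Wins": [1, 2, 3, 4], "A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 3, 2, 4]}
    top = top_k_predictors(columns, k=2)
    print(top)  # ['A', 'C']
    long_rows = [("T1", "A", 1), ("T1", "B", 4), ("T1", "C", 1)]
    print(keep_top_rows(long_rows, top))  # [('T1', 'A', 1), ('T1', 'C', 1)]
```

Sorting by raw correlation, as in the post, ranks strong negative correlations last (column `B` above has r = -1 but is dropped); sorting by absolute value would be an alternative if those should count as strong predictors.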