
Weekly Challenge


Challenge #18: Predicting Baseball Wins

Asteroid
Spoiler
Challenge 18 2019-11-25.jpg
Asteroid
Spoiler
Chantelbbr_0-1574727133545.png
Asteroid
Spoiler
2019-12-04 07_26_08-Greenshot.png
Meteor
Spoiler
image.png
Alteryx Partner

I'm just beginning to learn the predictive tools, so this one took a bit of time. Fortunately, we didn't need to know what all those fields actually mean.

 

Spoiler
Process:
- Association Analysis tool to determine the correlation of each variable with Wins. I copied the output to a spreadsheet and sorted it to find the top 10
- Regress those 10 variables against Wins
- Join initial stats to list of 6 teams of interest
- Input the data for the 6 teams into the regression model from above using the Score tool
- Round Projected Wins (the output variable)
- Projected Losses = 162 - Projected Wins
- Sort by Projected Wins (desc) then Projected Losses (asc) then Team
- Calculate Games Back temp as the difference between the prior record's Projected Wins and the current record's Projected Wins
- Calculate Games Back as the running total of Games Back temp
- Clean-up
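
For anyone who wants to check the last few steps outside Alteryx, here is a rough Python sketch of the rounding, losses, sort and Games Back logic listed above. It is not the workflow itself: the function name `standings`, the team codes and the win values are made up for illustration, and it assumes the raw projected wins have already come out of a scoring step.

```python
from itertools import accumulate

TOTAL_GAMES = 162  # season length used in the Projected Losses formula above


def standings(projected):
    """Build a standings table from raw projected wins.

    `projected` is a hypothetical dict of team -> unrounded projected wins.
    """
    rows = []
    for team, raw in projected.items():
        # Note: Python's round() sends exact .5 values to the nearest even
        # integer, which may differ from the rounding used in the workflow.
        wins = round(raw)
        rows.append({"team": team, "wins": wins, "losses": TOTAL_GAMES - wins})

    # Projected Wins descending, then Projected Losses ascending, then Team.
    rows.sort(key=lambda r: (-r["wins"], r["losses"], r["team"]))

    # "Games Back temp": gap in wins to the prior record (0 for the leader).
    gaps = [0] + [prev["wins"] - cur["wins"] for prev, cur in zip(rows, rows[1:])]

    # "Games Back": running total of those gaps.
    for row, games_back in zip(rows, accumulate(gaps)):
        row["games_back"] = games_back
    return rows


for row in standings({"AAA": 95.4, "BBB": 88.6, "CCC": 91.2}):
    print(row)
```

Because the rows are sorted by wins first, the running total of the row-to-row gaps always equals the gap to the leader.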

MySolution.PNG
Asteroid

This was definitely helpful, as I've struggled a few times getting Alteryx to successfully run and score the model (I had tried this one a couple of times a while back). Realized that...

 

Spoiler
challenge_18_solution_justindavis.PNG
... by filtering down to the teams before generating the linear model, there were not enough degrees of freedom (df) for the model to be generated. Filtering had to be done after the model was created. Learned on this one for sure!
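
A quick way to see the degrees-of-freedom problem is to count them. The sketch below assumes a model with an intercept and 10 predictors (the number other posts in this thread used) fitted on only the 6 teams of interest; `residual_df` is a hypothetical helper, not an Alteryx function.

```python
def residual_df(n_rows, n_predictors, intercept=True):
    """Residual degrees of freedom of an ordinary linear regression."""
    n_params = n_predictors + (1 if intercept else 0)
    return n_rows - n_params


# 6 rows cannot support 10 predictors plus an intercept.
print(residual_df(n_rows=6, n_predictors=10))  # -5
```

A non-positive result means there are at least as many coefficients as rows, so the model cannot be estimated properly. Fitting on the full data set and filtering afterwards avoids this.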
Asteroid
 
Meteor

Solution attached.

Asteroid
Spoiler
Hjardine_0-1577995160725.png

Correlation analysis tool to find the top 10 variables, then use those in the Linear Regression tool. Score tool to test it on the data filtered to the teams specified. Subtract projected wins from 162 to get losses. Summarize to find the max wins, append it, then subtract each team's wins from the max to find games back.
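
The summarize, append and subtract steps for games back can be sketched in a few lines of Python. The function name and the sample values are hypothetical, and the input is assumed to be already-rounded projected wins.

```python
def games_back_from_leader(wins_by_team):
    """Games back = the leader's projected wins minus each team's projected wins."""
    leader_wins = max(wins_by_team.values())  # the "summarize to find max" step
    # Appending the max to every row and subtracting is one dict comprehension here.
    return {team: leader_wins - wins for team, wins in wins_by_team.items()}


print(games_back_from_leader({"AAA": 95, "BBB": 89, "CCC": 91}))
# {'AAA': 0, 'BBB': 6, 'CCC': 4}
```

This gives the same numbers as a running total over the sorted standings, without needing the sort first.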

Meteoroid

My approach is a little different in that I "automated" the selection of the top 10 predictor variables. First I identify the 10 with the highest correlation (Sort by the correlation in descending order, Sample first 10 records) and then Join this list to the original data set which has been Transposed (so that each team and variable is a single row). This allows me to simply "select all" in the Linear Regression tool (and just deselect the two irrelevant variables).
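
Outside Alteryx, the "sort by correlation, keep the first 10" idea could look roughly like the sketch below. It is only an illustration: `pearson` and `top_predictors` are hypothetical helpers, the column names are made up, and it assumes every column is numeric and not constant (a constant column would cause a division by zero).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def top_predictors(columns, target, n=10):
    """Names of the n columns with the highest correlation to `target`.

    `columns` maps column name -> list of values, including the target column.
    """
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    # Sort by correlation in descending order, then sample the first n records.
    return sorted(scores, key=scores.get, reverse=True)[:n]


data = {
    "Wins": [1, 2, 3, 4],
    "A": [2, 4, 6, 8],
    "B": [4, 3, 2, 1],
    "C": [1, 3, 2, 4],
}
print(top_predictors(data, "Wins", n=2))  # ['A', 'C']
```

Sorting on the signed correlation follows the description above. Ranking on the absolute value instead would also pick up strongly negative predictors such as column `B` in this toy data.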

 

Spoiler
Screenshot (62).png