Challenge #18: Predicting Baseball Wins

hanykowska
11 - Bolide

I get slightly different results, but they're close enough; my workflow is almost identical to the solution.

 

[Image: image.png]
Johnny_Analytics
8 - Asteroid
[Image: Challenge_18_solution.png]
klyuka
8 - Asteroid

Alteryx needs a function that automatically selects the top 10 fields for a linear regression, as a .yxfd file.

DavidW
Alteryx Alumni (Retired)
[Image: Screenshot.png]
David Wilcox
Senior Software Engineer
Alteryx
Jonathan-Sherman
15 - Aurora

Challenge 18 is done!

 

[Image: challenge 18 JMS solution.PNG]
lavanyadurai
7 - Meteor
[Image: Challenge18.PNG]

 

Reesetrain2
9 - Comet

All,

 

I guess I'll go deep on this one!

- My workflow contains my own solution, built on variables I selected myself, alongside a model using the official solution's variables, which I vehemently oppose.

- The reason: although I can get the same output, we take predictive modeling out of context when we don't examine variable selection.

- Next time, I would rather see the Stepwise tool used instead of a simple correlation.
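To make the contrast concrete: ranking by simple correlation scores each field on its own, while stepwise selection refits the model as fields are dropped, so a field that is redundant with the others can be removed even if it correlates strongly with wins. Below is a minimal backward-elimination sketch in plain Python using AIC as the criterion. It assumes an ordinary least-squares model with an intercept and predictors that are not perfectly collinear; the function names and toy data are hypothetical, and this is not the Alteryx Stepwise tool's actual implementation.

```python
import math


def ols_rss(rows, y):
    """Fit ordinary least squares with an intercept; return the residual sum of squares.

    rows: one list of predictor values per observation (empty lists = intercept only).
    Assumes the predictors are not perfectly collinear.
    """
    n = len(y)
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * c for v, c in zip(A[r], A[col])]
    beta = [A[k][p] / A[k][k] for k in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))


def aic(data, y, names):
    """AIC (up to an additive constant) of an OLS model using the named predictors."""
    n = len(y)
    rows = [[data[name][i] for name in names] for i in range(n)]
    k = len(names) + 1  # number of coefficients, including the intercept
    return n * math.log(ols_rss(rows, y) / n) + 2 * k


def backward_stepwise(data, y):
    """Start with every predictor; repeatedly drop the one whose removal lowers AIC most."""
    selected = list(data)
    current = aic(data, y, selected)
    while selected:
        trial = {name: aic(data, y, [s for s in selected if s != name]) for name in selected}
        best = min(trial, key=trial.get)
        if trial[best] >= current:
            break  # no single removal improves AIC, so stop
        selected.remove(best)
        current = trial[best]
    return selected


x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, -1, 1, 1, -1, -1, 1]  # constructed to be unrelated to y
noise = [0.1, -0.1] * 4
y = [2 * a + 1 + e for a, e in zip(x1, noise)]
print(backward_stepwise({"x1": x1, "x2": x2}, y))  # ['x1']
```

AIC charges 2 points per coefficient, so a field stays only if it reduces the residual error enough to pay for itself given the fields already in the model; a correlation ranking never asks that question.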

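Common variables (used in both models): OBP, RBI, TB

My variables | Solution variables
BatAge | BA
CS | HR
GDP | OPS
IBB | OPS_Adj
SB | R
X2B | R_G
X_Bat | SLG

Generalized variance inflation factors (solution model):

Variable | GVIF | DF | Std_GVIF
OPS | 10231.97529 | 1 | 101.1532
SLG | 5140.467053 | 1 | 71.69705
R_G | 2390.894591 | 1 | 48.89677
R | 2206.742874 | 1 | 46.97598
OBP | 1373.834561 | 1 | 37.06527
RBI | 261.1853983 | 1 | 16.16123
TB | 60.38742787 | 1 | 7.770935
HR | 31.45761731 | 1 | 5.608709
BA | 9.184712977 | 1 | 3.030629
OPS_Adj | 6.802812459 | 1 | 2.60822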



As we can see, three variables carry over between the two models: on-base percentage (OBP), total bases (TB), and runs batted in (RBI). These three are pretty universal in Moneyball-style challenges like this one. Teams that can get on base, bat runs in, and advance on the bases generally score more runs over time, which leads to more victories.

- Looking at the variance inflation factors (VIFs) for the solution model, every variable has a VIF well above 6 except batting average (BA, 9.18) and adjusted OPS (OPS_Adj, 6.80), and even those two exceed 6.

- Long story short: these variables, while helpful, are already accounted for by OBP, RBI, and TB. My model instead uses more of the "negative" baseball stats, such as Caught Stealing (CS) and Grounded into Double Play (GDP), where you could say that every instance costs your team about one-seventh of a game, all else being equal. Batter age (BatAge) is a surprise: every additional year gets you about 2.75 games, and I see it as a proxy for experienced players. A young team just doesn't have the experience; it is more likely to ground into double plays and less likely to get on base than a team of experienced players (who are also more likely to be plucked from lesser teams).
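For anyone who wants to reproduce the VIF check outside Alteryx: VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors; equivalently, it is the j-th diagonal entry of the inverse of the predictors' correlation matrix. The sketch below uses that identity in plain Python; the function names and toy data are hypothetical. For single-degree-of-freedom terms, the Std_GVIF column in the table above is GVIF^(1/(2·DF)), which is just the square root (for BA, √9.18 ≈ 3.03).

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def invert(m):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]  # make the pivot 1
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # right half now holds the inverse


def vifs(data):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation matrix."""
    names = list(data)
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[j][j] for j, name in enumerate(names)}


v = vifs({"a": [1, 2, 3, 4, 5], "b": [2, 1, 4, 3, 5]})  # r = 0.8 between a and b
print({k: round(val, 3) for k, val in v.items()})  # {'a': 2.778, 'b': 2.778}
print(round(math.sqrt(v["a"]), 3))  # 1.667, the Std_GVIF value when DF = 1
```

With only two predictors correlated at r = 0.8, the VIF is already 1/(1 - 0.64) ≈ 2.78; values in the thousands, like OPS and SLG above, mean a variable is almost entirely explained by the others.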
[Image: ANOVA.PNG]
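The coefficient reading in the last point is simple arithmetic: in a linear model, each coefficient is the change in predicted wins per one-unit change in that variable, with everything else held fixed. Here is a tiny sketch using the approximate figures quoted above (about -1/7 win per CS or GDP, about +2.75 wins per year of BatAge). Those constants are read off the post rather than refit, and the function name is hypothetical.

```python
# Approximate effects quoted in the post (assumed constants, not refit here).
WINS_PER_CS_OR_GDP = -1 / 7   # each caught stealing or double play grounded into
WINS_PER_BATAGE_YEAR = 2.75   # each extra year of average batter age


def win_change(delta_cs, delta_gdp, delta_bat_age):
    """Estimated change in wins, holding every other variable fixed (linear model)."""
    return (delta_cs + delta_gdp) * WINS_PER_CS_OR_GDP + delta_bat_age * WINS_PER_BATAGE_YEAR


# Seven fewer double plays and a lineup one year older on average:
print(round(win_change(0, -7, 1), 2))  # 3.75
```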
Cheers!



Matt
MikeHinz
8 - Asteroid

I definitely need more work on the predictive tools. This exercise was a good challenge.

[Image: challange 18.PNG]
ZenonH
8 - Asteroid

Laziness breeds efficiency. I started with the Association Analysis tool, but didn't want to sift through the data to figure out the top 10. I looked through the other Data Investigation tools and found the Pearson Correlation tool, which let me find the top 10 much more easily (and more reliably than looking through a big table :) ).
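This shortcut (score every field by its Pearson correlation with wins, then keep the ten strongest) is also roughly the automatic top-10 selection requested earlier in the thread. A minimal plain-Python sketch follows, with hypothetical names and toy data. Ranking by the absolute value of r, so that strongly negative fields also qualify, is an assumption here rather than necessarily what the challenge solution did.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def top_k_by_correlation(fields, target, k=10):
    """Rank fields by |r| with the target (strongest first) and keep the first k."""
    strength = {name: abs(pearson(values, target)) for name, values in fields.items()}
    return sorted(strength, key=strength.get, reverse=True)[:k]


wins = [1, 2, 3, 4, 5]
fields = {
    "strong": [2, 4, 6, 8, 10],  # r = 1.0
    "neg": [5, 4, 3, 2, 0],      # r ≈ -0.986
    "noisy": [2, 1, 4, 3, 5],    # r = 0.8
    "flat": [1, 3, 2, 3, 1],     # r = 0.0
}
print(top_k_by_correlation(fields, wins, k=3))  # ['strong', 'neg', 'noisy']
```

Note that each field is scored independently, so two near-duplicate fields (like OPS and SLG) can both make the cut; that is exactly the redundancy the VIF table above exposes.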

[Image: ZH_WF.PNG]
JonathanAllenby
8 - Asteroid

Here's my solution to challenge 18.

 

[Image: CH18_Solution.PNG]