Do you have the skills to make it to the top? Subscribe to our weekly challenges. Try your best to solve the problem, share your solution, and see how others tackled the same problem. We share our answer too.
Weekly Challenge

Challenge #18: Predicting Baseball Wins

Asteroid

I get slightly different results, but they're close enough; my workflow is almost identical to the solution.

 

Spoiler
image.png
Spoiler
Challenge_18_solution.png
Asteroid

Alteryx needs a function that automatically selects the top 10 fields for a linear regression and saves them as a .yxfd file.

Alteryx
Alteryx
Spoiler
Screenshot.png
David Wilcox
Senior Software Engineer
Alteryx
Alteryx Partner

Challenge 18 is done!

 

Spoiler
challenge 18 JMS solution.PNG
Spoiler
Challenge18.PNG

 

Asteroid

All,

 

I guess that I will go deep on this one!

-My workflow contains two models: my own solution, built on my personally selected variables, and the official solution's variables, which I vehemently oppose.

-The reason is that, although I am able to get the same output, we are really taking predictive modeling out of context when we don't examine variable selection.

-Next time, I would rather see the Stepwise tool used instead of a simple correlation.
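
To make the contrast with a simple correlation ranking concrete, here is a minimal sketch of forward stepwise selection scored by AIC. It is a generic illustration, not a reproduction of what the Stepwise tool does internally; the helper names (`solve`, `rss`, `aic`, `forward_stepwise`) and the toy data are all made up for this example and have nothing to do with the challenge dataset.

```python
from math import log

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def aic(columns, y):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    n = len(y)
    k = len(columns) + 1  # slopes plus intercept
    return n * log(rss(columns, y) / n) + 2 * k

def forward_stepwise(fields, y):
    """Greedily add the field that lowers AIC the most; stop when nothing helps."""
    chosen = []
    best = aic([], y)
    while True:
        candidates = [name for name in fields if name not in chosen]
        if not candidates:
            break
        scored = [(aic([fields[n] for n in chosen + [name]], y), name)
                  for name in candidates]
        score, name = min(scored)
        if score >= best:
            break
        chosen.append(name)
        best = score
    return chosen

# Hypothetical toy data: y depends on "a" plus small noise; "b" is unrelated.
a = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.2, 0.15, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [2 * ai + ei for ai, ei in zip(a, noise)]
fields = {"a": a, "b": [1, -1, 1, -1, -1, 1, -1, 1]}

selected = forward_stepwise(fields, y)
print(selected)
```

The point of the sketch is that each candidate is judged by what it adds to the model already built, so a field that merely repeats information from fields already selected gets penalized rather than ranked highly on its own.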

Spoiler
Common Variables: OBP, RBI, TB

My Variables    Solution Variables
BatAge          BA
CS              HR
GDP             OPS
IBB             OPS_Adj
SB              R
X2B             R_G
X_Bat           SLG

Variable   GVIF          DF   Std_GVIF
OPS        10231.97529   1    101.1532
SLG        5140.467053   1    71.69705
R_G        2390.894591   1    48.89677
R          2206.742874   1    46.97598
OBP        1373.834561   1    37.06527
RBI        261.1853983   1    16.16123
TB         60.38742787   1    7.770935
HR         31.45761731   1    5.608709
BA         9.184712977   1    3.030629
OPS_Adj    6.802812459   1    2.60822



As we can see, three variables carry over: on-base percentage (OBP), total bases (TB), and runs batted in (RBI). These three are pretty universal when doing this Moneyball-style challenge. Teams that can get on base, bat runs in, and advance on the bases generally score more runs over time, which leads to more victories.

-Looking at the variance inflation factors (VIFs) for the solution model, we see that the VIF is well above 6 for all variables except batting average (BA) and adjusted OPS (OPS_Adj).

-Long story short, these variables, while helpful, have already been accounted for by OBP, RBI, and TB. My model uses more of the "negative" baseball stats, like Caught Stealing and Times Grounded into Double Play, where you could say that every instance costs your team one-seventh of a game, exclusive of everything else. Batters' age is surprising, as every year gets you about 2.75 games; I see this as a proxy for experienced players. A young team just doesn't have the experience and will be more likely to ground into double plays and less likely to get on base than more experienced players (who are also more likely to be plucked from lesser teams).
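
For anyone who wants to sanity-check the VIF table, here is a small sketch in Python. It assumes Std_GVIF is reported as GVIF^(1/(2*DF)), which matches the numbers in the table, and it uses the standard definition VIF = 1/(1 - R^2) to back out how much of each variable is explained by the other predictors (with DF = 1, the GVIF is the ordinary VIF). The function names are made up for this example; the values are copied from the table.

```python
# (variable, GVIF, DF, Std_GVIF as reported) copied from the table in this post.
gvif_table = [
    ("OPS",     10231.97529, 1, 101.1532),
    ("SLG",     5140.467053, 1, 71.69705),
    ("R_G",     2390.894591, 1, 48.89677),
    ("R",       2206.742874, 1, 46.97598),
    ("OBP",     1373.834561, 1, 37.06527),
    ("RBI",     261.1853983, 1, 16.16123),
    ("TB",      60.38742787, 1, 7.770935),
    ("HR",      31.45761731, 1, 5.608709),
    ("BA",      9.184712977, 1, 3.030629),
    ("OPS_Adj", 6.802812459, 1, 2.60822),
]

def std_gvif(gvif, df):
    """Assumed relation: GVIF^(1/(2*DF)); for DF = 1 this is sqrt(GVIF)."""
    return gvif ** (1.0 / (2 * df))

def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2): the R^2 of this predictor regressed on the others."""
    return 1.0 - 1.0 / vif

for name, gvif, df, reported in gvif_table:
    print(f"{name:8s} Std_GVIF={std_gvif(gvif, df):9.4f} "
          f"(reported {reported})  implied R^2={implied_r_squared(gvif):.5f}")
```

Read this way, a GVIF in the thousands means the other predictors reproduce that variable almost exactly, which is the redundancy argument made above.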
ANOVA.PNG
Cheers!



Matt
Asteroid

I definitely need more work on the predictive tools. This exercise was a good challenge.

Spoiler
challange 18.PNG
Asteroid

Laziness breeds efficiency. I started with the Association Analysis tool, but didn't want to sift through the data to figure out the top 10. I looked through the other Data Investigation tools and found the Pearson Correlation tool, which let me find the top 10 much more easily (and more reliably than looking through a big table :) ).
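
For reference, the "top N by correlation" idea can be sketched outside Designer in a few lines of Python. This is only an illustration of the ranking logic, not what the Pearson Correlation tool does internally; the function names and the toy numbers are made up and are not the challenge data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_k_by_correlation(columns, target, k=10):
    """Return the k field names with the largest absolute correlation to the target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:k]

# Hypothetical toy data, NOT the challenge dataset.
wins = [60, 70, 80, 90, 100]
fields = {
    "OBP":   [0.300, 0.310, 0.325, 0.330, 0.345],  # rises with wins
    "CS":    [50, 45, 47, 38, 35],                 # falls with wins
    "Noise": [3, 9, 1, 8, 4],                      # little relation to wins
}

top2 = top_k_by_correlation(fields, wins, k=2)
print(top2)
```

Ranking on the absolute value keeps strongly negative fields in the running, since a stat that reliably costs wins is as informative as one that adds them.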

Spoiler
ZH_WF.PNG
Alteryx Partner

Here's my solution to challenge 18.

 

Spoiler
CH18_Solution.PNG