Do you have the skills to make it to the top? Subscribe to our weekly challenges. Try your best to solve the problem, share your solution, and see how others tackled the same problem. We share our answer too.

## Challenge #18: Predicting Baseball Wins

Asteroid

I get slightly different results, but they're close enough; my workflow is almost identical to the solution.

Asteroid

Alteryx needs a function that automatically selects the top 10 fields for a linear regression, as a .yxfd file.

Alteryx
David Wilcox
Senior Software Engineer
Alteryx
Alteryx Partner

Challenge 18 is done!


Asteroid

All,

I guess that I will go deep on this one!

- My workflow contains my solution, which uses my own selected variables, as well as the variables from the official solution, which I vehemently oppose.

- The reason is that, although I can reproduce the same output, we take predictive modeling out of context when we don't examine variable selection.

- Next time, I would rather see the Stepwise tool used instead of a simple correlation.
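For readers who want to see what stepwise selection does differently from a correlation ranking, here is a minimal forward-stepwise sketch in Python (NumPy only). It is not the Stepwise tool's implementation; it assumes ordinary least squares with an intercept and uses a Gaussian AIC, n·ln(RSS/n) + 2k, as the stopping rule. The function names and demo data are hypothetical.

```python
import numpy as np


def ols_aic(X, y, cols):
    """Fit OLS of y on an intercept plus X[:, cols]; return the Gaussian AIC n*ln(RSS/n) + 2k."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def forward_stepwise(X, y, names):
    """Greedy forward selection: add whichever column lowers AIC the most,
    and stop once no remaining column lowers it any further."""
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = ols_aic(X, y, selected)  # start from the intercept-only model
    while remaining:
        aic, j = min((ols_aic(X, y, selected + [c]), c) for c in remaining)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected]


# Hypothetical demo data (not the challenge's baseball data): the target depends
# on columns "a" and "c"; "b" and "d" are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)
print(forward_stepwise(X, y, ["a", "b", "c", "d"]))
```

Because each candidate is judged against a model that already contains the earlier picks, a near-duplicate of a selected variable adds little and tends to be skipped, which a one-variable-at-a-time correlation ranking cannot do.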

Common Variables: OBP, RBI, TB

My Variables: BatAge, CS, GDP, IBB, SB, X2B, X_Bat

Solution Variables: BA, HR, OPS, OPS_Adj, R, R_G, SLG

Solution model VIFs:

| Variable | GVIF | DF | Std_GVIF |
|---|---|---|---|
| OPS | 10231.97529 | 1 | 101.1532 |
| SLG | 5140.467053 | 1 | 71.69705 |
| R_G | 2390.894591 | 1 | 48.89677 |
| R | 2206.742874 | 1 | 46.97598 |
| OBP | 1373.834561 | 1 | 37.06527 |
| RBI | 261.1853983 | 1 | 16.16123 |
| TB | 60.38742787 | 1 | 7.770935 |
| HR | 31.45761731 | 1 | 5.608709 |
| BA | 9.184712977 | 1 | 3.030629 |
| OPS_Adj | 6.802812459 | 1 | 2.60822 |

As we can see, three variables carry over: On-Base Percentage (OBP), Total Bases (TB), and Runs Batted In (RBI). These three are pretty universal in this kind of Moneyball-style challenge: teams that get on base, drive runs in, and advance around the bases generally score more runs over time, which leads to more victories.

- Looking at the Variance Inflation Factors (VIFs) for the solution model, we see that the VIF is well above 6 for every variable except Batting Average (BA) and adjusted OPS (OPS_Adj).

- Long story short, these variables, while helpful, are already accounted for by OBP, RBI, and TB. My model uses more of the "negative" baseball stats, such as Caught Stealing (CS) and Grounded Into Double Play (GDP); you could say that every instance costs your team about one-seventh of a game, all else being equal. Batter age (BatAge) is a surprise: every additional year is worth about 2.75 wins, which I read as a proxy for experience. A young team just doesn't have that experience, so it is more likely to ground into double plays and less likely to get on base than a team of more experienced players (who are also more likely to have been plucked from lesser teams).
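The VIF argument in the list above can be reproduced outside Designer. Below is a minimal NumPy sketch, assuming plain OLS with an intercept: VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The `Std_GVIF` column in the table equals GVIF^(1/(2·DF)), which for DF = 1 is just the square root (√10231.98 ≈ 101.15 for OPS). The function names and demo data are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X:
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def standardized_gvif(gvif, df):
    """GVIF^(1/(2*DF)); for a one-degree-of-freedom term this is sqrt(GVIF)."""
    return gvif ** (1.0 / (2 * df))


# Hypothetical demo: column 1 is nearly a copy of column 0 (a stand-in for
# overlapping stats such as OPS and SLG); column 2 is independent.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.1 * rng.normal(size=500), rng.normal(size=500)])
print(vif(X))

# Reproduce two Std_GVIF entries from the table above (DF = 1).
print(standardized_gvif(10231.97529, 1))  # OPS
print(standardized_gvif(6.802812459, 1))  # OPS_Adj
```

The two near-duplicate columns get very large VIFs while the independent one stays close to 1, which is the pattern the solution model's OPS/SLG/R/R_G rows show.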

Cheers!

Matt
Asteroid

I definitely need more work on the predictive tools. This exercise was a good challenge.

Asteroid

Laziness breeds efficiency. I started with the Association Analysis tool but didn't want to sift through the data to figure out the top 10. I looked through the other Data Investigation tools and found the Pearson Correlation tool, which let me find the top 10 much more easily (and more reliably than looking through a big table :) ).
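Ranking candidates by correlation with the target, as described above, is easy to automate. A minimal Python sketch, assuming the predictors are numeric columns of a NumPy array; it mirrors the idea of the Pearson Correlation approach but is not the Alteryx tool itself, and the function name and demo data are hypothetical.

```python
import numpy as np


def top_k_by_correlation(X, y, names, k=10):
    """Rank predictors by |Pearson r| with the target and keep the k strongest."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))[:k]  # strongest absolute correlation first
    return [(names[j], float(r[j])) for j in order]


# Hypothetical demo data: the target rises with "a" and falls with "c".
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=300)
for name, r in top_k_by_correlation(X, y, ["a", "b", "c", "d", "e"], k=2):
    print(f"{name}: r = {r:+.2f}")
```

Ranking by absolute value keeps strong negative predictors in the list. The limitation Matt points out still applies: each variable is scored on its own, so two largely redundant stats can both make the top 10.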

Alteryx Partner

Here's my solution to challenge 18.
