Challenge #157: An Expert Challenge
Last week's solution can be found here.
Have you ever wanted to take a sneak peek at one of the questions on the Alteryx Designer Expert exam? We thought you might, so for the first time ever we’re releasing one of our retired Expert exam questions as a Weekly Challenge! The amazing @CristonS created this question, and it made its debut at the first Expert exam in Anaheim last summer. This question gave everyone a hard time, and most people avoided it altogether, so if it seems intimidating, you’re not alone! We wanted to keep this in the same format as the actual exam question, so you won’t see an output file, just an input. We’ll post the answer and our solution next week.
You are provided a dataset (Q2_variables.yxdb) that contains multiple variables. Select the ten (10) numeric variables with the highest Mean Decrease Gini coefficient from the variable importance plot. Use these variables to build a model to predict the target variable, [H0]. Compare two models: one based on all of the selected variables, and another that includes the selected variables except [F_38]. What is the effect of removing this variable [F_38] from the model? Provide the Chi-Sq effect as your answer.
I would have skipped this problem on the exam as well! Here is my best guess.
To get the chi-squared statistic, I just used a simple formula and copied the data from the confusion matrix. Here is the 10-variable confusion matrix:
Really a hard one - the main problem was to find that "Variable Importance Plot Mean Decrease in Gini", but Alteryx Community has been very helpful on that.
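For anyone else unsure what "Mean Decrease in Gini" measures, here is a minimal Python sketch (not Alteryx, and not the Forest Model tool's actual code). It computes the drop in Gini impurity for a single split. As far as I understand it, a random forest's importance score aggregates drops like this for each variable over all splits and trees. The labels and the split below are made up for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def gini_decrease(parent, left, right):
    """Drop in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Made-up binary target values, split on a hypothetical predictor threshold.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [0, 0, 0, 1]
right = [0, 1, 1, 1]

decrease = gini_decrease(parent, left, right)
print(round(decrease, 4))
```

A variable that produces larger decreases across the forest ranks higher on the importance plot.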
I decided to use Logistic Regression because field H_0 seems to be binary.
The Nested Test tool does exactly what is needed: it compares two models, one of which uses only a subset of the other's variables.
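A common way to compare nested logistic regressions is a likelihood-ratio test, sketched below in Python. Whether the Nested Test tool computes exactly this statistic is an assumption on my part. The log-likelihood values are made up, and the closed-form p-value only holds for one degree of freedom, which matches dropping a single variable such as F_38.

```python
import math


def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Likelihood-ratio test for nested models that differ by one parameter.

    Returns the chi-squared statistic 2 * (LL_full - LL_reduced) and its
    p-value under a chi-squared distribution with 1 degree of freedom.
    """
    chi_sq = 2.0 * (loglik_full - loglik_reduced)
    if chi_sq < 0:
        raise ValueError("the full model should fit at least as well as the reduced one")
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value


# Made-up log-likelihoods for the 10-variable model and the model without F_38.
chi_sq, p_value = likelihood_ratio_test_1df(loglik_full=-120.0, loglik_reduced=-123.0)
print(chi_sq, round(p_value, 4))
```

A small p-value would suggest that removing the variable makes the model fit significantly worse.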
Also in way over my head. Google was definitely my friend today!
resulting in the
Using the top 10 variables in one logistic regression and the same variables minus F_38 in another gave me the outputs. Another search, for "alteryx compare two models with subset of predictor variables", then brought me to the Nested Test help page.
Which gave me the Chi-Sq score
Given how much I didn't know about this entire domain, I don't think I'll be writing my expert exam any time soon.
Dan
Expert is tough! So much respect for anyone who passed!
Here is my best guess for now.
I ran a forest model (in the container) to get the top 10 variables. I then ran two logistic regressions, one with and one without F_38, since the target is categorical. Then I scored the models, made the results categorical again, and let the categorical tool calculate the Chi-Sq.
This is about where I got with googling and guessing - maybe time to understand what everything means?
Like most people, I would have skipped this one. It was a fun challenge, though, and I solved it two different ways.
Once I had found my variables, I put them into decision trees again, this time with just the 10 variables: S_3, S_6, S_11, S_12, F_15, F_17, F_21, F_23, F_29, and F_38.
I then scored the models and calculated the chi-squared statistic for each model. I finally compared the two.
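For reference, a Pearson chi-squared statistic can be computed from a confusion matrix roughly as in the Python sketch below. This is one plausible reading of "calculated the chi-squared statistic", not necessarily the formula used in this thread, and the counts are made up rather than taken from the challenge data.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if actual and predicted classes were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi_sq += (observed - expected) ** 2 / expected
    return chi_sq


# Made-up confusion matrix: rows are actual classes, columns are predicted classes.
confusion = [[40, 10],
             [20, 30]]

chi_sq_example = pearson_chi_squared(confusion)
print(round(chi_sq_example, 4))
```

Running this once per model gives one statistic per confusion matrix, which can then be compared.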
In the second method, I used logistic regression, as the target variable is binary. I built both models with Logistic Regression tools and then compared them with the Nested Test tool.
Interestingly enough, the two methods produce very different answers, so I end my quest in confusion, but with two really good guesses.
Here is my solution... hopefully I've found the correct CHI calculator...
Guess I need to understand more on what predictive really does :)