
# Weekly Challenge

Solve the challenge, share your solution and summit the ranks of our Community!

## Challenge #157: An Expert Challenge

Mgr, Global Certification

Last week's solution can be found here

Have you ever wanted to take a sneak peek at one of the questions on the Alteryx Designer Expert exam? We thought you might, so for the first time ever we’re releasing one of our retired Expert exam questions as a Weekly Challenge! The amazing @CristonS created this question, and it made its debut at the first Expert exam in Anaheim last summer. This question gave everyone a hard time, and most people avoided it altogether, so if it seems intimidating, you’re not alone! We wanted to keep this in the same format as the actual exam question, so you won’t see an output file, just an input. We’ll post the answer and our solution next week.

You are provided a dataset (Q2_variables.yxdb) that contains multiple variables. Select the ten (10) numeric variables with the highest Mean Decrease Gini coefficient from the variable importance plot. Use these variables to build a model to predict the target variable, [H0]. Compare two models: one based on all of the selected variables, and another that includes the selected variables except [F_38]. What is the effect of removing this variable [F_38] from the model? Provide the Chi-Sq effect as your answer.

Aurora

I would have been skipping this problem as well on the exam! Here is my best guess

Spoiler
I found via this unanswered post that the forest model might give us the variable importance plot with the mean decrease in Gini. So I took the top 10 from this output of the forest model:

To get the Chi squared, I just used a simple formula and copied the data from the confusion matrix. Here is the 10 variable confusion matrix:

Asteroid

Well, this one was way over my head :-) but I took my best shot at it.

Spoiler

Alteryx Certified Partner

Really a hard one - the main problem was to find that "Variable Importance Plot Mean Decrease in Gini", but Alteryx Community has been very helpful on that.

Spoiler
The post Help Mean Decrease in Gini for Dummies was the first approach.
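For readers new to the term: Mean Decrease Gini credits each variable with the drop in Gini impurity produced by the splits made on it, averaged over the trees in the forest. The snippet below is only a minimal sketch of that per-split quantity, not what the Forest Model tool runs internally; the function names and the toy labels are made up for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity drop when `parent` is split into the `left` and `right` child nodes."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical toy labels, not taken from Q2_variables.yxdb.
parent = [0, 0, 1, 1]
perfect_split = gini_decrease(parent, [0, 0], [1, 1])  # children are pure
useless_split = gini_decrease(parent, [0, 1], [0, 1])  # children look like the parent
```

A variable whose splits tend to look like `perfect_split` ends up high on the importance plot; one whose splits look like `useless_split` ends up near the bottom.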

I decided to use Logistic Regression because field H_0 seems to be binary.

The Nested Test tool does exactly what is needed - it compares two models, one of which uses only a subset of the other's variables.
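One common way to compare two nested logistic regressions is a likelihood-ratio test, where the Chi-Sq statistic is twice the gap in log-likelihood between the full and the reduced model. The sketch below assumes the Nested Test output is a statistic of this kind, which this thread does not confirm. The function names and log-likelihood values are hypothetical, and the p-value helper only covers the one-degree-of-freedom case that applies when a single variable such as [F_38] is dropped.

```python
import math


def likelihood_ratio_chi_sq(loglik_full, loglik_reduced):
    """Chi-Sq statistic for nested models: 2 * (logLik_full - logLik_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi_sq_p_value_1df(statistic):
    """Upper-tail p-value of a chi-squared distribution with 1 degree of freedom.

    Uses the identity P(Z**2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    Expects a non-negative statistic, which holds for properly nested models.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods, not results from the challenge data.
loglik_full = -520.0      # model with all ten variables
loglik_reduced = -526.5   # same model without one variable
chi_sq = likelihood_ratio_chi_sq(loglik_full, loglik_reduced)
p_value = chi_sq_p_value_1df(chi_sq)
```

A large statistic with a small p-value would suggest the dropped variable carries information the other nine do not.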

Nebula

Also in way over my head. Google was definitely my friend today!

Spoiler
Like @RolandSchubert I first searched for "Mean Decrease in Gini" which brought up the same community article,
resulting in the

Using the top 10 variables in a logistic regression and the same minus F_38 in another gave me the outputs. Then another search for "alteryx compare two models with subset of predictor variables" which brought me to the Nested Test help page.

Which gave me the Chi-Sq score

Given how much I didn't know about this entire domain, I don't think I'll be writing my expert exam any time soon.

Dan

Quasar

Expert is tough! So much respect for anyone who passed!

Here is my best guess for now..

Spoiler

I did a forest (in the container) to get the top 10 variables. I then did two logistic regressions, one with and one without F_38; I used logistic regression because the target is categorical. Then I scored my models, made the results categorical again and let the categorical tool calculate the Chi-squared.

This is about where I got with googling and guessing - maybe time to understand what everything means?
Bolide

I am with most and would have skipped this one. This was a fun challenge, though, and I did it two different ways.

Spoiler
I started by using the forest model tool to find the information needed. Definitely had to do some research on the tool mastery pages for this: Tool Mastery Index

Once I had found my variables I put them into decision trees again, just with the 10 variables: S_3, S_6, S_11, S_12, F_15, F_17, F_21, F_23, F_29, and F_38.
I then scored the models and calculated the chi-squared statistic for each model. Finally, I compared the two.
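For anyone who wants to reproduce the "chi-squared per model" step by hand, here is a minimal sketch of Pearson's chi-squared statistic computed from a confusion matrix of actual versus predicted classes. It is not necessarily the formula any poster used; the function name and the counts are hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Expected counts assume independence: row_total * col_total / grand_total.
    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical confusion matrix (rows: actual class, columns: predicted class).
confusion = [[10, 20],
             [30, 40]]
chi_sq = pearson_chi_square(confusion)
```

Note that this measures how strongly predictions are associated with the actual classes within one model, which is a different quantity from a nested-model test statistic, so the two approaches should not be expected to give the same number.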

In the second method I used logistic regression as the target variable is binary. Using Logistic Regression tools, I then compared the models with the nested test tool.

Interestingly enough, both methods produce very different answers, so I end my quest with confusion, but two really good guesses.
Asteroid

Hi! Here is my solution :)

Spoiler
Asteroid

Here is my solution... hopefully I've found the correct CHI calculator...

Guess I need to understand more on what predictive really does :)

Spoiler
Meteor

Here is my solution.

Spoiler