alteryx Community

# Weekly Challenge

Solve the challenge, share your solution and summit the ranks of our Community!

## Challenge #157: An Expert Challenge

Alteryx Alumni (Retired)

Last week's solution can be found here

Have you ever wanted to take a sneak peek at one of the questions on the Alteryx Designer Expert exam? We thought you might, so for the first time ever we’re releasing one of our retired Expert exam questions as a Weekly Challenge! The amazing @CristonS created this question, and it made its debut at the first Expert exam in Anaheim last summer. This question gave everyone a hard time, and most people avoided it altogether, so if it seems intimidating, you’re not alone! We wanted to keep this in the same format as the actual exam question, so you won’t see an output file, just an input. We’ll post the answer and our solution next week.

You are provided a dataset (Q2_variables.yxdb) that contains multiple variables. Select the ten (10) numeric variables with the highest Mean Decrease Gini coefficient from the variable importance plot. Use these variables to build a model to predict the target variable, [H0]. Compare two models: one based on all of the selected variables, and another that includes the selected variables except [F_38]. What is the effect of removing this variable [F_38] from the model? Provide the Chi-Sq effect as your answer.
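
For readers new to the term: in a random forest, a variable's Mean Decrease Gini is, roughly, the total decrease in Gini impurity from every split on that variable, averaged over all trees. The sketch below computes the impurity decrease for a single binary split. It is an illustration only: the function names and toy labels are made up, and implementations differ in exactly how they weight each split by sample counts.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease for one split: parent impurity minus the
    size-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy binary target (0/1), as several participants assumed for [H0].
parent = [0, 0, 0, 1, 1, 1]
left, right = [0, 0, 0], [1, 1, 1]  # a perfect split
print(gini(parent))                        # 0.5
print(gini_decrease(parent, left, right))  # 0.5
```

A forest accumulates these decreases for every split that uses a given variable, which is why variables that repeatedly produce clean splits rank highest on the importance plot.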

17 - Castor

I would have been skipping this problem as well on the exam! Here is my best guess:

Spoiler
I found via this unanswered post that the Forest Model might give us the variable importance plot with the Mean Decrease Gini. So I took the top 10 variables from the Forest Model's output:
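
Taking the top ten from the importance output amounts to sorting by descending Mean Decrease Gini. A minimal sketch, assuming the scores have been exported into a plain dictionary mapping variable name to score; the function name and the scores in the example are hypothetical, not results from Q2_variables.yxdb.

```python
def top_k_by_importance(importance, k=10):
    """Return the k variable names with the highest importance score.

    `importance` is assumed to map variable name -> Mean Decrease Gini.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Synthetic scores, for illustration only.
scores = {"var_a": 3.2, "var_b": 7.9, "var_c": 0.4, "var_d": 7.9}
print(top_k_by_importance(scores, k=2))  # ['var_b', 'var_d']
```

Note that the challenge asks for the ten *numeric* variables, so any non-numeric fields would need to be filtered out before ranking.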

To get the chi-squared statistic, I just used a simple formula and copied the data from the confusion matrix. Here is the 10-variable confusion matrix:
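
The post does not say which "simple formula" was used. One common choice is Pearson's chi-squared test of independence on the confusion matrix, where each expected count is row total × column total / grand total. A minimal sketch under that assumption, with no continuity correction; the confusion matrix below is hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Expected count for cell (i, j) = row_total[i] * col_total[j] / grand_total.
    No Yates continuity correction is applied, and every row and column
    total is assumed to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 confusion matrix: rows = actual class, columns = predicted class.
confusion = [[40, 10],
             [10, 40]]
print(pearson_chi_square(confusion))  # 36.0
```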

8 - Asteroid

Well, this one was way over my head :-) but I took my best shot at it.

Spoiler

16 - Nebula

A really hard one. The main problem was finding that "Variable Importance Plot Mean Decrease in Gini", but the Alteryx Community was very helpful with that.

Spoiler
The post "Help Mean Decrease in Gini for Dummies" was my starting point.

I decided to use Logistic Regression because the field H_0 seems to be binary.
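
Logistic regression fits a binary target by maximizing the Bernoulli log-likelihood, and that same log-likelihood is what a nested model comparison works from. A minimal sketch of the quantity itself, assuming you already have predicted probabilities from a fitted model; the function name and inputs are illustrative toy values.

```python
import math


def bernoulli_log_likelihood(y_true, p_pred):
    """Sum of y*log(p) + (1-y)*log(1-p) over all observations.

    y_true: 0/1 labels; p_pred: predicted probabilities of class 1,
    each assumed to lie strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


print(bernoulli_log_likelihood([1, 0], [0.5, 0.5]))  # 2 * log(0.5) ≈ -1.386
```

The model's deviance is -2 times this value, so a better-fitting model has a higher log-likelihood and a lower deviance.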

The Nested Test tool does exactly what is needed: it compares two models, one of which uses only a subset of the other's variables.
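
Assuming the Nested Test tool performs a likelihood-ratio (deviance) test, as R's `anova()` does for nested GLMs, the Chi-Sq value is twice the gain in log-likelihood from the extra variable(s), with degrees of freedom equal to the number of variables dropped. Dropping only [F_38] gives df = 1, where the p-value has a closed form through the complementary error function. A sketch under that assumption; the function name and log-likelihood values are hypothetical, not results from the challenge data.

```python
import math


def likelihood_ratio_test_df1(loglik_full, loglik_reduced):
    """Likelihood-ratio test for nested models differing by ONE parameter.

    Statistic: 2 * (loglik_full - loglik_reduced), i.e. the reduced model's
    deviance minus the full model's deviance. It is clamped at 0 to absorb
    tiny negative values from rounding.
    For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test_df1(loglik_full=-100.0, loglik_reduced=-102.0)
print(stat)         # 4.0
print(round(p, 4))  # 0.0455
```

With more than one dropped variable, the p-value would need the general chi-squared survival function rather than this df = 1 shortcut.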

18 - Pollux

Also way over my head. Google was definitely my friend today!

Spoiler
Like @RolandSchubert, I first searched for "Mean Decrease in Gini", which brought up the same community article, resulting in the

Using the top 10 variables in one logistic regression, and the same set minus F_38 in another, gave me the outputs. Another search, for "alteryx compare two models with subset of predictor variables", brought me to the Nested Test help page.

That gave me the Chi-Sq score.

Given how much I didn't know about this entire domain, I don't think I'll be writing my expert exam any time soon.

Dan

12 - Quasar

Expert is tough! So much respect for anyone who passed!

Here is my best guess for now:

Spoiler

I ran a Forest Model (in the container) to get the top 10 variables. I then ran two logistic regressions, one with and one without F38, since the target is categorical. Then I scored the models, made the results categorical again, and let the categorical tool calculate the chi-squared statistic.
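
Making the scored result "categorical again" usually means turning each predicted probability into a class label with a cut-off, then counting actual versus predicted labels. A minimal sketch, assuming 0/1 labels and a 0.5 threshold (both assumptions; the actual Score tool output fields may differ), with toy probabilities and illustrative function names:

```python
def to_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities of class 1 into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_matrix(actual, predicted):
    """2x2 counts for 0/1 labels: rows = actual, columns = predicted."""
    table = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table


actual = [1, 0, 1, 1, 0]
scored = [0.9, 0.2, 0.4, 0.7, 0.6]  # toy probabilities
predicted = to_labels(scored)        # [1, 0, 0, 1, 1]
print(confusion_matrix(actual, predicted))  # [[1, 1], [1, 2]]
```

The resulting table is the kind of input a chi-squared test of independence works on.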

This is about as far as I got with googling and guessing. Maybe it's time to understand what everything means?

13 - Pulsar

Like most people here, I would have skipped this one. It was a fun challenge, though, and I solved it two different ways.

Spoiler
I started by using the Forest Model tool to find the information needed. I definitely had to do some research on the Tool Mastery pages for this: Tool Mastery Index

Once I had found my variables, I put them into decision trees again, this time with just the 10 variables: S_3, S_6, S_11, S_12, F_15, F_17, F_21, F_23, F_29, and F_38.
I then scored the models and calculated the chi-squared statistic for each model. I finally compared the two.
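
The two models being compared are nested: the reduced model's predictors are the ten listed above minus F_38. A small sketch that builds both lists and derives the degrees of freedom for the comparison as the number of dropped predictors. This assumes each numeric variable contributes exactly one coefficient; the list names are illustrative.

```python
# The ten variables listed in the post above.
full_model = ["S_3", "S_6", "S_11", "S_12", "F_15",
              "F_17", "F_21", "F_23", "F_29", "F_38"]

# Reduced model: the same predictors without F_38.
reduced_model = [v for v in full_model if v != "F_38"]

# A valid nested comparison needs the reduced set to be a subset of the full set.
assert set(reduced_model) <= set(full_model)

# Degrees of freedom = number of coefficients dropped
# (one per numeric predictor, under the stated assumption).
df = len(full_model) - len(reduced_model)
print(len(reduced_model), df)  # 9 1
```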

In the second method, I used logistic regression because the target variable is binary. After building the models with the Logistic Regression tools, I compared them with the Nested Test tool.

Interestingly, the two methods produce very different answers, so I end my quest confused, but with two really good guesses.

11 - Bolide

Hi! Here is my solution :)

Spoiler
8 - Asteroid

Here is my solution... hopefully I've found the correct chi-squared calculator...

I guess I need to understand more about what predictive modeling really does :)

Spoiler
7 - Meteor

Here is my solution.

Spoiler