Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!




Challenge #157: An Expert Challenge

Alteryx Alumni (Retired)

Last week's solution can be found here


Have you ever wanted to take a sneak peek at one of the questions on the Alteryx Designer Expert exam? We thought you might, so for the first time ever we’re releasing one of our retired Expert exam questions as a Weekly Challenge! The amazing @CristonS created this question, and it made its debut at the first Expert exam in Anaheim last summer. This question gave everyone a hard time, and most people avoided it altogether, so if it seems intimidating you’re not alone! We wanted to keep this in the same format as the actual exam question, so you won’t see an output file, just an input. We’ll post the answer and our solution next week.


You are provided a dataset (Q2_variables.yxdb) that contains multiple variables. Select the ten (10) numeric variables with the highest Mean Decrease Gini coefficient from the variable importance plot. Use these variables to build a model to predict the target variable, [H0]. Compare two models: one based on all of the selected variables, and another that includes the selected variables except [F_38]. What is the effect of removing this variable [F_38] from the model? Provide the Chi-Sq effect as your answer.

17 - Castor

I would have skipped this problem on the exam as well! Here is my best guess.

I found via this unanswered post that the Forest Model tool might give us the variable importance plot with the Mean Decrease Gini. So I took the top 10 from the forest model's output:
To get the Chi-squared, I just used a simple formula and copied the data from the confusion matrix. Here is the 10-variable confusion matrix:
8 - Asteroid

Well, this one was way over my head. :-) But I took my best shot at it.



16 - Nebula

Really a hard one - the main problem was to find that "Variable Importance Plot Mean Decrease in Gini", but Alteryx Community has been very helpful on that.


The post Help Mean Decrease in Gini for Dummies was the first approach.
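
For readers new to the metric, here is a minimal sketch of what a "decrease in Gini" measures for a single split in a tree. As far as I understand it, a forest's Mean Decrease Gini aggregates decreases like this over every split made on a variable across all trees. The labels and the split below are made up for illustration, the function names are my own, and this is not the Forest Model tool's actual code.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical binary target values at one node, and one candidate split.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [0, 0, 0, 1]
right = [0, 1, 1, 1]

decrease = gini_decrease(parent, left, right)
print(f"Gini decrease for this split: {decrease:.4f}")
```

A variable that keeps producing large decreases like this ends up near the top of the importance plot.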

I decided to use Logistic Regression because field H_0 seems to be binary.

The Nested Test tool does exactly what is needed - it compares two models, one of which uses only a subset of the other's variables.
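
One common way to compare two nested logistic regressions is a likelihood ratio test, which yields a Chi-Sq statistic. Whether the Nested Test tool computes exactly this is an assumption on my part. The sketch below handles only the case of one dropped coefficient (1 degree of freedom), and the log-likelihood values are hypothetical, not results from the challenge data.

```python
import math


def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Return (chi_sq, p_value) when the reduced model drops one coefficient.

    chi_sq = 2 * (logLik(full) - logLik(reduced)), compared against a
    chi-squared distribution with 1 degree of freedom.
    """
    chi_sq = 2.0 * (loglik_full - loglik_reduced)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(chi_sq, 0.0) / 2.0))
    return chi_sq, p_value


# Hypothetical log-likelihoods for the full model and the model without one variable.
chi_sq, p_value = likelihood_ratio_test_1df(loglik_full=-120.0, loglik_reduced=-125.0)
print(f"Chi-Sq = {chi_sq:.2f}, p = {p_value:.5f}")
```

A large Chi-Sq with a small p-value would suggest that the dropped variable was contributing to the fit.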


19 - Altair

Also in way over my head. Google was definitely my friend today!



Like @RolandSchubert, I first searched for "Mean Decrease in Gini", which brought up the same community article,
resulting in the 


Using the top 10 variables in a logistic regression and the same minus F_38 in another gave me the outputs. Another search, for "alteryx compare two models with subset of predictor variables", then brought me to the Nested Test help page.


Which gave me the Chi-Sq score 


Given how much I didn't know about this entire domain, I don't think I'll be writing my expert exam any time soon.




12 - Quasar

Expert is tough! So much respect for anyone who passed!


Here is my best guess for now.


Challenge #157.PNG

I ran a forest model (in the container) to get the top 10 variables. I then built two logistic regressions, one with and one without F38; I used logistic regression because the target is categorical. Then I scored the models, made the result categorical again and let the categorical tool calculate the Chi.
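
The "made the result categorical again" step can be sketched as cutting each scored probability at a threshold and counting actual versus predicted classes. The 0.5 cutoff, the function name and the toy scores below are all assumptions for illustration; they are not taken from the workflow.

```python
def confusion_counts(actual, scores, threshold=0.5):
    """Count (actual, predicted) pairs after cutting scores at a threshold."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        counts[(a, predicted)] += 1
    return counts


# Hypothetical actual classes and scored probabilities of class 1.
actual = [0, 0, 1, 1, 1, 0]
scores = [0.10, 0.60, 0.80, 0.40, 0.90, 0.30]

table = confusion_counts(actual, scores)
for (a, p), n in sorted(table.items()):
    print(f"actual={a} predicted={p}: {n}")
```

The resulting counts form the 2x2 table that a chi-squared calculation would start from.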

This is about where I got with googling and guessing - maybe time to understand what everything means?
13 - Pulsar

I am with most and would have skipped this one. This was a fun challenge, though, and I did it two different ways.

I started by using the forest model tool to find the information needed. Definitely had to do some research on the tool mastery pages for this: Tool Mastery Index
Challenge 157-1.PNG
Challenge 157-5.PNG
Once I had found my variables, I put them into decision trees again, this time with just the 10 variables: S_3, S_6, S_11, S_12, F_15, F_17, F_21, F_23, F_29, and F_38.
I then scored the models and calculated the chi-squared statistic for each model. I finally compared the two.
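
For anyone wondering what a chi-squared statistic computed from scored results looks like, here is a minimal sketch of the Pearson statistic on a contingency table of actual versus predicted classes. It applies no continuity correction and assumes every row and column total is non-zero. The counts are hypothetical, and this is not necessarily the calculation any Alteryx tool performs.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if actual and predicted classes were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi_sq += (observed - expected) ** 2 / expected
    return chi_sq


# Hypothetical counts: rows are actual classes, columns are predicted classes.
example_table = [[40, 10],
                 [20, 30]]

chi_sq_example = pearson_chi_squared(example_table)
print(f"Chi-squared: {chi_sq_example:.4f}")
```

Running this once per model and comparing the two statistics mirrors the "score, calculate, compare" approach, although it answers a different question than a nested model test does.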
Challenge 157-3.PNG
In the second method I used logistic regression as the target variable is binary. Using Logistic Regression tools, I then compared the models with the nested test tool. 
Challenge 157-4.PNG
Interestingly enough, the two methods produce very different answers, so I end my quest with confusion, but two really good guesses.
Challenge 157-6.PNG
Challenge 157-7.PNG
11 - Bolide

Hi! Here is my solution :)


Effect removing F38.PNG
Variable Importance Plot.PNG
workflow157.PNG
8 - Asteroid

Here is my solution... hopefully I've found the correct CHI calculator... 

Guess I need to understand more on what predictive really does :)


Chi Result.jpg
7 - Meteor

Here is my solution.