Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!


Challenge #157: An Expert Challenge

ElizabethB
Alteryx Alumni (Retired)

Last week's solution can be found here

 

Have you ever wanted to take a sneak peek at one of the questions on the Alteryx Designer Expert exam? We thought you might, so for the first time ever we’re releasing one of our retired Expert exam questions as a Weekly Challenge! The amazing @CristonS created this question, and it made its debut at the first Expert exam in Anaheim last summer. This question gave everyone a hard time, and most people avoided it altogether, so if it seems intimidating, you’re not alone! We wanted to keep this in the same format as the actual exam question, so you won’t see an output file, just an input. We’ll post the answer and our solution next week.

 

You are provided a dataset (Q2_variables.yxdb) that contains multiple variables. Select the ten (10) numeric variables with the highest Mean Decrease Gini coefficient from the variable importance plot. Use these variables to build a model to predict the target variable, [H0]. Compare two models: one based on all of the selected variables, and another that includes the selected variables except [F_38]. What is the effect of removing this variable [F_38] from the model? Provide the Chi-Sq effect as your answer.

patrick_digan
17 - Castor

I would have been skipping this problem as well on the exam! Here is my best guess

Spoiler
I found via this unanswered post that perhaps the forest model would give us the variable importance plot with the Mean Decrease Gini. So I took the top 10 from this output of the forest model:
Capture.PNG
To get the Chi squared, I just used a simple formula and copied the data from the confusion matrix. Here is the 10 variable confusion matrix:
Capture2.PNG
Capture3.PNG
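
For readers who want to reproduce a chi-squared figure from a confusion matrix by hand, here is a minimal sketch. It assumes the "simple formula" is the Pearson chi-square statistic on a 2x2 table of actual vs. predicted counts, without continuity correction; the post does not say which formula was used, and the counts below are made up rather than copied from the screenshots. The function name is hypothetical. Note that this statistic measures association between actual and predicted classes, which is a different quantity from a chi-square that compares two nested models.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if actual and predicted classes were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts (assumption): rows = actual class, columns = predicted class.
confusion = [[40, 10], [20, 30]]
print(round(chi_square_2x2(confusion), 3))
```

Running the same function on the 10-variable and 9-variable confusion matrices would give one statistic per model, which can then be compared.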
pjdit
8 - Asteroid

Well, this one was way over my head :-) but I took my best shot at it.

Spoiler
Challenge_157_Spoiler_1.JPG
Challenge_157_Spoiler_2.JPG
Challenge_157_Spoiler_3.JPG

 

RolandSchubert
16 - Nebula

Really a hard one - the main problem was to find that "Variable Importance Plot Mean Decrease in Gini", but Alteryx Community has been very helpful on that.

 

Spoiler
The post Help Mean Decrease in Gini for Dummies was the first approach.

I decided to use Logistic Regression because field H_0 seems to be binary.

26-03-_2019_08-59-54.png
The Nested Test tool does exactly what is needed: it compares two models, one of which uses only a subset of the other's variables.
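
To make the idea of comparing nested models concrete, here is a minimal sketch of a likelihood-ratio test, which is the usual way to compare two nested logistic regressions. Whether the Nested Test tool computes exactly this statistic is an assumption here, and the log-likelihood values are hypothetical, not taken from the challenge data. Dropping a single variable such as F_38 gives one degree of freedom, which is the only case this sketch handles.

```python
import math


def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Return (chi_sq, p_value) when the reduced model drops exactly one variable."""
    chi_sq = 2.0 * (loglik_full - loglik_reduced)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(chi_sq, 0.0) / 2.0))
    return chi_sq, p_value


# Hypothetical log-likelihoods (assumption), not results from the exam dataset.
chi_sq, p_value = likelihood_ratio_test_1df(loglik_full=-520.0, loglik_reduced=-523.0)
print(chi_sq, round(p_value, 4))
```

A large chi-square with a small p-value would suggest that the dropped variable contributes to the fit of the full model.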

 

danilang
19 - Altair

Also in way over my head.  Google was definitely my friend today!

 

 

Spoiler
Like @RolandSchubert, I first searched for "Mean Decrease in Gini", which brought up the same Community article, resulting in the following:

GINI.png
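
For anyone else new to the term, here is a minimal sketch of what a "decrease in Gini" means for a single split in a single tree. A forest model adds up these decreases over every split that uses a given variable and averages across trees; the exact scaling used by the Forest Model tool is not shown in this thread, so treat the numbers as illustrative only. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy binary target (assumption): one split that separates the classes only partly.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]
print(gini_decrease(parent, left, right))
```

Variables whose splits produce larger decreases rank higher on the importance plot, which is why the top 10 are taken from it.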

Using the top 10 variables in one logistic regression and the same variables minus F_38 in another gave me the outputs. Another search, for "alteryx compare two models with subset of predictor variables", then brought me to the Nested Test help page.

WF.png

That gave me the Chi-Sq score:

Chi.png

Given how much I didn't know about this entire domain, I don't think I'll be writing my expert exam any time soon.

 

Dan

 

kat
12 - Quasar

Expert is tough! So much respect for anyone who passed!

 

Here is my best guess for now..

 

Spoiler
Challenge #157.PNG

I did a forest (in the container) to get the top 10 variables. I then did two logistic regressions, one with and one without F38, and chose logistic regression because the target is categorical. Then I scored my model, made the result categorical again and let the categorical tool calculate the Chi.

This is about where I got with googling and guessing - maybe time to understand what everything means?
cplewis90
13 - Pulsar

I am with most and would have skipped this one. This was a fun challenge though that I did two different ways. 

Spoiler
I started by using the forest model tool to find the information needed. Definitely had to do some research on the tool mastery pages for this: Tool Mastery Index
Challenge 157-1.PNG
Challenge 157-5.PNG
Once I had found my variables I put them into decision trees again, just with the 10 variables: S_3, S_6, S_11, S_12, F_15, F_17, F_21, F_23, F_29, and F_38.
I then scored the models and calculated the chi-squared statistic for each model. I finally compared the two.
Challenge 157-3.PNG
In the second method I used logistic regression as the target variable is binary. Using Logistic Regression tools, I then compared the models with the nested test tool. 
Challenge 157-4.PNG
Interestingly enough, both methods produce very different answers, so I end my quest with confusion, but two really good guesses.  
Challenge 157-6.PNG
Challenge 157-7.PNG
RichoBsJ
11 - Bolide

Hi! Here is my solution :)

 

Spoiler
Effect removing F38.PNG
Variable Importance Plot.PNG
workflow157.PNG
pasccout
8 - Asteroid

Here is my solution... hopefully I've found the correct CHI calculator... 

Guess I need to understand more on what predictive really does :)

 

Spoiler
Chi Result.jpg
edwin_isensee
7 - Meteor

Here is my solution.

 

Spoiler
Flow.png
Plot.png
Result.png