# Weekly Challenge

## Challenge #157: An Expert Challenge

Quasar

Wow. So not ready for the Expert Exam...yet. I would have spent my entire exam time researching the tools for this. The difficult part is knowing which predictive tools to use. I went down a rabbit hole trying to combine results and calculate Chi-Squared until I revisited the documentation for the predictive tools and found the Nested Test tool. I wish I had had Alteryx with these tools in college (soooo long ago).

Asteroid

Pardon me if I'm confused, but what does a question like this have to do with a Designer certification? Find the answer to that, and you'll know why most people avoided it.

Magnetar

Cool challenge! I have no idea if my solution is anywhere near correct, but enjoyed getting there nonetheless!

The first step would normally be to use a Field Summary tool to explore the data. But since we already know what we want to target, a Summarize tool on H0 is much faster. With just 2 distinct answers, this is probably binary, so I'll model it with a Logistic Regression.

Determining the Mean Decrease Gini coefficient means a Forest Model (see the same article that many of us found here).
The variables are outlined in the red box in the screenshot.
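For anyone wondering what the forest's importance ranking is built from: as I understand it, Mean Decrease Gini averages, over all the trees, the drop in Gini impurity produced by the splits on a given variable. The sketch below is plain Python, not the Forest Model tool's internals, and only shows the impurity drop for a single split. The function names and the toy data are made up for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity drop from splitting the rows on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Child impurities are weighted by the share of rows in each child.
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - children


# Hypothetical toy data: one numeric field and a binary target.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [0, 0, 0, 1, 1, 1]

clean_split = gini_decrease(feature, target, 3.5)  # separates the classes fully
weak_split = gini_decrease(feature, target, 1.5)   # peels off a single row
print(clean_split, weak_split)
```

A field that keeps producing large drops like `clean_split` across many trees ends up near the top of the variable importance plot; one that only manages drops like `weak_split` ends up near the bottom.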

Two Logistic Regressions, one with F_38 and one without, and a Nested Test to compare the two models. The Chi-squared value is in the Nested Test output.
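Outside Alteryx, the same idea can be sketched as a likelihood-ratio test between a full and a reduced logistic regression. This is a rough standard-library Python sketch under my own assumptions, not the Nested Test tool's actual implementation: the data is a made-up toy set, `x2` is only a stand-in for F_38, and the helper names are hypothetical. The p-value shortcut is valid only because exactly one variable is dropped (1 degree of freedom).

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_logistic(X, y, iterations=50, tol=1e-10):
    """Newton-Raphson fit; returns (coefficients, log-likelihood).

    Each row of X must already contain a leading 1.0 for the intercept.
    """
    k = len(X[0])
    beta = [0.0] * k

    def predict():
        return [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]

    for _ in range(iterations):
        p = predict()
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p))
                for j in range(k)]
        hess = [[sum(pi * (1 - pi) * row[a] * row[b] for row, pi in zip(X, p))
                 for b in range(k)] for a in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = predict()
    loglik = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                 for yi, pi in zip(y, p))
    return beta, loglik


def nested_test(X_full, y, drop_column):
    """Likelihood-ratio chi-squared for dropping one column (1 d.f.)."""
    X_reduced = [[v for j, v in enumerate(row) if j != drop_column]
                 for row in X_full]
    _, ll_full = fit_logistic(X_full, y)
    _, ll_reduced = fit_logistic(X_reduced, y)
    chi_sq = 2.0 * (ll_full - ll_reduced)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(chi_sq, 0.0) / 2.0))
    return chi_sq, p_value


# Hypothetical toy data; x2 plays the role of F_38.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
X = [[1.0, float(a), float(b)] for a, b in zip(x1, x2)]

chi_sq, p_value = nested_test(X, y, drop_column=2)
print(f"chi-squared = {chi_sq:.4f}, p-value = {p_value:.4f}")
```

A large chi-squared (small p-value) says the dropped variable was carrying real information; a small one says the reduced model does about as well without it.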

Bolide

Whew, had to spend some quality time with the tool documentation on this one. Statistically, this was out of my depth; however, I was totally surprised to find that I was actually close to the right path.

I had no idea how (or why) to use the model outputs in a chisq test. TIL about the Nested Test Tool!

Quasar

Nice challenge, way too hard without the community's help :)

Bolide

Facing my old nemesis from the expert exam! I am one of the people who attempted this one. LOL @CristonS, you won't believe what I did...

So...I knew immediately how to get to the variables with the highest Mean Decrease Gini by using a Random Forest Model. However, when it said to compare two models and provide the difference in Chi-Sq, I assumed this also meant using a Random Forest Model on just the 10 variables and then removing F_38. I saw no obvious Chi-Sq difference in the output, so I googled and believe I found a way to calculate it manually from the confusion matrix. I would never have figured out that I had to use Logistic Regression, and I had to peek at the solution to understand it....

Magnetar

Community resources to the rescue!!!

Part of me is extremely pleased when the solution to a question has so few tools... the other part of me, which does not entirely understand what I did to get to this answer, is mildly terrified of the fact that this answer required so few tools. :)

Clearly, I now know what I should be doing for the next two months until the Expert exam at Inspire Nashville...

Cheers!

NJ

Thanks to @patrick_digan for showing me where this variable importance plot was hiding.

My solution attached
