
# Alteryx Designer Discussions

SOLVED

## Forest Model & Logistic Regression tools

7 - Meteor

Hi,
Having a few problems setting up Random Forest & Logistic Regression models.

When I run a Random Forest model, the output report is not giving a confusion matrix.
Just wondered if there was an obvious reason why this is happening.

Also, in logistic regression models, is it advisable to normalise predictor variables? (e.g. 0-1 range)

It's just a fairly straightforward credit default dataset I'm using, but the results are all over the place.

thanks
J

12 - Quasar

When you perform logistic regression, you're computing some value: y = a + bX  where a is a bias term, b is a vector of weights, and X is a feature vector. You then can compute a probability by applying:

`P = (e^y)/(1 + e^y)`

This P is the output you see, between 0 and 1.
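That mapping from the linear score y to a probability P can be sketched in a few lines. This is illustrative Python (not Alteryx's internal implementation); the function name and the sample values of a, b, and X are made up for the example.

```python
import math

def logistic_probability(a, b, x):
    """Score y = a + b·x, then squash it to a probability with the logistic function."""
    y = a + sum(bi * xi for bi, xi in zip(b, x))
    # e^y / (1 + e^y), rewritten as 1 / (1 + e^-y) — algebraically identical
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical fitted bias a and weights b for a two-feature row x
p = logistic_probability(a=-1.0, b=[0.5, 2.0], x=[1.0, 0.25])
print(round(p, 4))  # → 0.5 (here y works out to exactly 0)
```

A score of y = 0 always maps to P = 0.5; large positive y pushes P toward 1, large negative y toward 0.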

In a hand-wavy sort of way, you can think of the b values from our formula for y above as the "marginal effect" of that particular feature. In plain English: b_1 is the amount that a unit change in x_1 will affect y.

This means that these b values are scaled relative to the magnitude of the X feature they're associated with.

Let's think about this in terms of economics. If I want to model Consumption: C as a function of Income: Y, then I would have something like: C = a + bY.

The value b tells me how much a single dollar increase in income will increase my consumption. If my income is currently \$10 million, a single dollar increase in income probably wouldn't mean I spent an extra dollar in consumption. However, if my income is currently \$1000, then my consumption is more likely to increase by nearly that full dollar.

The reason I'm giving this example is because the weights that are computed in your logistic regression are inherently taking the scale of each feature into account during training. If you normalize the predictor variables, then these weights will be scaled up (or down, depending on the original values of the predictors) accordingly, and you'll get the same output.
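That scale-invariance is easy to verify numerically. The sketch below (plain NumPy, with made-up incomes and a hypothetical bias/weight pair) normalizes a feature to the 0-1 range and rescales the weight and bias to compensate; the linear scores, and therefore the probabilities, come out identical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50_000, scale=10_000, size=(100, 1))  # e.g. incomes, large scale
a, b = -3.0, 1e-4                                        # hypothetical fitted bias and weight
y = a + b * X                                            # original linear scores

# Normalize the feature to [0, 1]; the weight and bias rescale to compensate
X_norm = (X - X.min()) / (X.max() - X.min())
b_norm = b * (X.max() - X.min())
a_norm = a + b * X.min()
y_norm = a_norm + b_norm * X_norm

print(np.allclose(y, y_norm))  # → True: same scores, hence identical probabilities
```

(One caveat: this exact equivalence holds for an unregularized fit. If the training uses a penalty on the weights, rescaling features does interact with the penalty.)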

That being said, if there is extremely high variance in the values of your predictor variables, your model may be worse at making predictions for values far from the mean. That's something you'll certainly need to diagnose; normalization won't help you out there, however.

Let me know if you need me to clarify anything above, or if I skipped something. I'll look into your question about the Random Forest model's output in the meantime,

Cheers!

12 - Quasar

As for your question about the Random Forest Model, you could try out the Model Comparison Tool from the Alteryx Gallery. I believe you can get a confusion matrix out of it. I don't recall if you're able to get one out of the Forest Model Tool directly, but you should be able to get precision and recall, which are accuracy-like measures computed from that table.

Let me know if you need help setting that up.

Cheers!

7 - Meteor

I'll analyse my feature variables bearing in mind your points, and hopefully get a better understanding of what's going on.

J

________________________________
The University is ranked in the QS World Rankings of the top 5% of universities in the world (QS World University Rankings, 2016/17)
The University of Stirling is a charity registered in Scotland, number SC 011159.
7 - Meteor

great..thanks very much.
I'll try using the model comparison tool as you advise.
I'll be back on this tomorrow...it's been a long day...just going to sleep now!

cheers

j

7 - Meteor
Hi,
When I request the download, it imports a workflow into Alteryx (Model Comparison.yxmc), but I can't see any reference to the MCT tool itself?

At the moment, when I run my Random Forest model, all I get in the report is the Basic Summary, the percentage-error-by-number-of-trees graph,
and the Variable Importance plot.
That's all... no precision, accuracy, etc. stats.

I'm probably missing something really obvious here, but getting very confused! 😃

cheers

J

12 - Quasar

Essentially, that tool is treated as a standard Alteryx macro. You'll want to include it in your workflow by right-clicking on the canvas and selecting "Insert Macro", then navigating to that file. Once you've done that, you should be able to configure it, connect an input stream, etc.

As for the rest of the outputs from the Forest Tool, which anchor are you looking at? Have you attached Browse tools to all of the output anchors?

Let me know if I've misunderstood your question.

7 - Meteor
ah ok...thanks!
I misunderstood...I thought the tool was just a standard tool addition which would appear on the "Predictive" toolbar.

I haven't used a macro before in Alteryx so I'll investigate that!

With the Random Forest workflow that I'm using, I'm looking at the "R" output from the Forest Model tool
(I've attached the workflow)

cheers

J

12 - Quasar

Oh yes, for some reason I was under the impression that the Forest Tool had an Interactive Report anchor as well, similar to the Decision Tree Tool.

If the Model Comparison tool doesn't work for you, you could try to extract this information out of the O output using the R Tool.
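The bookkeeping involved is just a tally of actual vs. predicted labels. In Alteryx you'd do it with the R Tool against the scored data, but the logic looks like this Python sketch (the function, labels, and sample vectors are all invented for illustration):

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels=("No", "Yes")):
    """Tally a 2x2 confusion matrix: rows are actual classes, columns predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["Yes", "No", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No", "No",  "No", "Yes", "Yes"]
print(confusion_matrix(actual, predicted))  # → [[2, 1], [1, 2]]
```

From that table, precision and recall for the "Yes" class follow directly.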

Let me know how it goes,

7 - Meteor

thanks...!
I've worked out how to include the macro...so hopefully I should get the information OK from the Model Comparison tool.
Thanks for the tip about the R tool !

cheers

J
