Hi,
Having a few problems setting up Random Forest & Logistic Regression models.
When I run a Random Forest model, the output report is not giving a confusion matrix.
Just wondered if there was an obvious reason why this is happening.
Also, in logistic regression models, is it advisable to normalise predictor variables? (e.g. 0-1 range)
(there seems to be conflicting advice when I Google it)
It's just a fairly straightforward credit default dataset I'm using but the results are all
over the place.
thanks
J
I'll address your question about logistic regression first because it's more theoretical:
When you perform logistic regression, you're computing a value y = a + bX, where a is a bias term, b is a vector of weights, and X is a feature vector. You can then compute a probability by applying:
P = (e^y) / (1 + e^y)
This P is the output you see, between 0 and 1.
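If it helps to see that concretely, here's a tiny sketch in R; the coefficient and feature values are made up purely for illustration:

    # Made-up bias, weights, and feature vector, just to show the mechanics
    a <- -1.5
    b <- c(0.8, -0.3)
    x <- c(2.0, 1.0)

    y <- a + sum(b * x)         # linear predictor: y = a + bX
    p <- exp(y) / (1 + exp(y))  # logistic function squashes y into (0, 1)
    p                           # predicted probability, about 0.45 here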
In a hand-wavy sort of way, you can think of the b values from our formula for y above as the "marginal effect" of each feature. In plain English: b_1 is the amount by which a unit change in x_1 changes y.
This means that these b values are implicitly scaled relative to the magnitude of the feature they're associated with.
Let's think about this in terms of economics. If I want to model consumption (C) as a function of income (Y), then I would have something like C = a + bY.
The value b tells me how much a single-dollar increase in income will increase my consumption. If my income is currently $10 million, a one-dollar raise probably wouldn't mean I spend an extra dollar on consumption. However, if my income is currently $1000, then my consumption is likely to increase by nearly that full dollar.
The reason I'm giving this example is that the weights computed in your logistic regression already take the scale of each feature into account during training. If you normalise the predictor variables, the weights will simply be rescaled up (or down, depending on the original values of the predictors) to compensate, and you'll get the same predicted probabilities. (That holds for plain, unregularized logistic regression; if you add regularization, the scale of the predictors does start to matter.)
That being said, if some of your predictor variables have extremely high variance, your model may be worse at making predictions for values far from the mean. That's something you'll certainly want to diagnose, but normalization won't help you out there.
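If you want to convince yourself of the rescaling point, here's a rough sketch in R on made-up data; the coefficients change with the scaling, but the fitted probabilities don't:

    # Made-up predictor and 0/1 outcome, purely for illustration
    set.seed(1)
    income  <- runif(100, 1000, 100000)
    default <- rbinom(100, 1, 0.3)

    # Fit once on the raw predictor, once on a 0-1 normalised version
    raw_fit  <- glm(default ~ income, family = binomial)
    norm_inc <- (income - min(income)) / (max(income) - min(income))
    norm_fit <- glm(default ~ norm_inc, family = binomial)

    coef(raw_fit)   # tiny slope, because income is in raw dollars
    coef(norm_fit)  # much larger slope, because the predictor runs 0-1

    # The fitted probabilities agree up to numerical tolerance
    all.equal(fitted(raw_fit), fitted(norm_fit), tolerance = 1e-6)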
Let me know if you need me to clarify anything above, or if I skipped something. I'll look into your question about the Random Forest model's output in the meantime,
Cheers!
As for your question about the Random Forest model, you could try out the Model Comparison Tool from the Alteryx Gallery. I believe you can get a confusion matrix out of it. I don't recall whether you can get one directly out of the Forest Model Tool, but you should be able to get precision and recall, which are accuracy-like measures computed from that matrix.
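For reference, once you have actual and predicted classes, the confusion matrix and those two measures are easy to compute yourself. A quick sketch in R with made-up labels:

    # Made-up actual and predicted classes, just to show the calculation
    actual    <- factor(c(1, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))
    predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0), levels = c(0, 1))

    cm <- table(Predicted = predicted, Actual = actual)
    cm

    tp <- cm["1", "1"]; fp <- cm["1", "0"]; fn <- cm["0", "1"]
    precision <- tp / (tp + fp)  # of predicted defaults, how many were real
    recall    <- tp / (tp + fn)  # of real defaults, how many we caught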
Let me know if you need help setting that up.
Cheers!
great..thanks very much.
I'll try using the model comparison tool as you advise.
I'll be back on this tomorrow...it's been a long day...just going to sleep now!
Your input is much appreciated.
cheers
j
Essentially, that tool is treated as a standard Alteryx macro. You'll want to include it in your workflow by right-clicking on the canvas and selecting "Insert Macro", then navigating to that file. Once you've done that you should be able to configure it, connect an input stream, etc.
As for the rest of the outputs from the Forest Tool, which anchor are you looking at? Have you attached Browse tools to all of the output anchors?
Let me know if I've misunderstood your question.
Oh yes, for some reason I was under the impression that the Forest Tool had an Interactive Report anchor as well, similar to the Decision Tree Tool.
If the Model Comparison tool doesn't work for you, you could try to extract this information out of the O output using the R Tool.
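If you go the R Tool route, something along these lines might work. I'm going from memory on the read/unserialize step (the Forest Tool passes its model down the O anchor as a serialized R object), so treat this as a starting point rather than gospel; the randomForest object itself does carry an out-of-bag confusion matrix:

    # Inside an R Tool connected to the Forest Tool's O anchor.
    # The read/unserialize pattern below is from memory, so double-check it.
    mod_df <- read.Alteryx("#1", mode = "data.frame")

    # Restore the serialized randomForest model object
    rf <- unserialize(charToRaw(as.character(mod_df$Object[1])))

    # randomForest stores an out-of-bag confusion matrix on the model
    cm <- as.data.frame(rf$confusion)
    write.Alteryx(cm, 1)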
Let me know how it goes,
thanks...!
I've worked out how to include the macro...so hopefully I should get the information OK from the Model Comparison Tool.
Thanks for the tip about the R tool !
cheers
J