Alteryx Designer Desktop Discussions

SOLVED

Logistic Regression tool - Improving a model

FelipeSS
6 - Meteoroid

Hello!

 

I'm trying to use the Logistic Regression tool to create a predictive model with my dataset, but it doesn't seem to be as precise and accurate as I expected (as the tool's 'I' output shows).

 

How could I improve my model or configure it to get the best performance on my dataset? Do I need to change my data, or do something different with it?

 

I'm attaching my workflow in this post.

 

Thank you!   

 

Felipe

8 REPLIES
AngelosPachis
16 - Nebula

Hi @FelipeSS ,

 

One thing I can suggest is splitting your dataset into two samples: one used to train your model (Estimation sample, ~80%) and one used to check its predictions (Validation sample, ~20%).

 

The reason to do this is that otherwise your view of the model will be biased: scored on the same data it was trained on, it will seem to make more accurate predictions than it actually can. The issue becomes clearer if you apply the same model to a different dataset that it has never "seen" before; chances are your accuracy will be quite low there.
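For readers who want to see the idea outside of Designer, here is a minimal Python sketch of an 80/20 estimation/validation split. It is not the Alteryx tool's implementation; the function name `split_estimation_validation`, the seed, and the `records` list are hypothetical and only illustrate the technique.

```python
import random


def split_estimation_validation(rows, estimation_share=0.8, seed=42):
    """Shuffle the rows and split them into estimation and validation samples."""
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * estimation_share)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records standing in for the real dataset
records = [{"id": i, "target": i % 2} for i in range(100)]
estimation, validation = split_estimation_validation(records)
print(len(estimation), len(validation))  # 80 20
```

The model would be fitted on `estimation` only and its accuracy measured on `validation`, which it has never seen.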

 

I will hopefully come back with more comments, but this is a key thing you should take into consideration.

 

Cheers,

 

Angelos

mceleavey
17 - Castor

Hi @FelipeSS ,

 

I've added some extra steps to your model and increased the accuracy by around 6%.

The main step is to introduce One-Hot Encoding to binarise your categorical variables. I built a tool to do this for you which I've attached.

Second, I introduced the positive value for your target variable.

Third, I randomly sampled your dataset to introduce a train and test stream.
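As a rough illustration of the first two steps above, the Python sketch below one-hot encodes a categorical column and maps the target to 1 for a chosen positive class. It is not the attached macro; the column names (`region`, `bought`), the sample rows, and the choice of "Yes" as the positive value are assumptions made for the example.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every field except the categorical one being encoded
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows; the real dataset's fields are not shown in the thread
rows = [
    {"region": "North", "bought": "Yes"},
    {"region": "South", "bought": "No"},
    {"region": "North", "bought": "No"},
]
encoded = one_hot_encode(rows, "region")

# Mark the positive class explicitly ("Yes" is an assumed positive value)
for row in encoded:
    row["bought"] = 1 if row["bought"] == "Yes" else 0

print(encoded[0])
```

After this step every predictor is numeric, so the rows can be sampled into train and test streams and passed to a model that expects numeric inputs.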

 

mceleavey_0-1621416683077.png

 

mceleavey_1-1621416718193.png

 

I hope this helps.

 

M.

 



Bulien

danilang
19 - Altair

Hi @FelipeSS 

 

Another thing to consider is that your data might not be a good fit for a logistic regression. Try the other models and score them to see which one fits best.
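A minimal sketch of what "score them and compare" can look like, written in Python rather than with the Designer tools. The model names and the prediction lists are made-up toy values, not results from the attached workflow; only the comparison logic is the point.

```python
def accuracy(actual, predicted):
    """Share of validation records where the prediction matches the actual value."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


# Toy validation labels and toy predictions from three hypothetical models
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "logistic_regression": [1, 0, 0, 1, 0, 1, 1, 1],
    "decision_tree": [1, 0, 1, 1, 0, 1, 1, 0],
    "boosted_model": [1, 1, 1, 0, 0, 0, 1, 0],
}

scores = {name: accuracy(actual, preds) for name, preds in predictions.items()}
best_model = max(scores, key=scores.get)  # model with the highest accuracy
print(scores, best_model)
```

Accuracy is only one possible score; whichever metric is used, it should be computed on the validation sample for every candidate model so the comparison is fair.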

 

Dan

FelipeSS
6 - Meteoroid

Thank you, @mceleavey!

FelipeSS
6 - Meteoroid

Thank you, @danilang, for your answer! Do I need to select numeric fields as my 'predictor variables', or can I use categorical (String) fields too?

FelipeSS
6 - Meteoroid

Thank you, @AngelosPachis! I will consider it and see how it works here.

danilang
19 - Altair

Hi @FelipeSS 

 

It depends on the model you're using. The Linear and Gamma regression models require numeric predictors, so you have to one-hot encode any categorical variables that you might want to use. Most of the other models can handle categorical variables, but you should research the models first. You should also look at the Data Investigation tools. They can help you find correlations between your possible predictors to ensure that you're not including pairs of variables that are strongly positively or negatively correlated.
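To make the correlation check concrete, here is a small Python sketch that computes the Pearson correlation for every pair of numeric predictors and flags the strongly correlated ones. It is not what the Data Investigation tools do internally; the column names, the values, and the 0.9 threshold are assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return predictor pairs whose absolute correlation meets the threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:   # catches strong positive and negative correlation
                pairs.append((first, second, round(r, 3)))
    return pairs


# Hypothetical numeric predictors
columns = {
    "income": [30, 45, 60, 75, 90],
    "spend": [15, 22, 31, 37, 46],
    "age": [52, 23, 41, 35, 29],
}
pairs = highly_correlated_pairs(columns)
print(pairs)
```

When a pair is flagged, a common choice is to keep only one of the two variables as a predictor.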

 

Dan

 

    

FelipeSS
6 - Meteoroid

Hi @danilang.

 

Thank you for your explanation. I'm sure it will help!

 

Felipe
