Hello!
I'm trying to use the logistic regression tool to create a predictive model with my dataset, but it doesn't seem to be as precise and accurate as I expected (as the tool's 'I' output shows).
How could I improve my model or configure it to get the best performance from my dataset? Do I need to change my data, or do something different with it?
I'm attaching my workflow in this post.
Thank you!
Felipe
Hi @FelipeSS ,
One thing I can suggest is that you may have to split your dataset into two samples, one used to train your model (Estimation, ~80%) and one to make predictions (Validation sample, ~20%).
The reason to do this is that otherwise your accuracy estimate will be biased: the model will seem to make more accurate predictions than it actually can. The issue becomes clearer when you apply the same model to a different dataset that it has never "seen" before; chances are your accuracy will be quite low there.
I will hopefully come back with more comments, but this is a key thing you should take into consideration.
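To make the point about the split concrete, here is a minimal Python sketch (outside Alteryx, standard library only). Everything in it is an assumption for illustration: the data is synthetic, and a 1-nearest-neighbour rule stands in for the model because it memorises its training rows, which makes the gap between "seen" and "unseen" accuracy easy to observe. It is not the logistic regression tool itself.

```python
import random

def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (estimation, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def predict_1nn(train, x):
    """Return the label of the training row whose feature is closest to x."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]

def accuracy(train, rows):
    """Fraction of rows whose label matches the 1-NN prediction."""
    hits = sum(1 for x, y in rows if predict_1nn(train, x) == y)
    return hits / len(rows)

# Synthetic data (assumption): label is 1 when x > 0.5, with about 20% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

estimation, validation = split_rows(data)
train_acc = accuracy(estimation, estimation)  # scored on rows the model has seen
valid_acc = accuracy(estimation, validation)  # scored on held-out rows
print(f"estimation accuracy: {train_acc:.2f}")
print(f"validation accuracy: {valid_acc:.2f}")
```

The estimation accuracy comes out perfect only because every row is its own nearest neighbour; the validation figure is the one that says something about new data.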
Cheers,
Angelos
Hi @FelipeSS ,
I've added some extra steps to your model and increased the accuracy by around 6%.
The main step is to introduce One-Hot Encoding to binarise your categorical variables. I built a tool to do this for you which I've attached.
Second, I introduced the positive value for your target variable.
Third, I randomly sampled your dataset to introduce a train and test stream.
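For readers who want to see what the first two steps do to the data, here is a rough Python sketch using only the standard library. It is not the attached tool: the column names (`plan`, `age`, `churned`) and the positive value `"Yes"` are hypothetical, chosen only for illustration.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded

def binarise_target(rows, target, positive_value):
    """Map the target to 1 for the chosen positive value and 0 otherwise."""
    return [{**row, target: int(row[target] == positive_value)} for row in rows]

# Hypothetical rows; the real dataset's fields will differ.
rows = [
    {"plan": "basic", "age": 34, "churned": "Yes"},
    {"plan": "premium", "age": 51, "churned": "No"},
    {"plan": "basic", "age": 27, "churned": "No"},
]

prepared = binarise_target(one_hot_encode(rows, "plan"), "churned", "Yes")
for row in prepared:
    print(row)
```

When the encoded columns feed a regression with an intercept, it is common to drop one category per variable, since the full set of indicator columns is redundant.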
I hope this helps.
M.
Hi @FelipeSS
Another thing to consider is that your data might not be a good fit for a logistic regression. Try the other models and score them to see which one fits best.
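The "score them" step amounts to running every candidate against the same held-out rows and comparing one metric. A small Python sketch of that idea follows; the validation rows are made up and the three candidates are trivial stand-ins for fitted models, so treat it as an illustration of the comparison only.

```python
def accuracy(predict, rows):
    """Fraction of (x, label) rows where predict(x) equals the label."""
    return sum(1 for x, label in rows if predict(x) == label) / len(rows)

# Hypothetical held-out rows: (single numeric predictor, true label).
validation = [(0.1, 0), (0.4, 0), (0.35, 1), (0.6, 1), (0.8, 1), (0.9, 0)]

# Stand-ins for fitted models (assumption); in practice each entry would
# wrap the scoring step of a trained model.
candidates = {
    "always_negative": lambda x: 0,
    "always_positive": lambda x: 1,
    "threshold_at_0.5": lambda x: int(x > 0.5),
}

scores = {name: accuracy(predict, validation) for name, predict in candidates.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best on validation:", best)
```

Including a trivial baseline such as "always predict the majority class" is a useful sanity check: a model that barely beats it is not adding much.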
Dan
Thank you, @mceleavey!
Thank you, @danilang, for your answer! Do I need to select numeric fields as my 'predictor variables'? Or can I use categorical (String) fields too?
Thank you, @AngelosPachis! I will consider it and see how it works here.
Hi @FelipeSS
It depends on the model you're using. The Linear and Gamma regression models require numeric predictors, so you have to one-hot encode any categorical variables that you want to include. Most of the other models can handle categorical variables, but you should research the models first. You should also look at the Data Investigation tools. They can help you find correlations between your candidate predictors, so you can make sure you're not including pairs of variables that are strongly positively or negatively correlated.
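As a rough picture of what the correlation check looks for, here is a minimal Python sketch that computes the Pearson correlation for every pair of numeric predictors and flags the strong ones. The column names, the values, and the 0.9 cut-off are all assumptions for illustration, not settings from the Data Investigation tools.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def strongly_correlated(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged

# Hypothetical predictors; total_paid is an exact multiple of tenure_months.
columns = {
    "tenure_months": [1, 2, 3, 4, 5],
    "total_paid": [2, 4, 6, 8, 10],
    "support_calls": [5, 3, 4, 1, 2],
}
print(strongly_correlated(columns))
```

A flagged pair carries nearly the same information twice, so keeping only one of the two predictors is usually enough.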
Dan
Hi @danilang.
Thank you for your explanation. I'm sure it will help!
Felipe