Hey, I am trying to run a linear regression where my independent variable is categorical. I used one-hot encoding to create binary fields, but when I run the linear regression I get these 8 errors (see picture):
I've tried changing the data type from bool to string, and I converted every 'False' to 0, but nothing seems to work.
How can I solve this?
Hi @Ela597 ,
In that example you only have 6 rows of data, which would not be enough to train a model. Can you confirm that you get this error with the full dataset?
M.
Hey mceleavey, yes, that is a sample of my data. I am getting the error with the full dataset.
Hi @Ela597 ,
Without the full dataset it's going to be difficult to help. One thing I can see in the sample is that there is no variation in your categorical variables, which would make it impossible to build a model: a field that never changes gives the regression nothing to fit against.
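As a rough illustration of the "no variation" check outside of Alteryx, here is a minimal Python sketch using only the standard library. The field names (`score`, `group_A`, `group_B`), the values, and the helper `constant_fields` are all made up for the example; they are not taken from the posted data.

```python
# Hypothetical one-hot encoded rows; field names and values are invented.
rows = [
    {"score": 3.0, "group_A": False, "group_B": True},
    {"score": 4.5, "group_A": False, "group_B": False},
    {"score": 2.0, "group_A": False, "group_B": True},
    {"score": 5.0, "group_A": False, "group_B": False},
]


def constant_fields(rows, target):
    """Return the predictor fields that take only one distinct value."""
    fields = [name for name in rows[0] if name != target]
    return [f for f in fields if len({row[f] for row in rows}) <= 1]


# group_A is False in every row, so it is reported as constant.
constant = constant_fields(rows, target="score")
print(constant)
```

Any field this reports is the same in every row, so it is a candidate to drop before fitting.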
If you can post your full dataset we might be able to help.
M.
Hey mceleavey,
Here is my full dataset.
Hi @Ela597 ,
I've made a few changes to the data: I changed the Boolean fields to Int16 and split out a test and train set with only one record.
This allows the model to run correctly, but I would say you don't have enough variation in your data; the values are fairly uniform. You also don't really have enough data to support a particularly in-depth linear regression.
Anyway, I've attached the workflow, and it is running now.
I hope this helps.
M.
mceleavey Thank you!
Would it then be better to use correlation analysis to examine the relationship between the score and the independent variable?
Hi @Ela597 ,
I would certainly start there. This will tell you whether there is a correlation and how strong the signal is.
M.