
Alteryx Designer Desktop Discussions

SOLVED

linear regression with categorical independent variable

Ela597
6 - Meteoroid

Hey, I am trying to run a linear regression where my independent variable is categorical. I used one-hot encoding to turn it into binary fields, but when I run the linear regression I get the 8 errors shown in the attached picture:

I've tried changing the data type from Bool to String, and I converted every 'False' to 0, but nothing seems to work.

How can I solve this?
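Two things commonly trip up a regression on one-hot fields: the fields must be numeric (0/1 integers rather than Bool or String), and if every category keeps its own 0/1 column alongside an intercept, those columns sum to 1 on every row and are perfectly collinear with the intercept. The errors in the picture aren't reproduced here, so this is only a possible cause, not a diagnosis. Below is a minimal Python sketch of reference (drop-one) coding, assuming the categorical field is a plain list of labels; `one_hot` and `fit_single_categorical` are hypothetical helpers, not Alteryx functions. It relies on the fact that, with a single categorical predictor plus an intercept, the least-squares fit reduces to group means.

```python
from collections import defaultdict

def one_hot(values, drop_first=True):
    """Encode category labels as 0/1 integer columns.

    Dropping the first (reference) level avoids perfect collinearity
    with the intercept when the columns feed a linear regression.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    columns = {level: [1 if v == level else 0 for v in values] for level in kept}
    return levels[0], columns

def fit_single_categorical(categories, scores):
    """OLS of scores on one categorical predictor with reference coding.

    With only dummy columns plus an intercept, the least-squares solution is:
    intercept = mean of the reference group, and each coefficient =
    (mean of that group) - (mean of the reference group).
    """
    groups = defaultdict(list)
    for category, score in zip(categories, scores):
        groups[category].append(score)
    means = {c: sum(ys) / len(ys) for c, ys in groups.items()}
    reference, columns = one_hot(categories)
    intercept = means[reference]
    coefs = {level: means[level] - intercept for level in columns}
    return intercept, coefs

# Made-up example data, for illustration only.
categories = ["A", "B", "A", "C", "B", "C"]
scores = [10, 14, 12, 20, 16, 18]
print(fit_single_categorical(categories, scores))
# intercept 11.0 (mean of A), B: +4.0, C: +8.0
```

Keeping one level as the reference means each coefficient reads as "difference from the reference category", which is usually what you want to interpret anyway.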

mceleavey
17 - Castor

Hi @Ela597 ,

 

In that example you only have 6 rows of data, which would not be enough to train a model. Can you confirm you get this error with the full dataset?

 

M.




Ela597

Hey mceleavey, yes, that is a sample of my data. I am getting the error with the full dataset.

mceleavey

Hi @Ela597 ,

 

Without the full dataset it's going to be difficult to help. One thing I can see in the sample is that there is no variation in your categorical variables, which would make it impossible to build a model.
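A quick way to check a dataset for this before modelling is to list the predictors that never change: a field that is identical on every row carries no information, so a regression cannot estimate a coefficient for it. A minimal Python sketch, assuming rows are dicts keyed by field name (a hypothetical layout); `constant_columns` is an illustrative helper.

```python
def constant_columns(rows):
    """Return the names of fields whose value is the same on every row."""
    if not rows:
        return []
    names = rows[0].keys()
    # A set of the observed values has size 1 only if the field never varies.
    return [name for name in names if len({row[name] for row in rows}) == 1]

# Made-up sample: both one-hot fields are constant, only the score varies.
rows = [
    {"cat_A": 1, "cat_B": 0, "score": 5},
    {"cat_A": 1, "cat_B": 0, "score": 7},
]
print(constant_columns(rows))  # ['cat_A', 'cat_B']
```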

 

If you can post your full dataset we might be able to help.

 

M.




Ela597

Hey mceleavey,

Here is my full dataset.

mceleavey

Hi @Ela597 ,

 

I've made a few changes to the data: I changed the Boolean fields to Int16, and I split the data into training and test sets, holding out only one record for testing.

This allows the model to run correctly, but I would say that you don't have enough variation in your data, and what variation there is is fairly uniform. You also don't really have enough data to support a particularly in-depth linear regression.
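The attached workflow isn't reproduced here. As a rough Python analogue of those two changes, assuming the raw values arrive as Python bools or 'True'/'False' strings, the sketch below maps them to 0/1 integers and holds out a single shuffled record as the test set. `to_flag` and `split_holdout` are hypothetical names; Python integers have no fixed width, so there is no direct Int16 equivalent. Note that with only one test record, any accuracy figure computed on it is very noisy.

```python
import random

def to_flag(value):
    """Convert a boolean-like value (bool, 0/1, 'True'/'False') to the int 0 or 1."""
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("true", "1"):
            return 1
        if text in ("false", "0"):
            return 0
    elif value in (True, 1):
        return 1
    elif value in (False, 0):
        return 0
    raise ValueError(f"not a boolean-like value: {value!r}")

def split_holdout(rows, test_size=1, seed=0):
    """Shuffle a copy of rows and return (train, test) with test_size rows held out."""
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must leave at least one training row")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

flags = [to_flag(v) for v in [True, "False", "true", 0]]
print(flags)  # [1, 0, 1, 0]
train, test = split_holdout(list(range(10)))
print(len(train), len(test))  # 9 1
```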

 

Anyway, I've attached the workflow and this is running now.

 

I hope this helps.

 

M.




Ela597

mceleavey, thank you!

 

Is it better, then, to use correlation analysis to examine the relationship between the score and the independent variable?

mceleavey

Hi @Ela597 ,

 

I would certainly start there. A correlation analysis will tell you whether there is a relationship and how strong it is.
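For a single one-hot field against a continuous score, the Pearson correlation of the 0/1 column with the score is the point-biserial correlation. A minimal sketch with made-up numbers; `pearson` is written out by hand for clarity (Python 3.10+ also provides `statistics.correlation`). As with the regression, a field with no variation has no defined correlation, so the function raises in that case.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / (sxx * syy) ** 0.5

# Made-up example: a 0/1 category flag against a score.
flag = [0, 0, 1, 1]
score = [1, 2, 3, 4]
print(round(pearson(flag, score), 3))  # 0.894
```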

 

M.



