Alteryx Designer Desktop Discussions

SOLVED

Logistic Regression - adjusting categorical variables similar to SPSS

azhaglov
6 - Meteoroid

Here's my issue: I am trying to reproduce a model built in SPSS. I've built the logistic regression in Alteryx, and it returns a set of coefficients that does not match the SPSS output. For a categorical variable, Alteryx picks one category as the reference, so if I have Region (East, West, North, South, Central), it may choose EAST as the reference and calculate coefficients only for the rest (West, North, South, Central). SPSS lets the user specify how each categorical variable is coded, so I can state exactly what I want: CENTRAL as the reference, and the other categories used to produce coefficients.
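
As a generic illustration of reference (treatment) coding, not the exact output format of either tool: with CENTRAL as the reference, the model for a binary outcome Y has the form below (other predictors omitted). Each indicator D_k is 1 when Region = k and 0 otherwise. CENTRAL has no indicator of its own, so a CENTRAL record has all indicators equal to 0 and its log-odds are β0, and each β_k is the difference in log-odds between region k and CENTRAL.

```latex
\operatorname{logit} P(Y = 1)
  = \beta_0
  + \beta_{\text{East}}  D_{\text{East}}
  + \beta_{\text{West}}  D_{\text{West}}
  + \beta_{\text{North}} D_{\text{North}}
  + \beta_{\text{South}} D_{\text{South}}
```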

Is there any way to adjust or encode the variables in the desired manner when configuring the logistic regression? Perhaps I could adjust the R code behind the Logistic Regression tool?

I know the choice of reference category should not change the fitted model itself, only how the coefficients are expressed, but in this project it is important to follow the same logic as the SPSS model.

Thanks!

8 REPLIES
Joe_Mako
12 - Quasar

What sample input and output would you expect?

azhaglov
6 - Meteoroid

I created a fake data set with Region and Sales as predictors and Invest as the target (screenshots attached). Region is categorical, so Alteryx/R converts it to dummy variables and runs the regression. However, Alteryx decides which category to use as the reference. In this example, Alteryx excluded CENTRAL (2nd screenshot) and estimated coefficients for East, West, South and North accordingly.

What I want is to override that predetermined choice and force, say, EAST to be the excluded (reference) category.
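
Changing the reference category re-parameterizes the model without changing its fit. If CENTRAL is the reference with intercept β0 and region coefficients β_k, then making EAST the reference gives intercept β0 + β_East, coefficients β_k − β_East for the remaining regions, and −β_East for CENTRAL; coefficients on other predictors such as Sales are unchanged. In R itself, the usual way to choose the reference is `relevel(x, ref = "East")` on a factor before fitting; R's default treatment coding drops the first level, which for a character field is the alphabetically first one, consistent with CENTRAL being dropped here. Whether the Alteryx tool lets you inject that call is not something this sketch assumes. Below is a minimal Python sketch of the conversion; the function names `rebase_reference` and `log_odds` and the coefficient values are hypothetical, not Alteryx or SPSS output.

```python
def rebase_reference(intercept, coefs, new_ref, old_ref):
    """Re-express treatment-coded logistic coefficients with new_ref as the reference.

    intercept: log-odds of the old reference category (other predictors at 0).
    coefs: dict of category -> coefficient for every non-reference category.
    Returns (new_intercept, new_coefs); new_coefs gains an entry for old_ref.
    """
    shift = coefs[new_ref]
    new_intercept = intercept + shift
    new_coefs = {k: b - shift for k, b in coefs.items() if k != new_ref}
    new_coefs[old_ref] = -shift  # the old reference now needs its own coefficient
    return new_intercept, new_coefs


def log_odds(intercept, coefs, category):
    """Linear predictor for one category; the reference category contributes 0."""
    return intercept + coefs.get(category, 0.0)


# Hypothetical coefficients with CENTRAL as the reference (not real output).
b0 = -0.5
b = {"East": 0.8, "West": 0.2, "North": -0.3, "South": 0.1}

b0_east, b_east = rebase_reference(b0, b, new_ref="East", old_ref="Central")

# Both codings give the same log-odds for every region (up to float rounding).
for region in ["Central", "East", "West", "North", "South"]:
    assert abs(log_odds(b0, b, region) - log_odds(b0_east, b_east, region)) < 1e-12
```

This is also why the SPSS and Alteryx coefficients can look quite different while describing the same fitted model.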

Philip
12 - Quasar

R is doing the assigning. You'd be best off creating an ordinal variable from the categorical variable, with 0 being the category everything is compared to. You might also try sorting the categories like this to see if it works:

0 - East
1 - West
2 - North
3 - South
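
To see what this ordering implies, suppose the code c is entered as a single numeric field (East = 0, West = 1, North = 2, South = 3). The logistic model then has only one slope for it, so the log-odds are forced onto a straight line in the code. Writing ℓ_k = β0 + β1·c_k for the log-odds of region k:

```latex
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 c, \qquad c \in \{0, 1, 2, 3\}

\ell_{\text{West}} - \ell_{\text{East}}
  = \ell_{\text{North}} - \ell_{\text{West}}
  = \ell_{\text{South}} - \ell_{\text{North}}
  = \beta_1
```

So adjacent regions in the chosen order always differ by the same amount β1, and the ordering itself becomes part of the model.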

BridgetT
Alteryx Alumni (Retired)

Actually, creating an ordinal variable from a variable without a true underlying order is not good statistical practice. Based on the ordering @Philip suggested, R will treat the regions as ordered East < West < North < South, which isn't true. You can solve the problem the way you originally intended by creating the dummy variables you want manually with a Formula tool: one 0/1 field for each region except the one you want as the reference. I'm attaching an example here. Then use those dummy variables as your predictors instead of your original North/South/East/West/Central field.
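
The manual dummy-coding step can be sketched outside Alteryx. This is a minimal Python illustration, not the attached workflow (which isn't reproduced here): for each record it adds one 0/1 column per region except the chosen reference, so the reference region is the one whose dummies are all 0. The function name `add_dummies`, the `Region_<level>` column naming and the sample records are hypothetical.

```python
def add_dummies(rows, field, reference):
    """Return (coded_rows, dummy_levels) with one 0/1 column per non-reference level."""
    # Collect the distinct levels in first-seen order, then drop the reference.
    levels = []
    for row in rows:
        if row[field] not in levels:
            levels.append(row[field])
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not found in {field}")
    dummy_levels = [lvl for lvl in levels if lvl != reference]

    coded = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are left unchanged
        for lvl in dummy_levels:
            new_row[f"{field}_{lvl}"] = 1 if row[field] == lvl else 0
        coded.append(new_row)
    return coded, dummy_levels


# Hypothetical records; EAST is chosen as the reference category.
rows = [
    {"Region": "East", "Sales": 120, "Invest": 1},
    {"Region": "Central", "Sales": 90, "Invest": 0},
    {"Region": "West", "Sales": 100, "Invest": 1},
]
coded, dummy_levels = add_dummies(rows, "Region", reference="East")
```

The `Region_*` columns, not Region itself, would then be passed as predictors, so each coefficient is relative to the chosen reference. Because the levels are collected from the data, the same function works unchanged for a field with many categories.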

Bridget Toomey

Research Scientist, Analytic Products

Alteryx
azhaglov
6 - Meteoroid

BridgetT,

Thanks for your ideas. Unfortunately, I can't open this workflow because it was created in a more recent version of the application. I am using 11.0.6, and there doesn't seem to be a newer version. Perhaps you're using an early prototype of the next version :) Any thoughts?

BridgetT
Alteryx Alumni (Retired)

@azhaglov, sorry about that! I'm running 11.3, which was recently released. Can you open this version?

focus_nhei
5 - Atom

Hi, I am still working with version 10.6.9.

Could you please re-upload?

That would be so helpful!

Best regards

MAAbdullahAlMubarah
8 - Asteroid

What do we do if the categorical variable has, say, more than 30 categories? How can we do it then?
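
One way to avoid typing 30+ expressions by hand is to generate them from the list of distinct categories. The sketch below, in Python, builds one Formula-tool expression per non-reference category in Alteryx's `IF ... THEN ... ELSE ... ENDIF` form. The function name `formula_expressions`, the `Region_<category>` output names and the category list are hypothetical, it assumes category values contain no double quotes, and getting the expressions into a Formula tool is left to you.

```python
def formula_expressions(field, categories, reference):
    """Build one Alteryx Formula-tool expression per non-reference category.

    Returns a dict of output column name -> expression producing 1 or 0.
    Assumes category values contain no double quotes (no escaping is attempted).
    """
    if reference not in categories:
        raise ValueError(f"reference {reference!r} is not one of the categories")
    exprs = {}
    for cat in categories:
        if cat == reference:
            continue  # the reference category gets no column of its own
        if '"' in cat:
            raise ValueError(f"category {cat!r} contains a double quote")
        exprs[f"{field}_{cat}"] = f'IF [{field}] = "{cat}" THEN 1 ELSE 0 ENDIF'
    return exprs


# Hypothetical field with 35 categories named Cat01..Cat35.
categories = [f"Cat{i:02d}" for i in range(1, 36)]
exprs = formula_expressions("Region", categories, reference="Cat01")
for name, expr in exprs.items():
    print(f"{name}: {expr}")
```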
