Logistic regression: identical coefficients and odd results compared to descriptive stats

Flo_G
7 - Meteor

I'm running an analysis on conversions, which are represented by a 0/1 variable. I would like to use two predictor variables, language and country. All variables are strings.

 

If I run descriptive statistics on my database (conversion rate by language and conversion rate by country), I can see pretty stark differences between populations. I tried to run a logistic regression combining the two to see if I could get an indication of significant differences, but the results are completely off.

 

I have 4 languages and 20 countries in the database.

 

If I run glm(conversion.flag = Language + Country), the coefficients are almost exactly the same for all languages and countries. Almost all coefficients are not statistically significant, with one exception. I'm sure this should not be the case. How can I fix this? I would be very interested in this, as I want to see if there are different behaviors when combining languages and countries (e.g. EN speakers in a DE country are not as engaged as DE speakers in DE countries).

 

Second problem: if I run glm(conversion.flag = Language) (assuming that country might not be a significant predictor), the coefficients are all significant (***), but they go in the completely opposite direction of what I'd infer from the descriptive statistics!

 

Descriptive stats:

Language    Conversion rate
N/A         76%
E           88%
D           96%
F           96%
I           94%

 

Log coefficients:

Language                     Coefficient
N/A (intercept, I assume)    0.85
E                            -0.31
D                            -0.21
F                            -0.25
I                            -0.16

 

Maybe the Logistic Regression tool is actually reading the '1' flag as 0, and the '0' flag as 1? How do I change that?
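
One way to test the flipped-flag idea: with a single categorical predictor and a reference level, the intercept of a logit model is the log-odds of the reference level, and each coefficient is the difference in log-odds against it. Here is a rough Python sketch (not what the Alteryx tool runs internally) that derives the coefficients implied by the conversion rates above, assuming N/A is the reference level, and shows what they would be if 0 and 1 were swapped:

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Conversion rates from the descriptive stats in this post.
rates = {"N/A": 0.76, "E": 0.88, "D": 0.96, "F": 0.96, "I": 0.94}
reference = "N/A"  # assumption: this level is the holdout category

# Intercept = log-odds of the reference level.
intercept = logit(rates[reference])

# Each coefficient = log-odds of that level minus log-odds of the reference.
expected = {
    level: logit(rate) - intercept
    for level, rate in rates.items()
    if level != reference
}

# If the tool modelled P(flag = 0) instead of P(flag = 1),
# every log-odds value would simply change sign.
flipped_intercept = -intercept
flipped = {level: -coef for level, coef in expected.items()}

print(f"intercept: {intercept:.2f} (flipped: {flipped_intercept:.2f})")
for level in expected:
    print(f"{level}: {expected[level]:.2f} (flipped: {flipped[level]:.2f})")
```

Neither set lines up with the reported coefficients, so a swapped flag on its own may not explain the output.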

 

I'm using the logit model, which seems to be appropriate for my use case. Am I doing something wrong in the setup?

Flo_G
7 - Meteor

Yes, the target variable is populated; I've run the workflow a few times. The macro version is Logistic Regression 1.1, and Alteryx is 2020.1.2.24185. I'm on a company account, so unfortunately I can't update it myself.

DrDan
Alteryx Alumni (Retired)

Hi @Flo_G ,

 

I now know why #N/A is the holdout category: the # makes that level first in the sort order.
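
A tiny Python illustration of that sort-order effect, assuming the level labels are exactly the ones shown in the question. Python compares strings by code point, while the ordering inside the tool depends on its own collation rules, so treat this as an analogy rather than a reproduction:

```python
# Language levels as they appear in the question, with the missing value as "#N/A".
levels = ["E", "D", "F", "I", "#N/A"]

# "#" has a lower code point than any letter, so "#N/A" sorts first.
ordered = sorted(levels)

# With treatment coding, the first level typically becomes the reference (holdout).
reference = ordered[0]

print(ordered)
print("reference level:", reference)
```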

 

By the look of things, you are running into numerical issues, likely as a result of a near-singular design matrix. Could you do three things? First, send the results of a contingency table between language and country. Second, send the full results of the language-only model, including the summary statistics for the model. Third, tell me the number of observations in the data set.
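
For anyone following along, the contingency table check could be sketched like this in Python. The rows below are made up for illustration (the real ones would come from the workflow), and the names `rows`, `empty_cells`, and `single_language_countries` are hypothetical. The idea is to count language/country combinations and flag empty cells and countries where only one language occurs. In the extreme case where every country has a single language, the language dummy columns are exact sums of country dummy columns and the design matrix is singular:

```python
from collections import Counter

# Hypothetical (language, country) pairs, one per observation.
rows = [
    ("D", "DE"), ("D", "DE"), ("E", "DE"),
    ("D", "AT"),
    ("F", "FR"), ("F", "FR"),
    ("E", "UK"),
    ("I", "IT"),
]

# Contingency table: number of observations per (language, country) cell.
table = Counter(rows)
languages = sorted({lang for lang, _ in rows})
countries = sorted({country for _, country in rows})

# Cells with no observations carry no information for an interaction term.
empty_cells = [
    (lang, country)
    for lang in languages
    for country in countries
    if table[(lang, country)] == 0
]

# Countries where a single language occurs: language and country overlap heavily there.
single_language_countries = [
    country
    for country in countries
    if sum(1 for lang in languages if table[(lang, country)] > 0) == 1
]

# Print the table with languages as rows and countries as columns.
print("     " + " ".join(f"{c:>4}" for c in countries))
for lang in languages:
    print(f"{lang:<4} " + " ".join(f"{table[(lang, c)]:>4}" for c in countries))

print("empty cells:", len(empty_cells))
print("single-language countries:", single_language_countries)
```

A table dominated by zeros, or one where most countries map to a single language, would be consistent with the near-singular design matrix suspected above.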

 

Dan
