Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Selecting a Logistic Regression Model Type: Logit, Probit, or Cloglog?

Data Scientist

 logitprobitcloglog.png

 When the target (dependent) variable of a regression is dichotomous (has only two possible values), a traditional linear (OLS) regression is not appropriate. This is because:

 

  1. A regression line can produce predictions outside the range 0-1, which cannot be interpreted as probabilities.
  2. Linear regression assumes that each predictor (independent) variable has a constant marginal effect on the target variable. The marginal effect is the expected instantaneous change in the target variable for a change in one explanatory variable, holding all other covariates constant.
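
Both problems can be seen in a small sketch. The data below are made up purely for illustration (the outcome flips from 0 to 1 at x = 5), and the variable names are hypothetical; the point is only that a straight-line fit has one fixed slope and escapes the 0-1 range at the edges.

```python
import numpy as np

# Toy data (invented for illustration): outcome is 0 below x = 5 and 1 from x = 5 on.
x = np.arange(10, dtype=float)
y = (x >= 5).astype(float)

# Ordinary least squares fit of a straight line: y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)
preds = slope * x + intercept

# The slope is the same at every x (constant marginal effect), and the
# fitted line leaves the [0, 1] interval at both ends of the data.
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"prediction at x = 0: {preds[0]:.3f}")
print(f"prediction at x = 9: {preds[-1]:.3f}")
```

The first prediction is negative and the last is greater than 1, so neither can be read as a probability.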

 

Logit, probit and cloglog models address these problems by fitting the data to a cumulative distribution function (CDF): an S-shaped curve that stays between 0 and 1 and allows for different rates of change at the low and high ends of the predictor variable. The three models differ because each performs Maximum Likelihood Estimation (MLE) with a different CDF: the logistic distribution for logit, the standard normal distribution for probit, and an extreme-value (Gumbel) distribution for cloglog. The link function in each case is the inverse of that CDF.
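
As a rough sketch of what the three curves look like numerically, the functions below map a linear predictor z to a probability. The function names are illustrative only and are not part of any Alteryx tool; the formulas are the standard textbook definitions.

```python
import math

def logit_cdf(z):
    """Logistic CDF: the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal CDF: the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cloglog_cdf(z):
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(z))

# Every value lies strictly between 0 and 1, but the curves differ in shape.
for z in (-3, -1, 0, 1, 3):
    print(f"z = {z:+d}  logit = {logit_cdf(z):.4f}  "
          f"probit = {probit_cdf(z):.4f}  cloglog = {cloglog_cdf(z):.4f}")
```

Logit and probit both return 0.5 at z = 0, while cloglog returns 1 - exp(-1), roughly 0.63, which is a first hint that it is not symmetric.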

 

So, which option should you choose?

 

Note: The Logit and Probit models yield similar, but not necessarily identical, results.
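
One way to see how close the two curves are: once the logistic curve is rescaled, it tracks the normal CDF very closely. The sketch below uses 1.702 as the rescaling constant, a commonly quoted approximation that is an assumption of this example rather than something taken from this article. In practice this means the fitted coefficients differ in scale between the two models while the predicted probabilities stay close.

```python
import math

def logit_cdf(z):
    """Logistic CDF: the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal CDF: the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SCALE = 1.702  # assumed rescaling constant, chosen for illustration

# Compare the two curves on a grid from -6 to 6 in steps of 0.01.
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(abs(probit_cdf(z) - logit_cdf(SCALE * z)) for z in grid)

print(f"largest gap between the curves: {max_gap:.4f}")
```

The largest gap is under one percentage point, which is why the choice between the two rarely changes conclusions.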

 

  • The Complementary Log-Log (cloglog) function is unlike Logit and Probit because it is asymmetric: its curve approaches 0 gradually but approaches 1 much more sharply. It is often suggested when the probability of an event is very small or very large. The cloglog model is closely related to continuous-time models for the occurrence of events, so it has important applications in survival analysis and hazard modeling. If you are modeling a binary outcome in a survival or hazard setting, cloglog is a natural choice.
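
The asymmetry can be checked directly. For a symmetric curve, the probabilities at z and -z add up to 1; the sketch below (with illustrative function names, using the standard formulas) shows that this holds for logit and probit but not for cloglog, and compares how far each curve sits from 0 and from 1 at the tails.

```python
import math

def logit_cdf(z):
    """Logistic CDF: the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal CDF: the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cloglog_cdf(z):
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(z))

curves = {"logit": logit_cdf, "probit": probit_cdf, "cloglog": cloglog_cdf}

for name, cdf in curves.items():
    # Symmetric curves satisfy cdf(z) + cdf(-z) == 1.
    symmetry_sum = cdf(1.0) + cdf(-1.0)
    lower_tail = cdf(-3.0)        # distance from 0 at z = -3
    upper_tail = 1.0 - cdf(3.0)   # distance from 1 at z = +3
    print(f"{name:8s} sum = {symmetry_sum:.4f}  "
          f"lower tail = {lower_tail:.2e}  upper tail = {upper_tail:.2e}")
```

For cloglog the upper tail is orders of magnitude smaller than the lower tail, whereas the two tails match for logit and probit.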

 

When in doubt:

 

Logit tends to be the default link function to use when you have no particular reason to use another one. However, in some fields using probit is standard. Unless you have a good reason to deviate, it is probably your best bet to select the model your target audience is most familiar with.

 

audience.png

 

If you would like more detail, here are some additional resources:

 

http://bayesium.com/wp-content/uploads/2015/08/logit-probit-cloglog.pdf