Assumptions#
1. Dependent Variable is Binary#
Logistic regression is used for binary outcomes (0/1, Yes/No, True/False).
Extensions like multinomial or ordinal logistic regression exist for more than two categories.
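A quick pre-fit check can confirm that the outcome really has two categories. The sketch below is illustrative only: the helper name `encode_binary` and the example labels are assumptions, not part of any library.

```python
def encode_binary(labels):
    """Map a two-category outcome to 0/1; reject anything else.

    Hypothetical helper: categories are ordered by sorting, so the
    first sorted label becomes 0 and the second becomes 1.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError(
            f"expected exactly 2 outcome categories, found {len(classes)}: {classes}"
        )
    mapping = {classes[0]: 0, classes[1]: 1}
    return [mapping[v] for v in labels], mapping


# Example labels made up for illustration.
encoded, mapping = encode_binary(["No", "Yes", "Yes", "No"])
print(encoded, mapping)
```

If the check raises because there are three or more categories, that is a sign to look at the multinomial or ordinal extensions instead.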
2. Linearity of Logit#
Assumes a linear relationship between independent variables and the log-odds of the dependent variable.
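Formally:
$$
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k
$$
where $p = P(Y = 1 \mid x_1, \ldots, x_k)$.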
Note: Independent variables don’t need to be linearly related to the output probability, only to the logit.
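To make the distinction concrete, here is a minimal sketch with one predictor. The coefficients `beta0` and `beta1` are hypothetical values chosen for illustration. Equal steps in `x` give equal steps in the log-odds but unequal steps in the probability.

```python
import math


def sigmoid(z):
    """Map log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Map a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, chosen purely for illustration.
beta0, beta1 = -2.0, 1.0

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [sigmoid(beta0 + beta1 * x) for x in xs]
logits = [logit(p) for p in probs]

# Differences between consecutive points on each scale.
logit_steps = [b - a for a, b in zip(logits, logits[1:])]
prob_steps = [b - a for a, b in zip(probs, probs[1:])]

print("logit steps:", [round(s, 6) for s in logit_steps])  # all equal to beta1
print("prob steps: ", [round(s, 4) for s in prob_steps])   # vary along the S-curve
```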
3. Independence of Observations#
Observations should be independent of each other.
Repeated measures, clustered data, or otherwise correlated observations violate this assumption; handle them with techniques such as generalized estimating equations (GEE) or mixed-effects models.
4. No Perfect Multicollinearity#
Independent variables should not be perfectly correlated.
High but imperfect correlation does not prevent estimation, but it can make coefficient estimates unstable and inflate their standard errors.
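One common diagnostic is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between the predictors. The function names and data are assumptions made up for illustration; with more predictors, each VIF comes from regressing one predictor on all the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    if math.isclose(abs(r), 1.0):
        return math.inf  # perfect collinearity: coefficients are not identifiable
    return 1.0 / (1.0 - r * r)


# Illustrative data only.
x1 = [1, 2, 3, 4, 5, 6]
x2_unrelated = [2, 5, 1, 6, 3, 4]
x2_noisy_copy = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
x2_perfect = [2 * v + 1 for v in x1]

print(vif_two_predictors(x1, x2_unrelated))   # close to 1: little inflation
print(vif_two_predictors(x1, x2_noisy_copy))  # very large: near-collinear
print(vif_two_predictors(x1, x2_perfect))     # inf: perfectly collinear
```

Thresholds such as VIF above 5 or 10 are conventions rather than hard rules, so treat the output as a prompt for closer inspection.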
5. Large Sample Size#
Logistic regression uses maximum likelihood estimation (MLE).
MLE's guarantees are asymptotic, so small datasets, or datasets with few observations in the rarer outcome class, can give biased or unstable estimates.
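A rough way to judge "large enough" is the events-per-variable (EPV) ratio: the count of the rarer outcome divided by the number of predictors. A frequently cited heuristic asks for roughly 10 or more, though this is a rule of thumb and not a guarantee. The helper below is a hypothetical sketch that assumes the outcome is already coded 0/1.

```python
def events_per_variable(y, n_predictors):
    """Observations in the rarer outcome class per predictor.

    Assumes y is coded 0/1. Hypothetical helper for illustration.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    n_pos = sum(1 for v in y if v == 1)
    n_neg = len(y) - n_pos
    return min(n_pos, n_neg) / n_predictors


# Made-up example: 200 observations, 30 events, 5 predictors.
y = [1] * 30 + [0] * 170
epv = events_per_variable(y, n_predictors=5)
print(f"EPV = {epv:.1f}")
if epv < 10:
    print("Below the ~10 EPV heuristic: consider fewer predictors or more data.")
```

Note that the total sample size of 200 looks comfortable here, yet the EPV is low because events are rare.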
6. Minimal or No Outliers in Predictors#
While logistic regression is less sensitive to outliers than linear regression, extreme values can still distort the model.
Consider transforming heavily skewed predictors, and investigate influential points before deciding whether to remove them.
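As a first screening pass, extreme predictor values can be flagged with Tukey's 1.5 x IQR fences. This is a generic univariate heuristic, not something specific to logistic regression, and the function name and data below are assumptions for illustration. Influence on the fitted model is better judged afterwards with model-based diagnostics such as Cook's distance.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up predictor values with one extreme observation.
predictor = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6, 9.7]
outliers = iqr_outliers(predictor)
print(outliers)
```

A flagged value is a candidate for review, not automatic deletion: it may be a data-entry error or a genuine, informative observation.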
Optional Considerations#
Additive effects: Assumes that the predictors combine additively on the logit scale, unless interaction terms are included explicitly.
No assumption of homoscedasticity: Unlike linear regression, constant variance is not required.
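The additive-effects point can be illustrated with a small sketch. With no interaction term, the odds ratio for a one-unit change in `x1` is the same at every value of `x2`; adding an interaction coefficient makes it depend on `x2`. All coefficient values here (`b0`, `b1`, `b2`, `b12`) are hypothetical.

```python
import math


def log_odds(x1, x2, b0=-1.0, b1=0.8, b2=0.5, b12=0.0):
    """Linear predictor on the logit scale; b12 is an optional interaction."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


def odds_ratio_x1(x2, **coef):
    """Odds ratio for raising x1 from 0 to 1 while holding x2 fixed."""
    return math.exp(log_odds(1, x2, **coef) - log_odds(0, x2, **coef))


# Purely additive model: the odds ratio equals exp(b1) at every x2.
additive = [odds_ratio_x1(x2) for x2 in (0, 1, 2)]

# Same model plus a hypothetical interaction: the odds ratio changes with x2.
interacting = [odds_ratio_x1(x2, b12=0.3) for x2 in (0, 1, 2)]

print("additive:   ", [round(v, 4) for v in additive])
print("interacting:", [round(v, 4) for v in interacting])
```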