Assumptions#

1. Dependent Variable is Binary#

  • Logistic regression is used for binary outcomes (0/1, Yes/No, True/False).

  • Extensions like multinomial or ordinal logistic regression exist for more than two categories.
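
A quick check on the outcome variable can catch a non-binary target before fitting. The following is a minimal sketch, not part of the original text; the helper name `encode_binary` and the sample labels are hypothetical.

```python
def encode_binary(labels, positive):
    """Map a two-category outcome to 0/1, with `positive` coded as 1.

    Hypothetical helper for illustration only.
    """
    categories = set(labels)
    if len(categories) != 2:
        # More than two categories calls for multinomial or ordinal models.
        raise ValueError(f"expected exactly 2 categories, found {len(categories)}")
    if positive not in categories:
        raise ValueError(f"{positive!r} does not appear in the labels")
    return [1 if label == positive else 0 for label in labels]


y = encode_binary(["Yes", "No", "No", "Yes"], positive="Yes")
print(y)  # [1, 0, 0, 1]
```

Raising an error on three or more categories is a design choice here: it forces an explicit decision about which extension to use rather than silently collapsing categories.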


2. Linearity of Logit#

  • Assumes a linear relationship between independent variables and the log-odds of the dependent variable.

  • Formally:

\[ \text{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n \]
  • Note: Independent variables don’t need to be linearly related to the output probability, only to the logit.
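
The distinction between linearity in the logit and linearity in the probability can be seen numerically. This is a small sketch with made-up coefficients (`beta0` and `beta1` are assumptions, not values from the text): equal steps in x give equal steps in the log-odds but unequal steps in the probability.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Inverse of the logit (the logistic function)."""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -2.0, 1.0
xs = [0, 1, 2, 3, 4]

probs = [inv_logit(beta0 + beta1 * x) for x in xs]

# Change per unit step in x, on each scale.
log_odds_steps = [logit(probs[i + 1]) - logit(probs[i]) for i in range(len(xs) - 1)]
prob_steps = [probs[i + 1] - probs[i] for i in range(len(xs) - 1)]

print(log_odds_steps)  # every step is approximately beta1
print(prob_steps)      # steps differ: larger near p = 0.5, smaller in the tails
```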


3. Independence of Observations#

  • Observations should be independent of each other.

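  • Repeated measures, clustered data, and matched samples violate this assumption. For such data, use methods designed for correlated observations, such as generalized estimating equations (GEE) or mixed-effects logistic regression.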
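
One simple, partial check is to look for units that appear more than once in the data. The sketch below assumes each row carries a unit identifier (the name `patient_ids` is hypothetical); it only detects repeated identifiers, not other forms of dependence such as clustering by site or time.

```python
from collections import Counter


def repeated_units(unit_ids):
    """Return {unit: count} for every unit that appears more than once."""
    counts = Counter(unit_ids)
    return {unit: n for unit, n in counts.items() if n > 1}


# Hypothetical identifiers, one per observation.
patient_ids = ["p1", "p2", "p2", "p3", "p1", "p4"]
repeats = repeated_units(patient_ids)
print(repeats)  # units p1 and p2 each appear twice
```

An empty result does not prove independence; a non-empty one is a signal to consider a model for correlated data.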


4. No Perfect Multicollinearity#

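  • No independent variable should be an exact linear combination of the others (for example, the same measurement recorded in two different units); otherwise the coefficients are not uniquely identified.

  • High but imperfect correlation does not break the model, but it can make coefficient estimates unstable and inflate their standard errors.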
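
For the special case of two predictors, the variance inflation factor reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below uses that identity with made-up data; the function names and variables are hypothetical, and with more than two predictors each VIF would instead come from regressing one predictor on all the others.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r2 = pearson(a, b) ** 2
    # Treat r^2 numerically equal to 1 as perfect collinearity.
    return math.inf if math.isclose(r2, 1.0) else 1 / (1 - r2)


# Hypothetical data: height in two units is perfectly collinear.
height_cm = [150, 160, 170, 180, 190]
height_in = [h / 2.54 for h in height_cm]
age = [30, 25, 40, 35, 28]

print(vif_two_predictors(height_cm, height_in))  # inf
print(vif_two_predictors(height_cm, age))        # close to 1, little inflation
```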


5. Large Sample Size#

  • Logistic regression uses maximum likelihood estimation (MLE).

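  • MLE has good properties only asymptotically, so the model needs a reasonably large dataset. What matters most is the number of observations in the rarer outcome class relative to the number of predictors: with too few, estimates become biased or unstable.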
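
A commonly cited heuristic, which is not stated in the text above and is only a rough guide, is to have on the order of 10 observations in the rarer outcome class per predictor. The sketch below computes that ratio; the function name and the data are hypothetical.

```python
def events_per_variable(y, n_predictors):
    """Observations in the rarer class of a 0/1 outcome, per predictor."""
    events = sum(y)
    non_events = len(y) - events
    return min(events, non_events) / n_predictors


# Hypothetical outcome: 30 events and 170 non-events, with 5 predictors.
y = [1] * 30 + [0] * 170
epv = events_per_variable(y, n_predictors=5)
print(epv)  # 6.0, below the rough guide of 10
```

Note that the total sample size of 200 looks comfortable on its own; the ratio is low because only 30 of those observations are events.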


6. Minimal or No Outliers in Predictors#

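  • Because the outcome is bounded at 0 and 1, logistic regression is often less sensitive to outliers than linear regression, but extreme predictor values can still exert high leverage and distort the coefficients.

  • Investigate influential points before acting on them: correct data-entry errors, consider transforming heavily skewed predictors, and remove points only when there is a justified reason.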
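
One simple way to screen a predictor for extreme values is the 1.5 × IQR rule. This is a sketch of that generic rule, not a method prescribed by the text; the function name, the multiplier `k`, and the data are assumptions. It flags candidates for inspection and does not measure influence on the fitted model.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical predictor with one extreme value.
ages = [21, 23, 22, 25, 24, 26, 23, 120]
outliers = iqr_outliers(ages)
print(outliers)  # [120]
```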


Optional Considerations#

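  • Additive effects: The model assumes that predictors combine additively on the logit scale. If the effect of one predictor depends on another, the interaction must be included explicitly as a product term.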

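  • No assumption of homoscedasticity: Unlike linear regression, constant variance is not required. The variance of a binary outcome is p(1 − p), so it necessarily changes with the predicted probability.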
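
Both points can be checked numerically. The sketch below uses made-up coefficients (`beta0`, `beta1`, `beta2` are assumptions): without an interaction term, the odds ratio for a one-unit increase in x1 is exp(beta1) whatever the value of x2, and the Bernoulli variance p(1 − p) varies with p.

```python
import math

# Hypothetical coefficients for a model with two predictors and no interaction.
beta0, beta1, beta2 = -1.0, 0.8, 0.5


def odds(x1, x2):
    """Odds p / (1 - p) implied by the linear predictor."""
    return math.exp(beta0 + beta1 * x1 + beta2 * x2)


# Odds ratio for x1 going from 0 to 1, at several values of x2.
odds_ratios = [odds(1, x2) / odds(0, x2) for x2 in (0, 1, 2)]
print(odds_ratios)  # each is approximately exp(beta1)

# Variance of a binary outcome at different probabilities.
variances = [p * (1 - p) for p in (0.1, 0.5, 0.9)]
print(variances)  # largest at p = 0.5, smaller toward 0 and 1
```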