Assumptions


1. Feature Independence

  • Assumption: Features are independent given the class.

  • Reality: Features are often correlated.

    • Example: In spam emails, the words “lottery” and “win” frequently appear together, so the independence assumption is violated.

  • Why it still works: The product of probabilities often still yields a reasonable ranking of classes, even if absolute probabilities are wrong. Classification only needs the highest posterior, not exact values.
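A minimal sketch of this point, using made-up word likelihoods (the numbers are illustrative, not from real data): even though “lottery” and “win” co-occur and their evidence gets double-counted, the class with the highest posterior score is unchanged.

```python
import math

# Hypothetical per-class word likelihoods (illustrative values only).
likelihood = {
    "spam": {"lottery": 0.20, "win": 0.18},
    "ham":  {"lottery": 0.001, "win": 0.02},
}
prior = {"spam": 0.4, "ham": 0.6}

def log_posterior(words, cls):
    # Naive Bayes scores a class with log P(class) + sum of log P(word|class);
    # the product of probabilities becomes a sum of logs for numerical stability.
    return math.log(prior[cls]) + sum(math.log(likelihood[cls][w]) for w in words)

words = ["lottery", "win"]          # correlated words: their evidence is over-counted
scores = {c: log_posterior(words, c) for c in prior}
prediction = max(scores, key=scores.get)
# The absolute posteriors are distorted by the false independence assumption,
# but the argmax (the ranking of classes) still selects "spam".
```

The key observation is that classification only compares scores across classes, so a monotone distortion of the per-class products does not change the winner.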


2. Equal Contribution of Features

  • Assumption: Each feature contributes equally to the prediction.

  • Reality: Some features dominate.

    • Example: In medical diagnosis, “tumor detected in MRI” is far stronger than “slight fever”.

  • Why it still works: In high-dimensional settings (like text classification), many weak but roughly independent signals combine to give strong results.
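A small sketch of how weak signals accumulate, with assumed numbers: 50 features, each only slightly more likely under class A than class B (0.55 vs. 0.45). No single feature is decisive, but their log-likelihood ratios add up.

```python
import math

# Assumed setup: 50 weak binary indicators, each slightly favoring class A.
n_features = 50
p_a, p_b = 0.55, 0.45  # P(feature fires | A) vs. P(feature fires | B)

# If all 50 indicators fire, the evidence for A over B is the sum of
# 50 small per-feature log-likelihood ratios.
llr = sum(math.log(p_a / p_b) for _ in range(n_features))
odds_ratio = math.exp(llr)  # combined evidence grows multiplicatively
```

Each feature contributes only about 0.2 nats, yet the combined odds ratio runs into the tens of thousands, which is why no individual feature needs to dominate.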


3. Distribution of Features

  • Assumption:

    • Gaussian NB → features are normally distributed.

    • Multinomial NB → word counts follow multinomial distribution.

    • Bernoulli NB → features are binary indicators.

  • Reality: Data distributions often deviate.

    • Example: Continuous features may be skewed, not Gaussian.

  • Why it still works: As long as the assumed distribution is a rough approximation, the decision boundary can still separate classes effectively.
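To make the Gaussian NB case concrete, here is a minimal sketch with hypothetical per-class parameters for one continuous feature: each class gets its own fitted normal density, and prediction picks the class whose density assigns the observation the higher likelihood (equal priors assumed).

```python
import math

def gaussian_pdf(x, mean, std):
    # Gaussian NB models P(feature|class) with a normal density per class.
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Hypothetical (mean, std) per class for a single continuous feature.
params = {"class_a": (160.0, 6.0), "class_b": (178.0, 7.0)}

def classify(x):
    # Equal priors assumed: pick the class whose fitted Gaussian
    # assigns x the higher likelihood.
    return max(params, key=lambda c: gaussian_pdf(x, *params[c]))
```

Even if the true feature distribution is skewed rather than Gaussian, the decision boundary (where the two densities cross) can still land between the classes well enough to separate them.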


4. No Zero Probability

  • Assumption: Every feature-class combination has a nonzero probability.

  • Reality: Some words/values may not appear in training.

    • Example: If “Bitcoin” never appeared in spam training data, then \(P(\text{Bitcoin}|\text{spam}) = 0\).

  • Why it still works: With Laplace (add-one) smoothing, we avoid zeros and keep predictions stable.
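A sketch of add-one smoothing on the “Bitcoin” example above, using a made-up spam vocabulary: without smoothing the unseen word gets probability zero and wipes out the whole product; with smoothing it gets a small nonzero estimate.

```python
from collections import Counter

# Hypothetical spam training tokens; note "Bitcoin" never appears.
spam_tokens = ["lottery", "win", "win", "prize", "free"]
vocab = ["lottery", "win", "prize", "free", "meeting", "Bitcoin"]
counts = Counter(spam_tokens)

def p_word_given_spam(word, alpha=1):
    # Laplace (add-one) smoothing: add alpha to every count, and alpha * |vocab|
    # to the denominator, so no feature-class combination is ever zero.
    return (counts[word] + alpha) / (len(spam_tokens) + alpha * len(vocab))

unsmoothed = counts["Bitcoin"] / len(spam_tokens)  # 0.0 -- zeroes the whole product
smoothed = p_word_given_spam("Bitcoin")            # small but nonzero
```

Here the smoothed estimate is 1/11 ≈ 0.09, so an email mentioning “Bitcoin” can still be scored against the spam class rather than being ruled out outright.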


Key Insight: Even though the independence and distribution assumptions are false in practice, Naive Bayes still works well when:

  • Features provide enough weak evidence.

  • The goal is classification, not perfect probability estimation.

  • Data is high-dimensional and sparse (like text).

❌ It fails when:

  • Strong feature correlations matter.

  • Precise probability estimates are required (not just classification).