Assumptions
1. Weak Learners Should Perform Slightly Better than Random
The base learners (often one-level decision trees, called decision stumps) should achieve accuracy just above random guessing:
For binary classification → slightly better than 50% accuracy.
Boosting works by combining many weak rules, so if each base learner is no better than chance, AdaBoost fails.
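A minimal sketch of this point, assuming scikit-learn is available (the `estimator` keyword requires scikit-learn ≥ 1.2; older releases call it `base_estimator`, and the synthetic dataset is purely illustrative): a single stump only modestly beats chance, while AdaBoost built from the same stumps does noticeably better.

```python
# Sketch: a lone decision stump vs. AdaBoost over many stumps (illustrative data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)
print("single stump accuracy:", stump.score(X_te, y_te))      # only somewhat above 0.5

boosted = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                             n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("boosted stumps accuracy:", boosted.score(X_te, y_te))  # noticeably higher
```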
2. Additive Model of Errors
AdaBoost assumes that errors from weak learners can be combined and corrected sequentially.
Misclassified samples get higher weights → future weak learners focus on them.
This presumes that misclassifications reflect learnable structure that can be corrected step by step, rather than irreducible random noise.
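The sketch below hand-rolls a single round of the discrete AdaBoost weight update (labels in {-1, +1}) to show the reweighting mechanism; the toy labels, predictions, and names (`w`, `h_pred`, `alpha`) are purely illustrative, not a library API.

```python
# One AdaBoost round by hand: misclassified samples end up with larger weights.
import numpy as np

y      = np.array([ 1, -1,  1,  1, -1])   # true labels
h_pred = np.array([ 1,  1,  1, -1, -1])   # weak learner's predictions (2 mistakes)
w      = np.full(len(y), 1 / len(y))      # start with uniform sample weights

err   = np.sum(w[h_pred != y])            # weighted error of this weak learner
alpha = 0.5 * np.log((1 - err) / err)     # its vote in the final ensemble

w = w * np.exp(-alpha * y * h_pred)       # misclassified samples (y * h = -1) grow
w = w / w.sum()                           # renormalize to a distribution
print(np.round(w, 3))                     # the two mistakes now carry more weight
```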
3. Data is (Relatively) Clean
AdaBoost is sensitive to noisy data and outliers, because:
Misclassified points get higher weights repeatedly.
Outliers that are impossible to classify correctly receive disproportionate focus.
Implicit assumption: dataset has low noise and few extreme outliers.
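A quick way to see this sensitivity is to flip a fraction of the training labels and watch test accuracy fall. This is only a sketch on synthetic data; the noise levels and dataset parameters are arbitrary.

```python
# Sketch: label noise injected into the training set degrades AdaBoost's test accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.1, 0.3):                       # fraction of flipped labels
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise
    y_noisy[flip] = 1 - y_noisy[flip]               # labels are 0/1 here
    clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_noisy)
    print(f"label noise {noise:.0%}: test accuracy {clf.score(X_te, y_te):.3f}")
```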
4. Feature Independence Isn’t Required (Unlike Naive Bayes)
AdaBoost does not assume independence of features.
It can handle correlated features, but redundant features may make training inefficient.
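As a rough sketch (synthetic data, illustrative parameters), AdaBoost still fits a dataset whose features are deliberately redundant, i.e. linear combinations of the informative ones; the redundancy just wastes some of the stumps.

```python
# Sketch: correlated/redundant features do not break AdaBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, n_redundant=10,   # heavily correlated features
                           random_state=0)
scores = cross_val_score(AdaBoostClassifier(n_estimators=100, random_state=0), X, y, cv=5)
print("mean CV accuracy with redundant features:", scores.mean().round(3))
```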
5. Sufficient Number of Weak Learners
Boosting assumes that with enough iterations (weak learners), the combined strong learner will converge to a low-error classifier.
Too few learners → underfitting; too many learners → risk of overfitting (though AdaBoost is surprisingly resistant to overfitting on clean data).
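One way to inspect this is scikit-learn's `staged_score`, which scores the ensemble after each boosting iteration. The sketch below (illustrative dataset and checkpoints) shows accuracy rising and then plateauing as learners are added.

```python
# Sketch: test accuracy as a function of the number of weak learners.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
for i, acc in enumerate(clf.staged_score(X_te, y_te), start=1):
    if i in (1, 10, 50, 100, 300):                  # print a few checkpoints
        print(f"{i:>3} learners: test accuracy {acc:.3f}")
```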
6. Weak Learners Should Be Simple
Base learners should be simple (e.g., decision stumps or very shallow trees).
If the base learners are too complex (e.g., deep trees), boosting loses its purpose and becomes just an ensemble of already-strong models.
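A rough comparison sketch of stumps versus deeper trees as base learners (again assuming scikit-learn ≥ 1.2 for the `estimator` keyword; dataset and depths are illustrative):

```python
# Sketch: AdaBoost with base trees of increasing depth.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1500, n_features=20, n_informative=10, random_state=0)

for depth in (1, 3, 10):                             # stump, shallow tree, deep tree
    base = DecisionTreeClassifier(max_depth=depth)
    clf = AdaBoostClassifier(estimator=base, n_estimators=100, random_state=0)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"max_depth={depth:>2}: mean CV accuracy {acc:.3f}")
```

With deep trees each base learner is already strong, so the run behaves more like a plain ensemble of strong models than like boosting of weak rules.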
Summary
AdaBoost works best under these assumptions:
Weak learners perform slightly better than chance.
Errors can be sequentially corrected.
Data is relatively clean (not dominated by noise or outliers).
Enough learners are combined to reduce bias.
Base learners are simple and diverse.