AdaBoost (Adaptive Boosting)
AdaBoost is a popular ensemble learning algorithm used primarily for classification, though it can also be adapted for regression. Its core idea is to combine multiple “weak learners” into a strong learner.
Weak Learner: A model slightly better than random guessing (e.g., decision stump – a single-level decision tree).
Strong Learner: A combination of weak learners that achieves high accuracy.
AdaBoost adapts by focusing more on the misclassified instances from previous learners in each iteration.
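As a concrete illustration of a weak learner, here is a decision stump on a tiny hypothetical 1-D dataset (the feature values and labels below are invented for illustration). It is clearly better than random guessing, but far from perfect:

```python
import numpy as np

# Hypothetical toy dataset: one feature, labels in {-1, +1}
X = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
y = np.array([1, 1, 1, -1, -1, 1])  # the last sample is deliberately "hard"

def stump_predict(x, threshold):
    """Decision stump: predict +1 below the threshold, -1 at or above it."""
    return np.where(x < threshold, 1, -1)

pred = stump_predict(X, 0.5)
accuracy = np.mean(pred == y)  # 5 of 6 correct: better than chance, not perfect
```

The one sample this stump cannot get right is exactly the kind of "hard" instance AdaBoost will up-weight in later rounds.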
Core Steps of AdaBoost
Initialize Sample Weights
Start with equal weights for all training samples:
\[ w_i = \frac{1}{N}, \quad i = 1, 2, \dots, N \]
where \(N\) is the total number of training samples.
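In NumPy this initialization is a one-liner (the sample count here is arbitrary):

```python
import numpy as np

N = 6                      # number of training samples (arbitrary here)
w = np.full(N, 1.0 / N)    # every sample starts with the same weight, 1/N
```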
Train Weak Learner
Fit a weak learner (e.g., a decision stump) using the weighted dataset.
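One simple way to fit a stump under sample weights is a brute-force search over candidate thresholds and polarities; the toy data and helper name below are illustrative, not part of any library:

```python
import numpy as np

X = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])   # hypothetical 1-D features
y = np.array([1, 1, 1, -1, -1, 1])              # labels in {-1, +1}
w = np.full(len(X), 1.0 / len(X))               # uniform weights, as in step 1

def fit_stump(X, y, w):
    """Pick the threshold/polarity pair with the lowest weighted error."""
    best_err, best_threshold, best_polarity = np.inf, None, None
    for threshold in np.unique(X):
        for polarity in (1, -1):
            pred = np.where(X < threshold, polarity, -polarity)
            err = np.sum(w[pred != y])          # weighted misclassification
            if err < best_err:
                best_err, best_threshold, best_polarity = err, threshold, polarity
    return best_err, best_threshold, best_polarity

err, threshold, polarity = fit_stump(X, y, w)   # err = 1/6 on this toy data
```

Because the error is computed against the weights `w`, the same search returns a *different* stump once the weights change in later rounds.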
Compute Learner Error
Calculate the weighted error of the weak learner:
\[ \text{error}_t = \frac{\sum_{i=1}^{N} w_i \cdot \mathbf{1}(y_i \neq h_t(x_i))}{\sum_{i=1}^{N} w_i} \]
where \(h_t(x_i)\) is the prediction of the \(t\)-th weak learner, and \(\mathbf{1}\) is the indicator function (1 if wrong, 0 if correct).
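The formula translates directly into code; the labels and predictions below are toy values chosen for illustration:

```python
import numpy as np

y    = np.array([1, 1, 1, -1, -1, 1])       # true labels (toy values)
pred = np.array([1, 1, 1, -1, -1, -1])      # stump predictions; last one is wrong
w    = np.full(6, 1.0 / 6)                  # current sample weights

# Weight of the misclassified samples divided by the total weight
error_t = np.sum(w * (pred != y)) / np.sum(w)   # = 1/6 here
```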
Compute Learner Weight (Alpha)
Assign a weight to the weak learner based on its accuracy:
\[ \alpha_t = \frac{1}{2} \ln\left(\frac{1 - \text{error}_t}{\text{error}_t}\right) \]
Better learners get higher weights.
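Numerically, \(\alpha_t\) grows as the error shrinks, and is exactly zero at error 0.5 (random guessing). A small sketch with a hypothetical helper name:

```python
import numpy as np

def learner_weight(error_t):
    """Alpha from the weighted error (assumes 0 < error_t < 1)."""
    return 0.5 * np.log((1 - error_t) / error_t)

alpha_strong = learner_weight(1 / 6)   # ~0.805: accurate learner, large vote
alpha_random = learner_weight(0.5)     # 0.0: random guessing earns no vote
```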
Update Sample Weights
Increase weights of misclassified samples so the next learner focuses on them:
\[ w_i \leftarrow w_i \cdot e^{\alpha_t \cdot \mathbf{1}(y_i \neq h_t(x_i))} \]
Then normalize the weights so they sum to 1.
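In code, misclassified samples are scaled up by \(e^{\alpha_t}\) and the whole vector is renormalized (toy values again, with an alpha corresponding to a weighted error of 1/6):

```python
import numpy as np

y    = np.array([1, 1, 1, -1, -1, 1])     # toy labels
pred = np.array([1, 1, 1, -1, -1, -1])    # the last sample was misclassified
w    = np.full(6, 1.0 / 6)
alpha_t = 0.5 * np.log(5)                 # alpha for a weighted error of 1/6

# Multiply misclassified weights by e^alpha, then renormalize to sum to 1
w = w * np.exp(alpha_t * (pred != y))
w = w / w.sum()
# The misclassified sample's weight jumps from ~0.167 to ~0.309
```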
Repeat
Steps 2–5 are repeated for \(T\) weak learners.
Final Prediction
The strong classifier is a weighted majority vote of all weak learners:
\[ H(x) = \text{sign} \Bigg( \sum_{t=1}^{T} \alpha_t \cdot h_t(x) \Bigg) \]
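Putting all the steps together, here is a minimal from-scratch sketch using decision stumps on a hypothetical 1-D dataset that no single stump can separate; the helper names are my own, not a library API:

```python
import numpy as np

def fit_stump(X, y, w):
    """Brute-force the threshold/polarity stump with lowest weighted error."""
    best = (np.inf, None, None)
    for threshold in np.unique(X):
        for polarity in (1, -1):
            pred = np.where(X < threshold, polarity, -polarity)
            err = np.sum(w[pred != y])
            if err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost_fit(X, y, T):
    w = np.full(len(X), 1.0 / len(X))                     # step 1: uniform weights
    learners = []
    for _ in range(T):
        err, threshold, polarity = fit_stump(X, y, w)     # steps 2-3
        err = np.clip(err, 1e-10, 1 - 1e-10)              # guard the log
        alpha = 0.5 * np.log((1 - err) / err)             # step 4: learner weight
        learners.append((alpha, threshold, polarity))
        pred = np.where(X < threshold, polarity, -polarity)
        w = w * np.exp(alpha * (pred != y))               # step 5: up-weight mistakes
        w = w / w.sum()
    return learners

def adaboost_predict(learners, X):
    """Final prediction: sign of the alpha-weighted sum of stump votes."""
    scores = sum(a * np.where(X < t, p, -p) for a, t, p in learners)
    return np.sign(scores)

# Hypothetical data no single stump can separate (the last point breaks it)
X = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
y = np.array([1, 1, 1, -1, -1, 1])
learners = adaboost_fit(X, y, T=5)
pred = adaboost_predict(learners, X)   # the ensemble classifies all six points
```

After five rounds the weighted vote gets every sample right, including the point that defeats any individual stump.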
Key Intuition
AdaBoost focuses on hard-to-classify samples by increasing their weights.
Weak learners are combined using weighted voting; more accurate learners have more influence.
Despite using weak learners individually, the ensemble often achieves high accuracy.
Advantages
Works well with simple weak learners such as decision stumps.
Less prone to overfitting compared to single deep trees.
Adaptive: focuses on hard samples.
Disadvantages
Sensitive to noisy data and outliers.
Can be computationally expensive if many iterations are used.
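In practice you would rarely hand-roll the loop above; scikit-learn ships an implementation. A minimal usage sketch, assuming scikit-learn is installed (the dataset below is invented and trivially separable):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Tiny illustrative dataset: two well-separated groups
X = np.array([[0], [1], [2], [3], [10], [11], [12], [13]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Decision stumps are scikit-learn's default base learner for AdaBoost
clf = AdaBoostClassifier(n_estimators=10).fit(X, y)
print(clf.score(X, y))   # 1.0 on this trivially separable data
```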