AdaBoost (Adaptive Boosting)

AdaBoost is a popular ensemble learning algorithm used primarily for classification, though it can also be adapted for regression. Its core idea is to combine many “weak learners” into a single strong learner.

  • Weak Learner: A model slightly better than random guessing (e.g., decision stump – a single-level decision tree).

  • Strong Learner: A combination of weak learners that achieves high accuracy.

AdaBoost adapts by making each new learner focus more on the instances that the previous learners misclassified.
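As a quick illustration of weak learners being boosted into a strong one, the sketch below fits scikit-learn’s AdaBoostClassifier with decision stumps (depth-1 trees) as the weak learners on a synthetic dataset. The dataset, hyperparameter values, and variable names are illustrative choices only; older scikit-learn versions (< 1.2) name the weak-learner argument `base_estimator` instead of `estimator`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy binary classification problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Decision stumps (max_depth=1) are the classic weak learner for AdaBoost.
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # `base_estimator` in scikit-learn < 1.2
    n_estimators=100,
    learning_rate=1.0,
    random_state=42,
)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```

Each of the 100 stumps in this ensemble is individually weak, but their weighted vote typically classifies the test set far better than any single stump would.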


Core Steps of AdaBoost

  1. Initialize Sample Weights

    • Start with equal weights for all training samples:

    \[ w_i = \frac{1}{N}, \quad i = 1,2,...,N \]

    where \(N\) is the total number of training samples.

  2. Train Weak Learner

    • Fit a weak learner (e.g., a decision stump) using the weighted dataset.

  3. Compute Learner Error

    • Calculate the weighted error of the weak learner:

    \[ \text{error}_t = \frac{\sum_{i=1}^{N} w_i \cdot \mathbf{1}(y_i \neq h_t(x_i))}{\sum_{i=1}^{N} w_i} \]

    where \(h_t(x_i)\) is the prediction of the \(t\)-th weak learner and \(\mathbf{1}(\cdot)\) is the indicator function (1 if the sample is misclassified, 0 otherwise).

  4. Compute Learner Weight (Alpha)

    • Assign a weight to the weak learner based on its accuracy:

    \[ \alpha_t = \frac{1}{2} \ln\left(\frac{1 - \text{error}_t}{\text{error}_t}\right) \]
    • Better learners get higher weights.

  5. Update Sample Weights

    • Increase the weights of misclassified samples so that the next learner focuses on them. With labels \(y_i \in \{-1, +1\}\), the update is:

    \[ w_i \leftarrow w_i \cdot e^{-\alpha_t \, y_i \, h_t(x_i)} \]
    • Misclassified samples are multiplied by \(e^{\alpha_t}\) (their weight grows), while correctly classified samples are multiplied by \(e^{-\alpha_t}\) (their weight shrinks).
    • Normalize the weights so they sum to 1.

  6. Repeat

    • Steps 2–5 are repeated for \(T\) weak learners.

  7. Final Prediction

    • The strong classifier is a weighted majority vote of all weak learners (see the from-scratch sketch of steps 1–7 after this list):

    \[ H(x) = \text{sign} \Bigg( \sum_{t=1}^{T} \alpha_t \cdot h_t(x) \Bigg) \]
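To make steps 1–7 concrete, here is a minimal from-scratch sketch for binary labels in \(\{-1, +1\}\). It uses scikit-learn decision stumps as the weak learners purely for convenience; the function names (`adaboost_fit`, `adaboost_predict`), the toy dataset, and \(T = 50\) are illustrative choices rather than a reference implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, T=50):
    """Train T decision stumps with AdaBoost; y must be in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)               # Step 1: equal sample weights
    stumps, alphas = [], []

    for _ in range(T):
        # Step 2: fit a weak learner on the weighted dataset
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)

        # Step 3: weighted error of this learner
        miss = (pred != y).astype(float)
        error = np.sum(w * miss) / np.sum(w)
        error = np.clip(error, 1e-10, 1 - 1e-10)   # guard against log(0) / division by zero

        # Step 4: learner weight (alpha)
        alpha = 0.5 * np.log((1 - error) / error)

        # Step 5: up-weight mistakes, down-weight correct predictions, then normalize
        w = w * np.exp(-alpha * y * pred)
        w = w / np.sum(w)

        stumps.append(stump)
        alphas.append(alpha)

    return stumps, alphas                  # Step 6: the loop above repeats for T rounds


def adaboost_predict(X, stumps, alphas):
    """Step 7: weighted majority vote, H(x) = sign(sum_t alpha_t * h_t(x))."""
    agg = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(agg)


# Illustrative usage on a toy dataset.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
y = np.where(y == 1, 1, -1)   # map labels to {-1, +1}
stumps, alphas = adaboost_fit(X, y, T=50)
print("Training accuracy:", np.mean(adaboost_predict(X, stumps, alphas) == y))
```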

Key Intuition

  • AdaBoost focuses on hard-to-classify samples by increasing their weights.

  • Weak learners are combined using weighted voting; more accurate learners have more influence.

  • Although each weak learner performs only slightly better than chance on its own, the combined ensemble often achieves high accuracy.


Advantages

  • Works well even with very simple weak learners such as decision stumps.

  • Often less prone to overfitting than a single deep decision tree.

  • Adaptive: focuses on hard samples.

Disadvantages

  • Sensitive to noisy data and outliers.

  • Can be computationally expensive if many iterations are used.