AdaBoost Classifier

  • AdaBoost (Adaptive Boosting) is an ensemble learning method that combines many weak classifiers (usually decision stumps = trees of depth 1) to build a strong classifier.

  • The key idea: Focus more on misclassified points in each round by adjusting sample weights.

  • Works well on tabular datasets and is reasonably robust to overfitting when tuned properly.
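
A minimal usage sketch with scikit-learn (the synthetic dataset from make_classification is just for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy binary dataset (illustrative values only)
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The default base learner is already a depth-1 tree (a decision stump)
clf = AdaBoostClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")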


How AdaBoost Classifier Works (Intuition)

  1. Start with equal weights

    • Each sample is given the same importance (weight = 1/N).

  2. Train a weak learner

    • Example: a decision stump (one-split tree).

    • Find the split that minimizes classification error (weighted error).

  3. Evaluate performance of weak learner

    • Weighted error:

      \[ \epsilon = \frac{\sum w_i \cdot I(y_i \neq h(x_i))}{\sum w_i} \]
    • If ε ≥ 0.5, the stump is no better than random guessing on the weighted data, so it is discarded and boosting stops.

  4. Assign weight to the weak learner

    • Performance score:

      \[ \alpha = \frac{1}{2} \ln\left(\frac{1 - \epsilon}{\epsilon}\right) \]
    • Better learners get higher α (more influence). For example, ε = 0.3 gives α = ½ ln(0.7/0.3) ≈ 0.42, while ε = 0.5 gives α = 0.

  5. Update sample weights

    • Misclassified samples get higher weights → the next weak learner will focus more on them.

    • Correctly classified samples get lower weights.

    • With labels y_i ∈ {−1, +1}, the update is:

      \[ w_i \leftarrow \frac{w_i \cdot e^{-\alpha \cdot y_i \cdot h(x_i)}}{Z} \]
      where Z normalizes the weights to sum to 1.

  6. Repeat steps 2–5 for multiple rounds.

  7. Final prediction

    • Weighted majority vote of all weak learners (a from-scratch sketch follows this list):

      \[ H(x) = \text{sign}\left(\sum_{m=1}^M \alpha_m \cdot h_m(x)\right) \]
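
Steps 1–7 map almost line-for-line onto code. Below is a minimal from-scratch sketch for binary labels y_i ∈ {−1, +1}, using scikit-learn decision stumps as weak learners. The function names (adaboost_fit, adaboost_predict) and the 1e-10 guard are my own illustrative choices, not a library API.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    # y must be a NumPy array of -1/+1 labels to match the formulas above
    n = len(y)
    w = np.full(n, 1.0 / n)                    # Step 1: equal weights
    stumps, alphas = [], []
    for _ in range(n_rounds):                  # Step 6: repeat steps 2-5
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # Step 2: weighted weak learner
        pred = stump.predict(X)
        eps = w[pred != y].sum() / w.sum()     # Step 3: weighted error
        if eps >= 0.5:                         # no better than chance: discard, stop
            break
        eps = max(eps, 1e-10)                  # guard against log(0) on a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)  # Step 4: learner weight
        w = w * np.exp(-alpha * y * pred)      # Step 5: up-weight mistakes...
        w = w / w.sum()                        # ...then normalize to sum to 1
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Step 7: sign of the alpha-weighted vote
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(votes)

To try it on 0/1 labels (like the dataset in the demo at the end of this page), convert them first, e.g. y_pm = np.where(y == 1, 1, -1).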

Geometric Intuition

  • Imagine a dataset that’s not perfectly separable with a line.

  • Each weak learner (stump) draws a simple vertical/horizontal split.

  • AdaBoost sequentially bends the decision boundary by combining stumps, correcting mistakes iteratively.

  • End result: a non-linear decision boundary that fits the dataset much better than any single stump (the demo at the end of this page visualizes this as the number of stumps grows).


Key Hyperparameters

  1. n_estimators → number of weak learners (default 50).

    • Too few → underfitting.

    • Too many → risk of overfitting and longer training.

  2. learning_rate → shrinks each weak learner’s weight contribution.

    • Low value (e.g., 0.1) → slower but more stable learning (usually needs more estimators).

    • High value (e.g., 1.0) → faster, but higher risk of overfitting.

  3. estimator → base model (default: DecisionTreeClassifier(max_depth=1)).

    • Can increase depth for more complex patterns.
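
Because the best settings depend on the dataset, these hyperparameters are usually tuned jointly. A sketch with GridSearchCV, where the grid values are illustrative assumptions rather than recommendations:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

# Illustrative search space: ensemble size, shrinkage, and base-tree depth
param_grid = {
    "n_estimators": [25, 50, 100, 200],
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "estimator": [DecisionTreeClassifier(max_depth=d) for d in (1, 2, 3)],
}

search = GridSearchCV(AdaBoostClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)

Note that a lower learning_rate usually needs a larger n_estimators to reach the same performance, which is why the two are tuned together.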


Performance Metrics

Since it’s a classifier, evaluation uses:

  • Accuracy → overall correct predictions.

  • Precision, Recall, F1 → for imbalanced classes.

  • ROC-AUC → how well the predicted probabilities rank positives above negatives.

  • Log Loss → penalizes overconfident wrong predictions.
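
A sketch computing these metrics with scikit-learn; the imbalanced toy dataset (weights=[0.8, 0.2]) is an assumption chosen so that precision/recall/F1 are informative:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, classification_report, log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

clf = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)[:, 1]      # positive-class probabilities

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))   # precision, recall, F1 per class
print("ROC-AUC :", roc_auc_score(y_test, y_proba))
print("Log loss:", log_loss(y_test, y_proba))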


Advantages

✅ Works well with simple weak learners (stumps).

✅ Reduces bias significantly.

✅ Handles non-linear data.

✅ Robust against overfitting if tuned properly.

Limitations

❌ Sensitive to noisy data & outliers (because they get high weights).

❌ Can be slower than Random Forest when n_estimators is large.


In short: AdaBoost Classifier = sequentially trained weak classifiers (stumps) + reweighting misclassified points → strong ensemble that produces non-linear decision boundaries.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from matplotlib.colors import ListedColormap

# Generate toy dataset (2 features for visualization)
X, y = make_classification(
    n_samples=200, n_features=2, n_redundant=0, n_informative=2,
    n_clusters_per_class=1, flip_y=0.1, random_state=42
)

# Create weak learner (Decision Stump)
stump = DecisionTreeClassifier(max_depth=1)

# Train AdaBoost with an increasing number of estimators
# (the estimator= keyword requires scikit-learn >= 1.2; older versions used base_estimator=)
estimators_list = [1, 5, 20, 50]
models = [AdaBoostClassifier(estimator=stump, n_estimators=n, random_state=42).fit(X, y)
          for n in estimators_list]

# Create mesh for decision boundary plotting
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))

# Plot decision boundaries
fig, axes = plt.subplots(1, 4, figsize=(20, 5))

for ax, model, n in zip(axes, models, estimators_list):
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    ax.contourf(xx, yy, Z, alpha=0.3, cmap=ListedColormap(['#FFAAAA', '#AAAAFF']))
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=ListedColormap(['#FF0000', '#0000FF']), edgecolor="k", s=30)
    ax.set_title(f"AdaBoost with {n} Estimators")

plt.tight_layout()
plt.show()
[Output figure: four side-by-side decision-boundary plots, titled "AdaBoost with 1/5/20/50 Estimators".]