Gradient Boosting Classifier#

Gradient Boosting Classifier (GBC) is the classification version of Gradient Boosting. It builds an ensemble of weak learners (usually shallow decision trees) in a stage-wise fashion, where each new learner is trained to correct the errors of the ensemble built so far.


1. Objective#

  • Given training data \((x_i, y_i)\) with \(y_i \in \{0,1\}\) or \(\{-1,+1\}\), the goal is to minimize a classification loss.

  • Common choice: Logistic loss

\[ L(y, F(x)) = \log\big(1 + e^{-y F(x)}\big), \quad y \in \{-1,+1\} \]

where \(F(x)\) is the additive model.
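
The loss is easy to evaluate numerically. A minimal sketch in NumPy (toy scores and labels chosen here purely for illustration):

import numpy as np

# Raw model scores F(x_i) and labels encoded in {-1, +1}
F = np.array([2.0, -0.5, 0.1])
y = np.array([+1, +1, -1])

# Logistic loss: log(1 + exp(-y * F)); small when sign(F) agrees with y
loss = np.log1p(np.exp(-y * F))
print(loss)          # per-sample losses
print(loss.mean())   # average loss over the sample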


2. Initialization#

  • Start with a constant model:

\[ F_0(x) = \ln \frac{p}{1-p} \]

where \(p\) is the proportion of positive samples in the training data.

  • This is the log-odds of the positive class, the constant that minimizes the logistic loss above.
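
A minimal sketch of this initialization (plain NumPy, labels assumed to be in {0, 1}):

import numpy as np

y = np.array([1, 1, 0, 1, 0])    # toy binary labels
p = y.mean()                     # proportion of positive samples, here 0.6
F0 = np.log(p / (1 - p))         # log-odds of the positive class
print(F0)                        # ~0.405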


3. Iterative boosting process#

At each iteration \(m\):

a) Compute pseudo-residuals#

  • Pseudo-residuals are the negative gradient of the loss with respect to the current predictions. With labels encoded as \(y_i \in \{0,1\}\), this simplifies to:

\[ r_{im} = y_i - p_{i}^{(m-1)} \]

where \(p_i^{(m-1)} = \frac{1}{1+e^{-F_{m-1}(x_i)}}\) is the currently predicted probability of the positive class.

  • Intuition: residual = true label − predicted probability.
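
In code, the pseudo-residuals are just the labels minus the sigmoid-transformed scores; a sketch continuing the toy example above:

import numpy as np

def sigmoid(F):
    return 1.0 / (1.0 + np.exp(-F))

y = np.array([1, 1, 0, 1, 0])        # labels in {0, 1}
F_prev = np.full(len(y), 0.405)      # F_{m-1}(x_i), here the constant F_0
residuals = y - sigmoid(F_prev)      # r_im = y_i - p_i^(m-1)
print(residuals)                     # positive for class 1, negative for class 0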


b) Fit weak learner#

  • Train a small regression tree \(h_m(x)\) on the pseudo-residuals (a least-squares fit).

  • The tree learns where the current ensemble's predictions are most wrong, i.e., the regions with the largest residuals.
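
In scikit-learn terms this step is an ordinary regression-tree fit; a sketch with made-up data (shapes and values are illustrative only):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # toy feature matrix
residuals = rng.uniform(-0.5, 0.5, 100)   # stand-in for the pseudo-residuals r_im

# A shallow tree fit by least squares to the residuals
h_m = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
print(h_m.predict(X[:3]))                 # the tree's correction for 3 samples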


c) Compute multiplier#

  • Find best \(\gamma_m\) (step size) via line search:

\[ \gamma_m = \arg\min_\gamma \sum_{i=1}^n L\big(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\big) \]
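
A simple way to approximate this line search is a coarse grid over candidate values of \(\gamma\). (Implementations such as scikit-learn actually fit a separate value per tree leaf, Friedman's "TreeBoost" refinement; the single-multiplier sketch below just illustrates the formula.)

import numpy as np

def logistic_loss(y, F):                  # y in {-1, +1}
    return np.log1p(np.exp(-y * F)).sum()

def line_search(y, F_prev, h_pred):
    gammas = np.linspace(0.0, 4.0, 81)    # candidate step sizes
    losses = [logistic_loss(y, F_prev + g * h_pred) for g in gammas]
    return gammas[int(np.argmin(losses))]

y = np.array([+1, -1, +1, +1])
F_prev = np.full(4, 0.4)                  # current ensemble scores F_{m-1}(x_i)
h_pred = np.array([0.3, -0.2, 0.1, 0.5])  # weak learner outputs h_m(x_i)
print(line_search(y, F_prev, h_pred))     # gamma_m found on the grid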

d) Update model#

\[ F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x) \]
  • \(\nu\) is the learning rate (shrinkage); smaller values require more boosting rounds but usually generalize better.
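
Putting steps a) through d) together, a compact from-scratch training loop might look like this (illustrative only: \(\gamma_m\) is folded into the shrunken step, and refinements such as per-leaf values are omitted):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(F):
    return 1.0 / (1.0 + np.exp(-F))

def fit_gbc(X, y, M=50, nu=0.1, max_depth=3):
    """y in {0, 1}; returns the initial score F_0 and the fitted trees."""
    p = y.mean()
    F0 = np.log(p / (1 - p))               # step 2: log-odds initialization
    F = np.full(len(y), F0)
    trees = []
    for m in range(M):
        r = y - sigmoid(F)                 # a) pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)  # b) weak learner
        F = F + nu * tree.predict(X)       # c)+d) shrunken update
        trees.append(tree)
    return F0, trees

def predict_proba(X, F0, trees, nu=0.1):
    F = F0 + nu * sum(t.predict(X) for t in trees)
    return sigmoid(F)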


4. Final prediction#

  • After \(M\) rounds, we have:

\[ F_M(x) = F_0(x) + \nu \sum_{m=1}^M \gamma_m h_m(x) \]
  • Convert to probability with sigmoid:

\[ p(x) = \frac{1}{1 + e^{-F_M(x)}} \]
  • Predict class:

\[\begin{split} \hat{y} = \begin{cases}1 & p(x) \geq 0.5 \\ 0 & p(x) < 0.5\end{cases} \end{split}\]
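
In code, this conversion is a sigmoid followed by a threshold:

import numpy as np

def sigmoid(F):
    return 1.0 / (1.0 + np.exp(-F))

F_M = np.array([2.1, -0.7, 0.05])    # final additive scores for 3 samples
p = sigmoid(F_M)                     # probabilities of the positive class
y_hat = (p >= 0.5).astype(int)       # threshold at 0.5
print(p)                             # [0.891 0.332 0.512]
print(y_hat)                         # [1 0 1]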

5. Intuition#

  • Each tree is trained on the errors of the previous ensemble.

  • Predictions are updated in small steps (learning rate).

  • Over many iterations, the model improves classification boundaries.


6. Key features#

  • Handles binary and multiclass classification (one-vs-rest or multinomial loss).

  • Can use different loss functions: binomial/multinomial deviance (i.e., log-loss, the default) or exponential loss, which recovers AdaBoost.

  • Sensitive to the learning rate and the number of trees; the two interact and should be tuned jointly (see the tuning sketch after this list).

  • Often more robust to outliers than AdaBoost, because the logistic loss penalizes badly misclassified points roughly linearly rather than exponentially.
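
Because the learning rate and the number of trees interact (a smaller \(\nu\) usually needs more boosting rounds), a reasonable approach is to search them jointly. A sketch with GridSearchCV (grid values chosen here purely for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=5, random_state=42)

param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [50, 100, 200],
}
search = GridSearchCV(GradientBoostingClassifier(max_depth=3, random_state=42),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)

A complete end-to-end example with scikit-learn: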


from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Data
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Model
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbc.fit(X_train, y_train)

# Predictions
y_pred = gbc.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
Accuracy: 0.8666666666666667
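
The fitted model also exposes the raw additive score: for the binary log-loss, predict_proba should be the sigmoid of decision_function, mirroring the formulas in section 4, and staged_predict lets you watch accuracy evolve over boosting rounds:

import numpy as np
from scipy.special import expit  # logistic sigmoid

# predict_proba[:, 1] equals sigmoid(F_M(x)) for the binary log-loss
print(np.allclose(gbc.predict_proba(X_test)[:, 1],
                  expit(gbc.decision_function(X_test))))

# Accuracy after every 25th boosting round
for m, y_stage in enumerate(gbc.staged_predict(X_test), start=1):
    if m % 25 == 0:
        print(m, accuracy_score(y_test, y_stage))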