# Workflows
## 1. Initialization

Pick a starting model \(F_0(x)\), usually the constant value that minimizes the loss:

- Regression (MSE): the mean of \(y\).
- Classification (log-loss): the log-odds of the positive class.
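A minimal sketch of both initializations, using small illustrative arrays (the data here is an assumption for demonstration):

```python
import numpy as np

# Hypothetical toy labels (illustrative only).
y_reg = np.array([3.0, 5.0, 7.0, 9.0])   # regression targets
y_clf = np.array([0, 0, 1, 1, 1])        # binary class labels

# Regression with MSE: the constant minimizing squared error is the mean.
F0_reg = y_reg.mean()

# Classification with log-loss: the minimizing constant is the log-odds
# of the positive class, log(p / (1 - p)).
p = y_clf.mean()
F0_clf = np.log(p / (1 - p))
```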
## 2. Compute pseudo-residuals

At iteration \(m\), compute the negative gradient of the loss with respect to the current predictions:

\[
r_{im} = -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)} \right]_{F = F_{m-1}}
\]

Intuition: the pseudo-residuals point in the direction of steepest improvement for each training point.
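For MSE the negative gradient is simply the ordinary residual \(y - F\). A short sketch, reusing the toy regression data from above (an assumption, not from the original):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])
F_prev = np.full_like(y, y.mean())   # predictions from the previous iteration

# For L = (1/2) * (y - F)^2, the negative gradient -dL/dF is y - F.
residuals = y - F_prev
```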
## 3. Fit weak learner

Train a weak learner \(h_m(x)\) (typically a shallow decision tree) to predict the pseudo-residuals. This learner models the corrections the current model needs.
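As a minimal stand-in for a shallow tree, a decision stump (depth-1 tree) can be fitted to the residuals. This is a sketch assuming one-dimensional inputs and the toy arrays above; `fit_stump` is a hypothetical helper, not a library function:

```python
import numpy as np

def fit_stump(x, r):
    """Find the split on x minimizing squared error against residuals r."""
    best = None
    for t in x:
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue  # skip splits that leave one side empty
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, vl, vr = best
    # The stump predicts the mean residual of whichever side q falls on.
    return lambda q: np.where(q <= t, vl, vr)

x = np.array([1.0, 2.0, 3.0, 4.0])
r = np.array([-3.0, -1.0, 1.0, 3.0])
h = fit_stump(x, r)   # h(x) predicts the needed correction
```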
## 4. Find optimal weight \(\gamma_m\)

Do a line search to find the scaling of the learner that minimizes the loss:

\[
\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)
\]
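Under MSE the line search has a closed form: with \(r = y - F_{m-1}\), the minimizer is \(\gamma = \langle r, h \rangle / \langle h, h \rangle\). A sketch with illustrative arrays (assumed, matching the toy example above):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])
F_prev = np.full_like(y, 6.0)
h_pred = np.array([-2.0, -2.0, 2.0, 2.0])  # weak learner output on training set

# Closed-form line search for squared error: project r onto h.
r = y - F_prev
gamma = (r @ h_pred) / (h_pred @ h_pred)
```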
## 5. Update model

Add the scaled learner to the model with shrinkage (learning rate \(\nu\)):

\[
F_m(x) = F_{m-1}(x) + \nu\, \gamma_m h_m(x)
\]

The learning rate controls how much each learner contributes.
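The update itself is one line; a sketch continuing the toy example (the value \(\nu = 0.1\) is an illustrative assumption):

```python
import numpy as np

F_prev = np.full(4, 6.0)
h_pred = np.array([-2.0, -2.0, 2.0, 2.0])
gamma, nu = 1.0, 0.1

# Shrinkage: each learner contributes only a fraction of its correction.
F_new = F_prev + nu * gamma * h_pred
```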
## 6. Repeat

Go back to Step 2. Each round shrinks the residuals, stopping when they resemble random noise or the maximum number of iterations is reached.
## 7. Final prediction

After \(M\) rounds, the boosted model is the initial constant plus all the shrunken corrections:

\[
F_M(x) = F_0(x) + \sum_{m=1}^{M} \nu\, \gamma_m h_m(x)
\]
## Extra Elements in the Workflow

- Regularization: learning rate, subsampling (stochastic gradient boosting), and tree-depth control.
- Early stopping: halt iterations when the validation loss stops improving.
- Output:
  - Regression: use \(F_M(x)\) directly.
  - Classification: apply a link function (e.g., the sigmoid to obtain probabilities).
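For binary classification, the raw boosted score is mapped to a probability with the sigmoid. A sketch with illustrative scores (the values are assumptions):

```python
import numpy as np

def sigmoid(z):
    """Map a raw score (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

F_M = np.array([-2.0, 0.0, 2.0])     # raw boosted scores for three points
probs = sigmoid(F_M)
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 for hard labels
```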
## Summary Workflow

1. Initialize with a constant model.
2. Compute pseudo-residuals (negative gradients).
3. Fit a weak learner to the residuals.
4. Find the optimal multiplier \(\gamma_m\).
5. Update the model with shrinkage.
6. Repeat until convergence or the iteration limit.
7. Use the final boosted model for prediction.
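The whole workflow can be sketched end to end. This is a minimal illustration, assuming MSE loss, decision stumps as weak learners, one-dimensional toy data, and \(\nu = 0.1\); all names (`fit_stump`, `gradient_boost`) are hypothetical helpers, not a library API:

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 tree: best single split on x fitted to residuals r."""
    best = None
    for t in x:
        l, rr = r[x <= t], r[x > t]
        if len(l) and len(rr):
            sse = ((l - l.mean())**2).sum() + ((rr - rr.mean())**2).sum()
            if best is None or sse < best[0]:
                best = (sse, t, l.mean(), rr.mean())
    _, t, vl, vr = best
    return lambda q: np.where(q <= t, vl, vr)

def gradient_boost(x, y, M=100, nu=0.1):
    F0 = y.mean()                            # Step 1: constant model
    F = np.full_like(y, F0)
    learners = []
    for _ in range(M):                       # Step 6: repeat
        r = y - F                            # Step 2: pseudo-residuals (MSE)
        h = fit_stump(x, r)                  # Step 3: fit weak learner
        hx = h(x)
        denom = hx @ hx
        if denom == 0:                       # residuals fully fitted
            break
        gamma = (r @ hx) / denom             # Step 4: closed-form line search
        F = F + nu * gamma * hx              # Step 5: shrunken update
        learners.append((gamma, h))
    def predict(q):                          # Step 7: final prediction
        return F0 + sum(nu * g * h(q) for g, h in learners)
    return predict

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.0, 6.0, 6.0])
model = gradient_boost(x, y)
```

With enough rounds the predictions approach the targets; in practice, libraries add the regularization and early stopping described above.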