Workflows#

1. Initialization#

  • Pick a starting model \(F_0(x)\).

  • Usually a constant value that minimizes the loss:

    • Regression (MSE): mean of \(y\).

    • Classification (log-loss): log-odds of the positive class.
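The two constant initializers above have simple closed forms. A minimal sketch (assuming squared error for regression and binary 0/1 labels for classification):

```python
import numpy as np

def init_constant_mse(y):
    """Constant minimizing mean squared error: the mean of y."""
    return np.mean(y)

def init_constant_logloss(y):
    """Constant minimizing log-loss: log-odds of the positive class.

    Assumes y contains 0/1 labels.
    """
    p = np.mean(y)  # empirical probability of the positive class
    return np.log(p / (1.0 - p))

y_reg = np.array([1.0, 2.0, 3.0, 4.0])
y_clf = np.array([0, 1, 1, 1])
print(init_constant_mse(y_reg))      # 2.5
print(init_constant_logloss(y_clf))  # log(0.75/0.25) = log(3) ≈ 1.0986
```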


2. Compute pseudo-residuals#

  • At iteration \(m\), compute the negative gradient of the loss with respect to the predictions:

\[ r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x)=F_{m-1}(x)} \]
  • Intuition: the pseudo-residuals give, for each training point, the direction in which the prediction should move to reduce the loss.
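For the two losses above the negative gradient has a closed form, which is why the pseudo-residuals are cheap to compute. A sketch (predictions `F` are raw scores, i.e. log-odds, in the log-loss case):

```python
import numpy as np

def pseudo_residuals_mse(y, F):
    """MSE with L = (y - F)^2 / 2: the negative gradient -dL/dF = y - F,
    i.e. the ordinary residual."""
    return y - F

def pseudo_residuals_logloss(y, F):
    """Log-loss with raw score F: -dL/dF = y - sigmoid(F)."""
    p = 1.0 / (1.0 + np.exp(-F))
    return y - p
```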


3. Fit weak learner#

  • Train a weak learner \(h_m(x)\) (typically a shallow decision tree) to predict the pseudo-residuals.

  • This learner models the corrections the current model needs.
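As an illustration of the weakest useful learner, here is a depth-1 regression tree (a decision stump) fit to the pseudo-residuals by exhaustive split search. This is a self-contained sketch; a real implementation would use a deeper tree from a library such as scikit-learn:

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree to pseudo-residuals r over 1-D inputs x.

    Returns (threshold, left_value, right_value) minimizing squared error.
    """
    best, best_err = None, np.inf
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lv, rv = left.mean(), right.mean()  # leaf values: mean residual per side
        err = ((left - lv) ** 2).sum() + ((right - rv) ** 2).sum()
        if err < best_err:
            best_err, best = err, (t, lv, rv)
    return best

def predict_stump(stump, x):
    t, lv, rv = stump
    return np.where(x <= t, lv, rv)
```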


4. Find optimal weight (\(\gamma_m\))#

  • Do a line search to scale the learner:

\[ \gamma_m = \arg\min_\gamma \sum_{i=1}^n L\big(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\big) \]
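Because \(\gamma\) is a single scalar, this minimization is a one-dimensional problem. For squared error it has a closed form (\(\gamma_m = \langle r, h \rangle / \langle h, h \rangle\)); for a generic loss a simple grid search already illustrates the idea:

```python
import numpy as np

def line_search(loss, y, F_prev, h_pred, grid=np.linspace(-5, 5, 2001)):
    """Brute-force the 1-D problem argmin_gamma sum_i L(y_i, F_prev_i + gamma * h_i)."""
    losses = [loss(y, F_prev + g * h_pred).sum() for g in grid]
    return grid[int(np.argmin(losses))]

mse = lambda y, F: 0.5 * (y - F) ** 2

y = np.array([1.0, 2.0, 3.0])
F_prev = np.zeros(3)
h_pred = np.array([0.5, 1.0, 1.5])
gamma = line_search(mse, y, F_prev, h_pred)
# Closed form for MSE: <y - F_prev, h> / <h, h> = 7.0 / 3.5 = 2.0
```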

5. Update model#

  • Add the learner to the model with shrinkage (learning rate \(\nu\)):

\[ F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x) \]
  • The learning rate \(\nu\) controls how much each learner contributes; smaller values require more iterations but typically generalize better.
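The update itself is a single vectorized line. A sketch with placeholder values for \(\nu\), \(\gamma_m\), and the learner's predictions:

```python
import numpy as np

nu = 0.1                              # learning rate (shrinkage); a typical small value
gamma_m = 2.0                         # step size from the line search
F_prev = np.array([0.0, 0.0, 0.0])    # F_{m-1}(x) on the training points
h_pred = np.array([0.5, 1.0, 1.5])    # h_m(x) predictions

# F_m(x) = F_{m-1}(x) + nu * gamma_m * h_m(x)
F_m = F_prev + nu * gamma_m * h_pred
```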


6. Repeat#

  • Go back to Step 2.

  • Each round reduces the residuals, until they resemble random noise or the maximum number of iterations is reached.


7. Final prediction#

  • After \(M\) rounds:

\[ F_M(x) = F_0(x) + \nu \sum_{m=1}^M \gamma_m h_m(x) \]
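Putting steps 1 through 7 together, here is a minimal end-to-end gradient boosting loop for squared error on 1-D inputs, using a decision stump as the weak learner. This is a sketch for intuition, not a production implementation (note that for MSE the stump's leaf values already equal the mean residuals, so \(\gamma_m = 1\) and no explicit line search is needed):

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 regression tree fit to residuals r by exhaustive split search."""
    best, best_err = None, np.inf
    for t in np.unique(x)[:-1]:  # exclude max value so the right side is nonempty
        left, right = r[x <= t], r[x > t]
        lv, rv = left.mean(), right.mean()
        err = ((left - lv) ** 2).sum() + ((right - rv) ** 2).sum()
        if err < best_err:
            best_err, best = err, (t, lv, rv)
    return best

def predict_stump(stump, x):
    t, lv, rv = stump
    return np.where(x <= t, lv, rv)

def gradient_boost(x, y, M=100, nu=0.1):
    """Steps 1-6: initialize with the mean, then repeatedly fit a stump to
    the residuals y - F and add it to the model with shrinkage nu."""
    F0 = y.mean()                          # step 1: constant initializer
    F = np.full_like(y, F0)
    stumps = []
    for _ in range(M):
        r = y - F                          # step 2: pseudo-residuals (MSE)
        s = fit_stump(x, r)                # step 3: weak learner
        F = F + nu * predict_stump(s, x)   # step 5: shrunken update
        stumps.append(s)
    return F0, stumps

def predict(model, x, nu=0.1):
    """Step 7: F_M(x) = F_0 + nu * sum_m h_m(x)."""
    F0, stumps = model
    return F0 + nu * sum(predict_stump(s, x) for s in stumps)
```

On a simple step-shaped target, the residuals shrink by a factor of \(1 - \nu\) per round, so a few hundred iterations drive the training error close to zero.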

Extra Elements in Workflow#

  • Regularization: learning rate, subsampling (stochastic gradient boosting), tree depth control.

  • Early stopping: stop iterations when validation loss stops improving.

  • Output:

    • Regression: direct \(F_M(x)\).

    • Classification: apply link function (e.g., sigmoid for probabilities).
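For binary classification the raw boosted score \(F_M(x)\) lives on the log-odds scale, so the sigmoid (the inverse of the log-odds link used at initialization) maps it to a probability:

```python
import numpy as np

def sigmoid(F):
    """Map raw boosted scores (log-odds) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-F))

scores = np.array([-2.0, 0.0, 2.0])   # example raw scores F_M(x)
probs = sigmoid(scores)               # monotone map into (0, 1)
labels = (probs >= 0.5).astype(int)   # threshold at p = 0.5, i.e. F = 0
```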


Summary Workflow#

  1. Initialize with constant model.

  2. Compute pseudo-residuals (gradients).

  3. Fit weak learner to residuals.

  4. Find optimal multiplier.

  5. Update model with shrinkage.

  6. Repeat until convergence or iteration limit.

  7. Use final boosted model for prediction.