# Workflows
## 1. Initialization

Pick a starting model \(F_0(x)\), usually the constant value that minimizes the loss:

- Regression (MSE): the mean of \(y\).
- Classification (log-loss): the log-odds of the positive class.
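A minimal sketch of both initializations, using small illustrative arrays (the data here is an assumption for demonstration):

```python
import numpy as np

# Hypothetical toy labels (illustrative only).
y_reg = np.array([3.0, 5.0, 7.0, 9.0])   # regression targets
y_clf = np.array([0, 0, 1, 1, 1])        # binary class labels

# Regression with MSE: the constant minimizing squared error is the mean.
F0_reg = y_reg.mean()

# Classification with log-loss: the minimizing constant is the log-odds
# of the positive class, log(p / (1 - p)).
p = y_clf.mean()
F0_clf = np.log(p / (1 - p))
```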
## 2. Compute pseudo-residuals

At iteration \(m\), compute the negative gradient of the loss with respect to the current predictions:

\[
r_{im} = -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)} \right]_{F = F_{m-1}}
\]

Intuition: the pseudo-residuals point in the direction of steepest improvement for each training point.
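For MSE the negative gradient is simply the ordinary residual \(y - F\). A short sketch, reusing the toy regression data from above (an assumption, not from the original):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])
F_prev = np.full_like(y, y.mean())   # predictions from the previous iteration

# For L = (1/2) * (y - F)^2, the negative gradient -dL/dF is y - F.
residuals = y - F_prev
```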
## 3. Fit weak learner

Train a weak learner \(h_m(x)\) (typically a shallow decision tree) to predict the pseudo-residuals. This learner models the corrections the current model needs.
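As a minimal stand-in for a shallow tree, a decision stump (depth-1 tree) can be fitted to the residuals. This is a sketch assuming one-dimensional inputs and the toy arrays above; `fit_stump` is a hypothetical helper, not a library function:

```python
import numpy as np

def fit_stump(x, r):
    """Find the split on x minimizing squared error against residuals r."""
    best = None
    for t in x:
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue  # skip splits that leave one side empty
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, vl, vr = best
    # The stump predicts the mean residual of whichever side q falls on.
    return lambda q: np.where(q <= t, vl, vr)

x = np.array([1.0, 2.0, 3.0, 4.0])
r = np.array([-3.0, -1.0, 1.0, 3.0])
h = fit_stump(x, r)   # h(x) predicts the needed correction
```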
## 4. Find optimal weight \(\gamma_m\)

Do a line search to find the scaling of the learner that minimizes the loss:

\[
\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)
\]
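Under MSE the line search has a closed form: with \(r = y - F_{m-1}\), the minimizer is \(\gamma = \langle r, h \rangle / \langle h, h \rangle\). A sketch with illustrative arrays (assumed, matching the toy example above):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])
F_prev = np.full_like(y, 6.0)
h_pred = np.array([-2.0, -2.0, 2.0, 2.0])  # weak learner output on training set

# Closed-form line search for squared error: project r onto h.
r = y - F_prev
gamma = (r @ h_pred) / (h_pred @ h_pred)
```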
## 5. Update model

Add the scaled learner to the model with shrinkage (learning rate \(\nu\)):

\[
F_m(x) = F_{m-1}(x) + \nu\, \gamma_m h_m(x)
\]

The learning rate controls how much each learner contributes.
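The update itself is one line; a sketch continuing the toy example (the value \(\nu = 0.1\) is an illustrative assumption):

```python
import numpy as np

F_prev = np.full(4, 6.0)
h_pred = np.array([-2.0, -2.0, 2.0, 2.0])
gamma, nu = 1.0, 0.1

# Shrinkage: each learner contributes only a fraction of its correction.
F_new = F_prev + nu * gamma * h_pred
```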
## 6. Repeat

Go back to Step 2. Each round shrinks the residuals, stopping when they resemble random noise or the maximum number of iterations is reached.
## 7. Final prediction

After \(M\) rounds, the boosted model is the initial constant plus all the shrunken corrections:

\[
F_M(x) = F_0(x) + \sum_{m=1}^{M} \nu\, \gamma_m h_m(x)
\]
## Extra Elements in the Workflow

- Regularization: learning rate, subsampling (stochastic gradient boosting), and tree-depth control.
- Early stopping: halt iterations when the validation loss stops improving.
- Output:
  - Regression: use \(F_M(x)\) directly.
  - Classification: apply a link function (e.g., the sigmoid to obtain probabilities).
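For binary classification, the raw boosted score is mapped to a probability with the sigmoid. A sketch with illustrative scores (the values are assumptions):

```python
import numpy as np

def sigmoid(z):
    """Map a raw score (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

F_M = np.array([-2.0, 0.0, 2.0])     # raw boosted scores for three points
probs = sigmoid(F_M)
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 for hard labels
```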
## Summary Workflow

1. Initialize with a constant model.
2. Compute pseudo-residuals (negative gradients).
3. Fit a weak learner to the residuals.
4. Find the optimal multiplier \(\gamma_m\).
5. Update the model with shrinkage.
6. Repeat until convergence or the iteration limit.
7. Use the final boosted model for prediction.
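The whole workflow can be sketched end to end. This is a minimal illustration, assuming MSE loss, decision stumps as weak learners, one-dimensional toy data, and \(\nu = 0.1\); all names (`fit_stump`, `gradient_boost`) are hypothetical helpers, not a library API:

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 tree: best single split on x fitted to residuals r."""
    best = None
    for t in x:
        l, rr = r[x <= t], r[x > t]
        if len(l) and len(rr):
            sse = ((l - l.mean())**2).sum() + ((rr - rr.mean())**2).sum()
            if best is None or sse < best[0]:
                best = (sse, t, l.mean(), rr.mean())
    _, t, vl, vr = best
    return lambda q: np.where(q <= t, vl, vr)

def gradient_boost(x, y, M=100, nu=0.1):
    F0 = y.mean()                            # Step 1: constant model
    F = np.full_like(y, F0)
    learners = []
    for _ in range(M):                       # Step 6: repeat
        r = y - F                            # Step 2: pseudo-residuals (MSE)
        h = fit_stump(x, r)                  # Step 3: fit weak learner
        hx = h(x)
        denom = hx @ hx
        if denom == 0:                       # residuals fully fitted
            break
        gamma = (r @ hx) / denom             # Step 4: closed-form line search
        F = F + nu * gamma * hx              # Step 5: shrunken update
        learners.append((gamma, h))
    def predict(q):                          # Step 7: final prediction
        return F0 + sum(nu * g * h(q) for g, h in learners)
    return predict

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.0, 6.0, 6.0])
model = gradient_boost(x, y)
```

With enough rounds the predictions approach the targets; in practice, libraries add the regularization and early stopping described above.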