Workflows


1. Input Preparation

  • Data Formatting

    • Input data is converted into a DMatrix (XGBoost's optimized internal data structure).

    • Handles sparse data efficiently (compressed column storage).

  • Labels & Weights

    • The target variable \(y\), per-instance weights, and an optional missing-value marker are stored alongside the features; a minimal sketch follows.
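
A minimal sketch of this step with the xgboost Python package (the synthetic data and shapes are purely illustrative):

```python
import numpy as np
import xgboost as xgb

# Illustrative data: 100 rows, 5 features, binary labels, unit weights
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
w = np.ones(100)

# DMatrix stores features, labels, weights, and a missing-value marker
dtrain = xgb.DMatrix(X, label=y, weight=w, missing=np.nan)
```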


2. Initialization

  • Start with a base prediction:

    • For regression → the mean of the targets.

    • For classification → the log-odds of the positive class.

  • This becomes the initial model \(f_0(x)\); a quick numeric check follows.
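
A quick numeric sketch of both baselines (the arrays are made-up examples):

```python
import numpy as np

# Regression: baseline is the mean target
y_reg = np.array([3.0, 5.0, 7.0])
f0_reg = y_reg.mean()              # 5.0

# Binary classification: baseline is the log-odds of the positive class
y_cls = np.array([0, 1, 1, 1])
p = y_cls.mean()                   # 0.75
f0_cls = np.log(p / (1.0 - p))     # log(0.75 / 0.25) ≈ 1.0986
```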


3. Iterative Boosting Rounds

For each boosting round \(t = 1, 2, \dots, T\):

(a) Compute Gradients & Hessians

  • For each data point \(i\), compute:

    \[ g_i = \frac{\partial L(y_i, \hat{y}_i)}{\partial \hat{y}_i}, \quad h_i = \frac{\partial^2 L(y_i, \hat{y}_i)}{\partial \hat{y}_i^2} \]

  • \(g_i\) = gradient of the loss (its negative points in the direction of steepest loss decrease).

  • \(h_i\) = Hessian (curvature, used to scale the step size).
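
For concreteness, here is a sketch of these derivatives for the logistic loss, the same \((g, h)\) pair a custom XGBoost objective would return (the function name is illustrative):

```python
import numpy as np

def logloss_grad_hess(y_true, raw_score):
    """Gradient and Hessian of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + np.exp(-raw_score))  # sigmoid: raw score -> probability
    g = p - y_true                        # g_i: first derivative
    h = p * (1.0 - p)                     # h_i: second derivative (curvature)
    return g, h
```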


(b) Build a Decision Tree

  • At each node, evaluate possible splits using Gain:

    \[ \text{Gain} = \frac{1}{2}\left(\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right) - \gamma \]

    where:

    • \(G_L, H_L\) and \(G_R, H_R\) = sums of gradients & Hessians over the left and right children.

    • \(\lambda\) = L2 regularization.

    • \(\gamma\) = minimum loss reduction required.

  • If Gain > 0 (equivalently, the unregularized improvement exceeds \(\gamma\)), the split is made; a direct transcription of the formula is sketched below.
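
A direct transcription of the Gain formula (the helper name and default values are illustrative):

```python
def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Gain of splitting a node into (L, R); lam = lambda (L2), gamma = split penalty."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma
```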


(c) Pruning / Stopping Splits

  • Stop splitting a node if any of the following hold (a simplified check is sketched after this list):

    • The tree has reached max_depth.

    • The best split's Gain ≤ 0 (i.e., its raw improvement is below \(\gamma\)).

    • A child's Hessian sum would fall below min_child_weight.
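
These rules can be mirrored in a simplified check (a sketch of the logic, not XGBoost's actual internals):

```python
def should_stop(depth, best_gain, child_hess_sum,
                max_depth=6, min_child_weight=1.0):
    """Return True if the node should not be split further (simplified sketch)."""
    if depth >= max_depth:
        return True                       # depth limit reached
    if best_gain <= 0:                    # gamma is already subtracted inside Gain
        return True
    if child_hess_sum < min_child_weight:
        return True                       # a child's Hessian sum would be too small
    return False
```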


(d) Assign Leaf Weights

  • For each leaf, compute optimal weight:

    \[ w^* = -\frac{\sum_i g_i}{\sum_i h_i + \lambda} \]

  • This weight minimizes the regularized loss within that leaf; in code it is a one-liner (see below).
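
A one-line implementation of the formula above (g and h are the gradient/Hessian arrays of the points in the leaf):

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf weight w* = -sum(g) / (sum(h) + lambda)."""
    return -np.sum(g) / (np.sum(h) + lam)
```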


(e) Update Model

  • Update predictions:

    \[ \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta f_t(x_i) \]

    where:

    • \(f_t(x)\) = new tree’s output.

    • \(\eta\) = learning rate (shrinkage factor).
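
The update is a single vectorized step (illustrative arrays; in practice \(f_t(x_i)\) comes from the tree just built):

```python
import numpy as np

eta = 0.1                               # learning rate (shrinkage factor)
y_pred_prev = np.zeros(4)               # \hat{y}^{(t-1)} for 4 rows
f_t = np.array([0.3, -0.2, 0.5, 0.1])   # new tree's output per row
y_pred = y_pred_prev + eta * f_t        # \hat{y}^{(t)}
```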


4. Regularization

  • Built-in penalties:

    • L1 (α) → sparsity, feature selection.

    • L2 (λ) → weight shrinkage.

  • Prevents overfitting by discouraging overly complex trees; typical parameter settings are sketched below.
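
In the Python API these penalties are ordinary training parameters (the values here are arbitrary examples):

```python
params = {
    "alpha": 0.1,    # L1 penalty: encourages sparse leaf weights
    "lambda": 1.0,   # L2 penalty: shrinks leaf weights
    "gamma": 0.1,    # minimum loss reduction required to make a split
}
```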


5. Prediction

  • After T boosting rounds:

    \[ \hat{y}_i = f_0(x_i) + \sum_{t=1}^{T} \eta\, f_t(x_i) \]

  • Apply the sigmoid to get probabilities (classification) or use the raw score as-is (regression); a sketch follows.
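
A sketch of the final transformation (the raw scores are made up):

```python
import numpy as np

raw = np.array([-1.2, 0.4, 2.0])     # base prediction + eta-scaled tree outputs
prob = 1.0 / (1.0 + np.exp(-raw))    # sigmoid for binary classification
# For regression, `raw` is used directly as the prediction.
```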


6. Model Evaluation

  • Use evaluation metrics (logloss, rmse, auc, etc.) on train and validation sets.

  • Apply early stopping when the validation metric stops improving for a set number of rounds; an end-to-end example follows.
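
An end-to-end sketch tying steps 1–6 together with early stopping (synthetic data; parameter values are illustrative):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = (X[:, 0] > 0.5).astype(int)

dtrain = xgb.DMatrix(X[:150], label=y[:150])
dvalid = xgb.DMatrix(X[150:], label=y[150:])

params = {
    "objective": "binary:logistic",
    "eta": 0.1,
    "max_depth": 4,
    "gamma": 0.1,
    "lambda": 1.0,
    "eval_metric": "logloss",
}
booster = xgb.train(
    params, dtrain,
    num_boost_round=200,
    evals=[(dtrain, "train"), (dvalid, "valid")],
    early_stopping_rounds=20,    # stop if valid logloss stalls for 20 rounds
)
preds = booster.predict(dvalid)  # probabilities (sigmoid already applied)
```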


Workflow Summary (Simplified Pipeline)

  1. Input data → DMatrix

  2. Initialize model (baseline prediction)

  3. For each boosting round:

    • Compute gradients & Hessians

    • Build tree using Gain formula

    • Stop splitting if conditions met

    • Assign leaf weights

    • Update model with learning rate

  4. Regularization applied throughout (\(\lambda\), \(\gamma\), \(\alpha\) inside the Gain and leaf-weight formulas)

  5. Repeat until max_rounds or early stopping

  6. Final model = sum of all trees


Key Intuition

XGBoost is gradient descent on trees: it learns residual patterns iteratively, but with second-order accuracy (gradients plus Hessians), which makes it faster and more precise than AdaBoost or classical first-order gradient boosting.