Cost Functions#

XGBoost is not a single model but a framework that supports different cost functions (a.k.a. loss functions) depending on whether you’re solving regression or classification.


Cost Functions in XGBRegressor#

In regression, the task is to predict a continuous value. XGBRegressor uses differentiable loss functions that measure prediction error.

Common loss functions:#

  1. Squared Error (default)

    \[ L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2 \]
    • Penalizes larger errors more heavily.

    • Smooth and differentiable.

    • Works well when errors are normally distributed.

  2. Absolute Error (MAE)

    \[ L(y, \hat{y}) = |y - \hat{y}| \]
    • Robust to outliers (less penalty for extreme values).

    • Slower to optimize: the gradient is a constant ±1 and the second derivative is zero, so it fits the second-order boosting framework less naturally.

  3. Huber Loss (mix between MSE & MAE)

    \[\begin{split} L(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & |y - \hat{y}| \leq \delta \\ \delta |y - \hat{y}| - \frac{1}{2}\delta^2 & |y - \hat{y}| > \delta \end{cases} \end{split}\]
    • Balances robustness to outliers with sensitivity to small errors; XGBoost implements a smooth variant (pseudo-Huber) via objective="reg:pseudohubererror".

  4. Quantile Loss

    \[ L(y, \hat{y}) = \max(\alpha(y - \hat{y}), (1-\alpha)(\hat{y} - y)) \]
    • Useful for predicting quantiles (and hence prediction intervals), not just the conditional mean.

🔑 In practice, XGBRegressor defaults to squared error loss unless specified (objective="reg:squarederror").
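
Below is a minimal sketch of how these losses map onto XGBRegressor objective strings. The objective and parameter names follow the XGBoost documentation, but note that reg:absoluteerror and reg:quantileerror require fairly recent releases (roughly 1.7+ and 2.0+ respectively), and XGBoost implements Huber as the smooth pseudo-Huber variant; the data here is synthetic and purely illustrative.

```python
import numpy as np
import xgboost as xgb

# Synthetic data, purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + rng.normal(size=500)

# 1. Squared error (the default)
reg_mse = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=100)

# 2. Absolute error (MAE), robust to outliers
reg_mae = xgb.XGBRegressor(objective="reg:absoluteerror", n_estimators=100)

# 3. Pseudo-Huber: XGBoost's smooth approximation of Huber loss;
#    huber_slope plays the role of delta in the formula above
reg_huber = xgb.XGBRegressor(objective="reg:pseudohubererror", huber_slope=1.0)

# 4. Quantile (pinball) loss, e.g. the 90th percentile
reg_q90 = xgb.XGBRegressor(objective="reg:quantileerror", quantile_alpha=0.9)

for model in (reg_mse, reg_mae, reg_huber, reg_q90):
    model.fit(X, y)
```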


Cost Functions in XGBClassifier#

For classification, the task is to predict class probabilities (and then assign class labels), as the short sketch below illustrates. The cost functions measure how well those predicted probabilities match the true labels: confident, wrong predictions are punished most.
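
A quick illustration of the probabilities-then-labels workflow (synthetic data, purely illustrative):

```python
import numpy as np
import xgboost as xgb

# Synthetic binary data, purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

clf = xgb.XGBClassifier(objective="binary:logistic", n_estimators=50)
clf.fit(X, y)

proba = clf.predict_proba(X)[:, 1]   # P(class = 1) for each sample
labels = clf.predict(X)              # hard labels (0/1) using a 0.5 threshold
```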

Common loss functions:#

  1. Logistic Loss (Binary Classification)

    \[ L(y, \hat{p}) = - \big( y \log(\hat{p}) + (1-y) \log(1 - \hat{p}) \big) \]

    where \(\hat{p} = \sigma(\hat{y}) = \frac{1}{1+e^{-\hat{y}}}\).

    • Penalizes confident but wrong predictions heavily.

    • Optimized with first- and second-order gradients (Newton-style boosting).

    • The default objective for binary classification (objective="binary:logistic"); see the configuration sketch after this list.

  2. Softmax Loss (Multiclass Classification)

    For \(K\) classes:

    \[ L(y, \hat{p}) = - \sum_{k=1}^{K} \mathbf{1}_{y=k} \log(\hat{p}_k) \]

    where

    \[ \hat{p}_k = \frac{e^{\hat{y}_k}}{\sum_{j=1}^{K} e^{\hat{y}_j}} \]
    • Standard cross-entropy loss.

    • Used when objective="multi:softprob" (returns per-class probabilities) or "multi:softmax" (returns only the predicted class).

  3. Hinge Loss (SVM-style, optional)

    \[ L(y, \hat{y}) = \max(0, 1 - y\hat{y}) \]

    with labels encoded as \(y \in \{-1, +1\}\).

    • Focuses on the margin between classes.

    • Less probabilistic, more decision-boundary focused: objective="binary:hinge" outputs hard 0/1 decisions rather than probabilities.
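
A minimal sketch of how these losses are selected through the objective parameter (with the scikit-learn wrapper, num_class is inferred from the training labels, so it is omitted here):

```python
import xgboost as xgb

# Binary classification with logistic loss (the default)
clf_logistic = xgb.XGBClassifier(objective="binary:logistic")

# Binary classification with hinge loss: hard 0/1 decisions, no probabilities
clf_hinge = xgb.XGBClassifier(objective="binary:hinge")

# Multiclass: softprob returns one probability per class,
# softmax returns only the index of the predicted class
clf_softprob = xgb.XGBClassifier(objective="multi:softprob")
clf_softmax = xgb.XGBClassifier(objective="multi:softmax")
```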


Intuition: Why these losses?#

  • Regression losses → measure distance between prediction & actual values.

  • Classification losses → measure how well predicted probabilities match the true classes (heavily penalizing confident mistakes).

  • Most are smooth and twice differentiable, so each boosting round can use both 1st- and 2nd-order derivatives (MAE and hinge need special handling because their second derivative is zero).
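
To make the derivative point concrete, here is a minimal sketch of the binary logistic loss written as a custom objective for the native xgb.train API; the built-in binary:logistic objective computes the same gradient and Hessian internally (the data and hyperparameters here are made up):

```python
import numpy as np
import xgboost as xgb

# Hand-coded logistic loss as a custom objective.
# XGBoost asks each objective for the 1st derivative (gradient) and
# 2nd derivative (Hessian) of the loss w.r.t. the raw prediction y_hat.
def logistic_obj(preds, dtrain):
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))   # sigma(y_hat)
    grad = p - y                       # dL/dy_hat
    hess = p * (1.0 - p)               # d^2L/dy_hat^2
    return grad, hess

# Synthetic data, purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=20, obj=logistic_obj)
```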


Summary:

  • XGBRegressor → squared error, MAE, Huber, quantile.

  • XGBClassifier → logistic loss (binary), softmax loss (multiclass), hinge loss (SVM-like).

  • Loss choice depends on whether you want robustness to outliers, probability calibration, or hard-margin separation.