Cost Functions#
XGBoost is not a single model but a framework that supports different cost functions (a.k.a. loss functions) depending on whether you’re solving regression or classification.
Cost Functions in XGBRegressor#
In regression, the task is to predict a continuous value. XGBRegressor uses differentiable loss functions that measure prediction error.
Common loss functions:#
Squared Error (default)
\[ L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2 \]
Penalizes larger errors more heavily.
Smooth and differentiable.
Works well when errors are normally distributed.
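To make the quadratic penalty concrete, here is a minimal pure-Python sketch of the squared error loss together with the first and second derivatives that gradient boosting consumes (the helper names are illustrative, not part of the XGBoost API):

```python
def squared_error(y, y_hat):
    """L(y, y_hat) = 1/2 * (y - y_hat)^2"""
    return 0.5 * (y - y_hat) ** 2

def squared_error_grad(y, y_hat):
    """First derivative w.r.t. y_hat: (y_hat - y), the negative residual."""
    return y_hat - y

def squared_error_hess(y, y_hat):
    """Second derivative w.r.t. y_hat: constant 1."""
    return 1.0

# A residual of 2 costs 2.0, a residual of 4 costs 8.0:
# errors are penalized quadratically.
print(squared_error(3.0, 1.0))  # 2.0
print(squared_error(5.0, 1.0))  # 8.0
```

The constant second derivative is what makes this loss the easiest case for Newton-style boosting updates.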
Absolute Error (MAE)
\[ L(y, \hat{y}) = |y - \hat{y}| \]
Robust to outliers (less penalty for extreme values).
Slower to optimize, since the gradient is a constant \(\pm 1\) and the second derivative is zero.
Huber Loss (mix between MSE & MAE)
\[\begin{split} L(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & |y - \hat{y}| \leq \delta \\ \delta |y - \hat{y}| - \frac{1}{2}\delta^2 & |y - \hat{y}| > \delta \end{cases} \end{split}\]
Balances robustness and sensitivity.
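The piecewise definition above translates directly into code. A minimal sketch (the function name is illustrative):

```python
def huber(y, y_hat, delta=1.0):
    """Quadratic near zero (like MSE), linear in the tails (like MAE)."""
    r = abs(y - y_hat)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * r - 0.5 * delta ** 2

# Small error: quadratic penalty, 0.5 * 0.5^2 = 0.125
print(huber(0.0, 0.5))  # 0.125
# Large error: linear penalty, 1 * 3 - 0.5 = 2.5 (an outlier costs far
# less than the 4.5 that squared error would charge)
print(huber(0.0, 3.0))  # 2.5
```

Note that the two branches agree at \(|y - \hat{y}| = \delta\), so the loss and its gradient are continuous there.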
Quantile Loss
\[ L(y, \hat{y}) = \max(\alpha(y - \hat{y}), (1-\alpha)(\hat{y} - y)) \]
Useful for prediction intervals and conditional quantiles (not just the mean).
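The asymmetry of the quantile (pinball) loss is easiest to see numerically. A minimal pure-Python sketch:

```python
def quantile_loss(y, y_hat, alpha):
    """Pinball loss: under-prediction costs alpha per unit,
    over-prediction costs (1 - alpha) per unit."""
    return max(alpha * (y - y_hat), (1 - alpha) * (y_hat - y))

# With alpha = 0.9, under-predicting by 2 costs 0.9 * 2 = 1.8 ...
print(quantile_loss(10.0, 8.0, alpha=0.9))   # 1.8
# ... but over-predicting by 2 costs only 0.1 * 2 = 0.2,
# so the model is pushed upward toward the 90th percentile.
print(quantile_loss(8.0, 10.0, alpha=0.9))   # 0.2
```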
🔑 In practice, XGBRegressor defaults to squared error loss unless specified (objective="reg:squarederror").
Cost Functions in XGBClassifier#
For classification, the task is to predict probabilities (then assign classes). The cost functions measure probability calibration (how close predicted probabilities are to true labels).
Common loss functions:#
Logistic Loss (Binary Classification)
\[ L(y, \hat{p}) = - \big( y \log(\hat{p}) + (1-y) \log(1 - \hat{p}) \big) \]
where \(\hat{p} = \sigma(\hat{y}) = \frac{1}{1+e^{-\hat{y}}}\).
Penalizes confident wrong predictions heavily.
Optimized with Newton’s method (second-order gradients).
Default for objective="binary:logistic".
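A minimal pure-Python sketch of the sigmoid link and the logistic loss (helper names are illustrative):

```python
import math

def sigmoid(y_hat):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y_hat))

def log_loss(y, p):
    """Binary cross-entropy for a single example; y is 0 or 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# An uncertain prediction (p = 0.5) costs ln 2 ~ 0.693,
# while a confident wrong one (p = 0.01 for a true positive)
# costs ln 100 ~ 4.6 -- much heavier.
print(log_loss(1, sigmoid(0.0)))   # ~0.693
print(log_loss(1, 0.01))           # ~4.605
```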
Softmax Loss (Multiclass Classification)
For \(K\) classes:
\[ L(y, \hat{p}) = - \sum_{k=1}^{K} \mathbf{1}_{y=k} \log(\hat{p}_k) \]
where
\[ \hat{p}_k = \frac{e^{\hat{y}_k}}{\sum_{j=1}^{K} e^{\hat{y}_j}} \]
This is the standard cross-entropy loss.
Used when objective="multi:softprob" or objective="multi:softmax".
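The softmax and cross-entropy formulas above can be sketched in a few lines of pure Python (helper names are illustrative):

```python
import math

def softmax(scores):
    """Normalize raw per-class scores into probabilities summing to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(true_k, probs):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_k])

# Equal scores give uniform probabilities (1/3 each for K = 3),
# and the loss is ln 3 ~ 1.099 regardless of the true class.
p = softmax([0.0, 0.0, 0.0])
print(p)                      # [0.333..., 0.333..., 0.333...]
print(cross_entropy(0, p))    # ~1.0986
```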
Hinge Loss (SVM-style, optional)
\[ L(y, \hat{y}) = \max(0, 1 - y\hat{y}) \]
where \(y \in \{-1, +1\}\). Focuses on the margin between classes.
Less probabilistic, more decision-boundary focused.
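A minimal sketch of the hinge loss, assuming labels are encoded as \(-1\) or \(+1\) (the function name is illustrative):

```python
def hinge(y, y_hat):
    """y must be -1 or +1; y_hat is a raw score, not a probability."""
    return max(0.0, 1.0 - y * y_hat)

# Beyond the margin (y * y_hat >= 1) the loss is exactly zero...
print(hinge(1, 2.0))    # 0.0
# ...inside the margin it grows linearly...
print(hinge(1, 0.5))    # 0.5
# ...and misclassified points cost more than 1.
print(hinge(-1, 0.5))   # 1.5
```

The flat zero region is what makes this loss decision-boundary focused: correctly classified points beyond the margin contribute nothing.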
Intuition: Why these losses?#
Regression losses → measure distance between prediction & actual values.
Classification losses → measure probability calibration (confidence in correct class).
All are differentiable, allowing gradient boosting with 1st & 2nd order derivatives.
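As a concrete instance of those 1st and 2nd order derivatives, here is a pure-Python sketch of the gradient and Hessian of the logistic loss with respect to the raw score \(\hat{y}\) (helper names are illustrative; these are the per-example quantities a boosting update consumes):

```python
import math

def sigmoid(y_hat):
    return 1.0 / (1.0 + math.exp(-y_hat))

def logistic_grad(y, y_hat):
    """First derivative of the logistic loss w.r.t. y_hat: p - y."""
    return sigmoid(y_hat) - y

def logistic_hess(y, y_hat):
    """Second derivative: p * (1 - p), always positive."""
    p = sigmoid(y_hat)
    return p * (1 - p)

# At a raw score of 0 (p = 0.5) for a true positive:
print(logistic_grad(1, 0.0))   # -0.5 (push the score upward)
print(logistic_hess(1, 0.0))   # 0.25 (maximal curvature at p = 0.5)
```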
Summary:
XGBRegressor → squared error, MAE, Huber, quantile.
XGBClassifier → logistic loss (binary), softmax loss (multiclass), hinge loss (SVM-like).
Loss choice depends on whether you want robustness to outliers, probability calibration, or hard-margin separation.