Cost Functions#

1. Mean Squared Error (MSE)#

  • The most common cost function.

  • At each split, the tree chooses the feature and threshold that minimize the weighted MSE of the two child nodes — equivalently, it minimizes the variance of the target values within each child.

\[ MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y})^2 \]
  • Here:

    • \(y_i\) = actual value

    • \(\hat{y}\) = predicted value (the mean of the target values in that node)

    • \(n\) = number of samples in the node

👉 Minimizing MSE means nodes will group data where target values are close together.
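To see how this cost guides splitting, here is a minimal NumPy sketch (the array and split point are illustrative, not taken from scikit-learn's internals, where this criterion is exposed as `criterion="squared_error"`):

```python
import numpy as np

def node_mse(y):
    """MSE of a node that predicts the mean of its targets (equals the variance of y)."""
    return np.mean((y - y.mean()) ** 2)

def split_cost(y_left, y_right):
    """Weighted MSE of the two child nodes produced by a candidate split."""
    n = len(y_left) + len(y_right)
    return (len(y_left) * node_mse(y_left) + len(y_right) * node_mse(y_right)) / n

y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.8])

# Splitting between the two clusters groups similar targets together,
# so the weighted MSE drops sharply compared with the unsplit node.
print(node_mse(y))               # cost before splitting
print(split_cost(y[:3], y[3:]))  # cost after the split
```

The split the tree prefers is exactly the one with the lowest weighted child MSE.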


2. Mean Absolute Error (MAE)#

\[ MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}| \]
  • Predicts the median of the target values in the node (instead of the mean), since the median is the constant that minimizes absolute error.

  • More robust to outliers than MSE.
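The robustness is easy to see numerically; a small sketch with one illustrative outlier:

```python
import numpy as np

# A node whose targets contain one large outlier (values are illustrative).
y = np.array([1.0, 1.1, 0.9, 1.2, 100.0])

mean_pred = y.mean()        # the constant prediction that minimizes MSE
median_pred = np.median(y)  # the constant prediction that minimizes MAE

mae_mean = np.mean(np.abs(y - mean_pred))
mae_median = np.mean(np.abs(y - median_pred))

# The median stays near the bulk of the data, so its MAE is far lower;
# the mean is dragged toward the outlier.
print(f"mean={mean_pred:.2f}    MAE={mae_mean:.2f}")
print(f"median={median_pred:.2f}  MAE={mae_median:.2f}")
```

In scikit-learn this criterion is available as `criterion="absolute_error"`; note it is considerably slower to fit than squared error.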


3. Friedman’s Mean Squared Error (Friedman MSE)#

  • A variation of MSE available in scikit-learn (`criterion="friedman_mse"`).

  • Scores candidate splits with Friedman's improvement criterion, which rewards splits whose children have very different means; especially useful in gradient boosting trees.
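The core of the criterion is Friedman's improvement score from his 2001 gradient boosting paper; a minimal sketch (arrays are illustrative, and this omits the sample-weight handling scikit-learn does internally):

```python
import numpy as np

def friedman_improvement(y_left, y_right):
    """Friedman's split improvement: the squared difference of the child means,
    weighted by n_l * n_r / (n_l + n_r). Splits that separate children with
    very different means score highest."""
    n_l, n_r = len(y_left), len(y_right)
    diff = y_left.mean() - y_right.mean()
    return (n_l * n_r / (n_l + n_r)) * diff ** 2

y_left = np.array([1.0, 1.2, 0.8])
y_right = np.array([5.0, 4.9, 5.1])
print(friedman_improvement(y_left, y_right))
```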


4. Poisson (for count regression)#

  • For target values that represent counts (non-negative integers).

  • Cost function is based on Poisson deviance:

\[ D(y, \hat{y}) = 2 \sum_{i=1}^n \left( y_i \log \frac{y_i}{\hat{y}_i} - (y_i - \hat{y}_i) \right) \]
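The deviance above can be computed directly; a minimal NumPy sketch (the counts are illustrative — in scikit-learn this criterion is selected with `criterion="poisson"`):

```python
import numpy as np

def poisson_deviance(y, y_hat):
    """Poisson deviance as in the formula above (y_hat must be strictly positive).
    A count y_i = 0 contributes only the -(y_i - y_hat_i) part, because
    y * log(y / y_hat) -> 0 as y -> 0."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)  # avoid log(0) in the masked-out branch
    log_term = np.where(y > 0, y * np.log(safe_y / y_hat), 0.0)
    return 2.0 * np.sum(log_term - (y - y_hat))

counts = np.array([0, 1, 2, 3, 4])
pred = np.full(len(counts), counts.mean())  # the node predicts the mean count
print(poisson_deviance(counts, pred))
```

Perfect predictions give a deviance of zero; the tree picks splits whose children have the lowest total deviance.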

Summary#

  • MSE → default, sensitive to outliers, good for general regression.

  • MAE → robust to outliers, gives median-based predictions.

  • Friedman MSE → specialized, often used in ensembles.

  • Poisson → best for count-based data.