Cost Functions#

1. Mean Squared Error (MSE)#

  • The most common cost function.

  • At each split, the tree chooses the feature and threshold that minimize the weighted MSE of the two child nodes — equivalently, it minimizes the variance of the target values within each child.

\[ MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y})^2 \]
  • Here:

    • \(y_i\) = actual value

    • \(\hat{y}\) = predicted value (the mean of the target values in that node)

    • \(n\) = number of samples in the node

👉 Minimizing MSE means nodes will group data where target values are close together.
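To see how this cost guides splitting, here is a minimal NumPy sketch (the array and split point are illustrative, not taken from scikit-learn's internals, where this criterion is exposed as `criterion="squared_error"`):

```python
import numpy as np

def node_mse(y):
    """MSE of a node that predicts the mean of its targets (equals the variance of y)."""
    return np.mean((y - y.mean()) ** 2)

def split_cost(y_left, y_right):
    """Weighted MSE of the two child nodes produced by a candidate split."""
    n = len(y_left) + len(y_right)
    return (len(y_left) * node_mse(y_left) + len(y_right) * node_mse(y_right)) / n

y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.8])

# Splitting between the two clusters groups similar targets together,
# so the weighted MSE drops sharply compared with the unsplit node.
print(node_mse(y))               # cost before splitting
print(split_cost(y[:3], y[3:]))  # cost after the split
```

The split the tree prefers is exactly the one with the lowest weighted child MSE.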


2. Mean Absolute Error (MAE)#

\[ MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}| \]
  • Predicts the median of the target values in the node (instead of the mean), since the median is the constant that minimizes absolute error.

  • More robust to outliers than MSE.
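The robustness is easy to see numerically; a small sketch with one illustrative outlier:

```python
import numpy as np

# A node whose targets contain one large outlier (values are illustrative).
y = np.array([1.0, 1.1, 0.9, 1.2, 100.0])

mean_pred = y.mean()        # the constant prediction that minimizes MSE
median_pred = np.median(y)  # the constant prediction that minimizes MAE

mae_mean = np.mean(np.abs(y - mean_pred))
mae_median = np.mean(np.abs(y - median_pred))

# The median stays near the bulk of the data, so its MAE is far lower;
# the mean is dragged toward the outlier.
print(f"mean={mean_pred:.2f}    MAE={mae_mean:.2f}")
print(f"median={median_pred:.2f}  MAE={mae_median:.2f}")
```

In scikit-learn this criterion is available as `criterion="absolute_error"`; note it is considerably slower to fit than squared error.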


3. Friedman’s Mean Squared Error (Friedman MSE)#

  • A variation of MSE available in scikit-learn (`criterion="friedman_mse"`).

  • Scores candidate splits with Friedman's improvement criterion, which rewards splits whose children have very different means; especially useful in gradient boosting trees.
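The core of the criterion is Friedman's improvement score from his 2001 gradient boosting paper; a minimal sketch (arrays are illustrative, and this omits the sample-weight handling scikit-learn does internally):

```python
import numpy as np

def friedman_improvement(y_left, y_right):
    """Friedman's split improvement: the squared difference of the child means,
    weighted by n_l * n_r / (n_l + n_r). Splits that separate children with
    very different means score highest."""
    n_l, n_r = len(y_left), len(y_right)
    diff = y_left.mean() - y_right.mean()
    return (n_l * n_r / (n_l + n_r)) * diff ** 2

y_left = np.array([1.0, 1.2, 0.8])
y_right = np.array([5.0, 4.9, 5.1])
print(friedman_improvement(y_left, y_right))
```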


4. Poisson (for count regression)#

  • For target values that represent counts (non-negative integers).

  • Cost function is based on Poisson deviance:

\[ D(y, \hat{y}) = 2 \sum_{i=1}^n \left( y_i \log \frac{y_i}{\hat{y}_i} - (y_i - \hat{y}_i) \right) \]
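The deviance above can be computed directly; a minimal NumPy sketch (the counts are illustrative — in scikit-learn this criterion is selected with `criterion="poisson"`):

```python
import numpy as np

def poisson_deviance(y, y_hat):
    """Poisson deviance as in the formula above (y_hat must be strictly positive).
    A count y_i = 0 contributes only the -(y_i - y_hat_i) part, because
    y * log(y / y_hat) -> 0 as y -> 0."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)  # avoid log(0) in the masked-out branch
    log_term = np.where(y > 0, y * np.log(safe_y / y_hat), 0.0)
    return 2.0 * np.sum(log_term - (y - y_hat))

counts = np.array([0, 1, 2, 3, 4])
pred = np.full(len(counts), counts.mean())  # the node predicts the mean count
print(poisson_deviance(counts, pred))
```

Perfect predictions give a deviance of zero; the tree picks splits whose children have the lowest total deviance.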

Summary#

  • MSE → default, sensitive to outliers, good for general regression.

  • MAE → robust to outliers, gives median-based predictions.

  • Friedman MSE → specialized, often used in ensembles.

  • Poisson → best for count-based data.