1. Cost Function#

  • A cost function (also called splitting criterion) measures how “good” a split is at a node in a decision tree.

  • Random Forest builds many decision trees, and each tree uses a cost function to decide where to split the data.

  • The goal is to minimize prediction error in the leaves.


2. Cost Functions for Regression#

For Random Forest Regressor, the commonly used criteria are:

| Criterion | What it Measures | Formula / Intuition |
|---|---|---|
| squared_error (default) | Variance reduction | Measures how much splitting reduces the variance of target values in the child nodes. The split that reduces variance the most is chosen. |
| absolute_error | Mean Absolute Error (L1) | Measures the sum of absolute differences between target values and the child-node median. Less sensitive to outliers than squared error. |
| poisson | Poisson deviance | Used for count data; assumes the targets follow a Poisson distribution. |
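
A minimal sketch of the poisson criterion on synthetic count data (this assumes a recent scikit-learn version in which RandomForestRegressor supports criterion='poisson'; the data-generating process here is invented for illustration):

# Sketch: poisson criterion on synthetic count data
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = rng.poisson(lam=np.exp(0.3 * X.ravel()))  # non-negative counts; poisson requires y >= 0

rf_poisson = RandomForestRegressor(n_estimators=100, criterion='poisson', random_state=0)
rf_poisson.fit(X, y)
print(rf_poisson.predict(X[:3]))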

Variance Reduction Intuition:

  • Suppose a node contains target values [10, 12, 15, 14].

  • Splitting them into [10, 12] and [15, 14] reduces variance in child nodes.

  • The split that minimizes the weighted average variance across the children is chosen (equivalently, the one that maximizes the variance reduction).

Mathematically:

\[ \text{Variance reduction} = \text{Var(parent)} - \frac{n_\text{left}}{n_\text{parent}} \text{Var(left)} - \frac{n_\text{right}}{n_\text{parent}} \text{Var(right)} \]
  • The larger the reduction, the better the split.
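
A quick numeric check of this formula on the [10, 12, 15, 14] example above (plain NumPy, nothing beyond the formula itself):

# Weighted variance reduction for the toy split above
import numpy as np

parent = np.array([10, 12, 15, 14])
left, right = np.array([10, 12]), np.array([15, 14])

reduction = (parent.var()
             - len(left) / len(parent) * left.var()
             - len(right) / len(parent) * right.var())
print(reduction)  # 3.0625 -> positive, so this split reduces variance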


3. Cost Functions for Classification#

For Random Forest Classifier, the commonly used criteria are:

| Criterion | What it Measures | Formula / Intuition |
|---|---|---|
| gini | Gini impurity | Measures how often a randomly chosen sample would be misclassified if labeled according to the node’s class distribution. Lower is better. |
| entropy | Information gain | Measures the uncertainty of the class labels. The split that maximizes information gain (reduces entropy the most) is chosen. |
| log_loss | Logarithmic loss | In scikit-learn this is equivalent to entropy (both use Shannon information gain), expressed as the log loss of the node’s class probabilities. |

Gini Impurity Formula:

\[ Gini = 1 - \sum_{i=1}^{C} p_i^2 \]
  • \(p_i\) = proportion of class \(i\) in the node

  • Gini = 0 → node is pure (all samples same class)

  • Gini is maximal (\(1 - 1/C\) for \(C\) classes) → node is evenly mixed
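
A small check of the Gini formula (the class proportions are made up for illustration):

# Gini impurity from class proportions
import numpy as np

def gini(p):
    p = np.asarray(p)
    return 1 - np.sum(p ** 2)

print(gini([1.0, 0.0]))        # 0.0   -> pure node
print(gini([0.5, 0.5]))        # 0.5   -> evenly mixed, maximum for 2 classes
print(gini([1/3, 1/3, 1/3]))   # ~0.667 -> maximum for 3 classes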

Entropy Formula:

\[ Entropy = -\sum_{i=1}^{C} p_i \log_2(p_i) \]
  • Measures uncertainty

  • Split that reduces entropy the most → preferred
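
And the entropy counterpart, including the information gain of a hypothetical perfect split (the label counts are invented for illustration):

# Entropy and information gain for a toy split
import numpy as np

def entropy(p):
    p = np.asarray(p)
    p = p[p > 0]  # skip zero proportions to avoid log2(0)
    return -np.sum(p * np.log2(p))

# Parent node: half class A, half class B
parent = entropy([0.5, 0.5])                          # 1.0 bit
# Split into a pure left child (all A) and a pure right child (all B)
children = 0.5 * entropy([1.0]) + 0.5 * entropy([1.0])
print("information gain:", parent - children)         # 1.0 -> perfect split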


4. Intuition Behind Cost Functions#

  1. Regression (Variance Reduction):

    • Split the node so that children are as homogeneous as possible in target values.

  2. Classification (Gini/Entropy):

    • Split the node so that children are as pure as possible, meaning samples in a child node mostly belong to one class.

  3. Random Forest Aggregates Trees:

    • Even if one tree chooses a suboptimal split (high cost), averaging across many trees reduces the impact of bad splits.
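
This averaging is easy to verify in scikit-learn: a fitted forest exposes its trees via the estimators_ attribute, and for a regressor the forest prediction is the mean of the per-tree predictions. A minimal sketch on synthetic data:

# Forest prediction = mean of individual tree predictions (regressor)
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

per_tree = np.stack([tree.predict(X[:5]) for tree in rf.estimators_])
print(np.allclose(per_tree.mean(axis=0), rf.predict(X[:5])))  # True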


Key Points#

  • Random Forest does not have a global cost function; each tree optimizes locally at each node.

  • Aggregating predictions across trees reduces variance (bagging does little to reduce bias), making RF robust.

  • Choice of criterion affects:

    • Model performance (sometimes small differences)

    • Sensitivity to outliers (squared_error vs absolute_error)

    • Training speed (gini is slightly faster than entropy since it avoids computing logarithms)

Classification Example (Gini vs Entropy)#

# Import Libraries
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Random Forest with Gini
rf_gini = RandomForestClassifier(n_estimators=100, criterion='gini', random_state=42)
rf_gini.fit(X_train, y_train)
y_pred_gini = rf_gini.predict(X_test)
print("Classification with Gini:", accuracy_score(y_test, y_pred_gini))

# Random Forest with Entropy
rf_entropy = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=42)
rf_entropy.fit(X_train, y_train)
y_pred_entropy = rf_entropy.predict(X_test)
print("Classification with Entropy:", accuracy_score(y_test, y_pred_entropy))
Classification with Gini: 1.0
Classification with Entropy: 1.0

Regression Example (Squared Error vs Absolute Error)#

# Import Libraries
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# Create Sample Regression Data
np.random.seed(42)
X = np.sort(np.random.rand(100,1) * 10, axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.3, X.shape[0])

# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Random Forest with Squared Error (Variance Reduction)
rf_squared = RandomForestRegressor(n_estimators=100, criterion='squared_error', random_state=42)
rf_squared.fit(X_train, y_train)
y_pred_squared = rf_squared.predict(X_test)

# Random Forest with Absolute Error (L1)
rf_absolute = RandomForestRegressor(n_estimators=100, criterion='absolute_error', random_state=42)
rf_absolute.fit(X_train, y_train)
y_pred_absolute = rf_absolute.predict(X_test)

# Metrics
def print_metrics(name, y_true, y_pred):
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    print(f"{name} -> R²: {r2:.2f}, RMSE: {rmse:.2f}")

print_metrics("Squared Error", y_test, y_pred_squared)
print_metrics("Absolute Error", y_test, y_pred_absolute)
Squared Error -> R²: 0.78, RMSE: 0.31
Absolute Error -> R²: 0.79, RMSE: 0.30