1. Cost Function#
A cost function (also called splitting criterion) measures how “good” a split is at a node in a decision tree.
Random Forest builds many decision trees, and each tree uses a cost function to decide where to split the data.
The goal is to minimize prediction error in the leaves.
2. Cost Functions for Regression#
For Random Forest Regressor, the commonly used criteria are:
| Criterion | What it Measures | Formula / Intuition |
|---|---|---|
| Variance reduction (`squared_error`) | How much splitting reduces the variance of target values in the child nodes. The split that reduces variance the most is chosen. | \(\mathrm{MSE} = \frac{1}{N}\sum_i (y_i - \bar{y})^2\) |
| Mean Absolute Error (`absolute_error`, L1) | The mean absolute deviation of target values from the child-node median \(\tilde{y}\). Less sensitive to outliers than squared error. | \(\mathrm{MAE} = \frac{1}{N}\sum_i \lvert y_i - \tilde{y} \rvert\) |
| Poisson deviance (`poisson`) | Deviance for count data; assumes the targets follow a Poisson distribution. | \(D = \frac{2}{N}\sum_i \left( y_i \log \frac{y_i}{\hat{y}_i} - y_i + \hat{y}_i \right)\) |
Variance Reduction Intuition:
Suppose a node contains target values `[10, 12, 15, 14]`. Splitting them into `[10, 12]` and `[15, 14]` reduces the variance in the child nodes. The split that minimizes the weighted average variance across the children is chosen.
Mathematically:

\[
\Delta \mathrm{Var} = \mathrm{Var}(\text{parent}) - \sum_{k} \frac{n_k}{n}\,\mathrm{Var}(\text{child}_k)
\]

The larger the reduction, the better the split.
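This computation can be sketched in NumPy for the toy split above (the values are the ones from the example; real trees evaluate many candidate splits this way):

```python
import numpy as np

# Parent node targets and one candidate split into two children
parent = np.array([10.0, 12.0, 15.0, 14.0])
left, right = np.array([10.0, 12.0]), np.array([15.0, 14.0])

# Weighted average variance of the children (weights = child sizes)
n = len(parent)
child_var = (len(left) / n) * left.var() + (len(right) / n) * right.var()

# Variance reduction achieved by this split
reduction = parent.var() - child_var
print(f"Parent variance:            {parent.var():.4f}")  # 3.6875
print(f"Weighted children variance: {child_var:.4f}")     # 0.6250
print(f"Variance reduction:         {reduction:.4f}")     # 3.0625
```

A greedy tree builder would compare this reduction against every other candidate split and keep the largest one.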
3. Cost Functions for Classification#
For Random Forest Classifier, the commonly used criteria are:
| Criterion | What it Measures | Formula / Intuition |
|---|---|---|
| Gini Impurity (`gini`) | How often a randomly chosen sample would be misclassified if labeled according to the node's class distribution. Lower is better. | \(G = 1 - \sum_i p_i^2\) |
| Information Gain (`entropy`) | Uncertainty of the class labels. The split that maximizes information gain (reduces entropy) is chosen. | \(H = -\sum_i p_i \log_2 p_i\) |
| Logarithmic loss (`log_loss`) | Probability-based error. More appropriate when calibrated class probabilities matter. | \(L = -\sum_i y_i \log \hat{p}_i\) |
Gini Impurity Formula:

\[
G = 1 - \sum_{i=1}^{C} p_i^2
\]

where \(p_i\) = proportion of class \(i\) in the node.

Gini = 0 → node is pure (all samples belong to the same class)
Gini = max → classes are evenly mixed
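A minimal sketch of this computation (the `gini` helper here is illustrative, not sklearn's internal implementation):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 0, 0]))  # pure node -> 0.0
print(gini([0, 1, 0, 1]))  # two classes evenly mixed -> 0.5 (the max for 2 classes)
```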
Entropy Formula:

\[
H = -\sum_{i=1}^{C} p_i \log_2 p_i
\]

Entropy measures uncertainty; the split that reduces entropy the most is preferred.
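Entropy can be sketched the same way (again an illustrative helper, not sklearn's internals):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)) over observed classes."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

pure = entropy([0, 0, 0, 0])    # all one class: no uncertainty (0 bits)
mixed = entropy([0, 1, 0, 1])   # evenly mixed 2 classes: maximal uncertainty (1 bit)
print(pure, mixed)
```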
4. Intuition Behind Cost Functions#
Regression (Variance Reduction):
Split the node so that children are as homogeneous as possible in target values.
Classification (Gini/Entropy):
Split the node so that children are as pure as possible, meaning samples in a child node mostly belong to one class.
Random Forest Aggregates Trees:
Even if one tree chooses a suboptimal split (high cost), averaging across many trees reduces the impact of bad splits.
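A toy illustration of this averaging effect, using i.i.d. noisy predictions as stand-ins for individual trees (real trees are correlated, so this is a simplification):

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 5.0
n_trees = 100

# Each simulated "tree" predicts the true value plus independent noise.
tree_predictions = true_value + rng.normal(0.0, 1.0, size=n_trees)

single_tree_error = abs(tree_predictions[0] - true_value)
forest_error = abs(tree_predictions.mean() - true_value)

print(f"Single tree error:     {single_tree_error:.3f}")
print(f"Averaged forest error: {forest_error:.3f}")
```

Under the i.i.d. assumption, the variance of the averaged prediction shrinks roughly as \(1/n\) in the number of trees, which is why a few bad splits in one tree rarely hurt the ensemble.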
Key Points
Random Forest does not have a global cost function; each tree optimizes locally at each node.
Aggregating predictions across trees reduces variance, making RF robust.
Choice of criterion affects:
Model performance (often only small differences)
Sensitivity to outliers (`squared_error` vs `absolute_error`)
Training speed (`gini` is faster to compute than `entropy`)
Classification Example (Gini vs Entropy)#
# Import Libraries
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Random Forest with Gini
rf_gini = RandomForestClassifier(n_estimators=100, criterion='gini', random_state=42)
rf_gini.fit(X_train, y_train)
y_pred_gini = rf_gini.predict(X_test)
print("Classification with Gini:", accuracy_score(y_test, y_pred_gini))
# Random Forest with Entropy
rf_entropy = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=42)
rf_entropy.fit(X_train, y_train)
y_pred_entropy = rf_entropy.predict(X_test)
print("Classification with Entropy:", accuracy_score(y_test, y_pred_entropy))
Classification with Gini: 1.0
Classification with Entropy: 1.0
Regression Example (Squared Error vs Absolute Error)#
# Import Libraries
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
# Create Sample Regression Data
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.3, X.shape[0])
# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Random Forest with Squared Error (Variance Reduction)
rf_squared = RandomForestRegressor(n_estimators=100, criterion='squared_error', random_state=42)
rf_squared.fit(X_train, y_train)
y_pred_squared = rf_squared.predict(X_test)
# Random Forest with Absolute Error (L1)
rf_absolute = RandomForestRegressor(n_estimators=100, criterion='absolute_error', random_state=42)
rf_absolute.fit(X_train, y_train)
y_pred_absolute = rf_absolute.predict(X_test)
# Metrics
def print_metrics(name, y_true, y_pred):
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
print(f"{name} -> R²: {r2:.2f}, RMSE: {rmse:.2f}")
print_metrics("Squared Error", y_test, y_pred_squared)
print_metrics("Absolute Error", y_test, y_pred_absolute)
Squared Error -> R²: 0.78, RMSE: 0.31
Absolute Error -> R²: 0.79, RMSE: 0.30