1. Cost Function#
A cost function (also called splitting criterion) measures how “good” a split is at a node in a decision tree.
Random Forest builds many decision trees, and each tree uses a cost function to decide where to split the data.
The goal is to minimize prediction error in the leaves.
2. Cost Functions for Regression#
For Random Forest Regressor, the commonly used criteria are:
| Criterion | What it Measures | Formula / Intuition |
|---|---|---|
| Variance reduction (`squared_error`) | How much splitting reduces the variance of target values in the child nodes. The split that reduces variance the most is chosen. | \(\mathrm{MSE} = \frac{1}{N}\sum_i (y_i - \bar{y})^2\) |
| Mean Absolute Error (`absolute_error`, L1) | The mean absolute deviation of target values from the child-node median \(\tilde{y}\). Less sensitive to outliers than squared error. | \(\mathrm{MAE} = \frac{1}{N}\sum_i \lvert y_i - \tilde{y} \rvert\) |
| Poisson deviance (`poisson`) | Deviance for count data; assumes the targets follow a Poisson distribution. | \(D = \frac{2}{N}\sum_i \left( y_i \log \frac{y_i}{\hat{y}_i} - y_i + \hat{y}_i \right)\) |
Variance Reduction Intuition:
Suppose a node contains target values `[10, 12, 15, 14]`. Splitting them into `[10, 12]` and `[15, 14]` reduces the variance in the child nodes. The split that minimizes the weighted average variance across the children is chosen.
Mathematically:

\[
\Delta \mathrm{Var} = \mathrm{Var}(\text{parent}) - \sum_{k} \frac{n_k}{n}\,\mathrm{Var}(\text{child}_k)
\]

The larger the reduction, the better the split.
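This computation can be sketched in NumPy for the toy split above (the values are the ones from the example; real trees evaluate many candidate splits this way):

```python
import numpy as np

# Parent node targets and one candidate split into two children
parent = np.array([10.0, 12.0, 15.0, 14.0])
left, right = np.array([10.0, 12.0]), np.array([15.0, 14.0])

# Weighted average variance of the children (weights = child sizes)
n = len(parent)
child_var = (len(left) / n) * left.var() + (len(right) / n) * right.var()

# Variance reduction achieved by this split
reduction = parent.var() - child_var
print(f"Parent variance:            {parent.var():.4f}")  # 3.6875
print(f"Weighted children variance: {child_var:.4f}")     # 0.6250
print(f"Variance reduction:         {reduction:.4f}")     # 3.0625
```

A greedy tree builder would compare this reduction against every other candidate split and keep the largest one.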
3. Cost Functions for Classification#
For Random Forest Classifier, the commonly used criteria are:
| Criterion | What it Measures | Formula / Intuition |
|---|---|---|
| Gini Impurity (`gini`) | How often a randomly chosen sample would be misclassified if labeled according to the node's class distribution. Lower is better. | \(G = 1 - \sum_i p_i^2\) |
| Information Gain (`entropy`) | Uncertainty of the class labels. The split that maximizes information gain (reduces entropy) is chosen. | \(H = -\sum_i p_i \log_2 p_i\) |
| Logarithmic loss (`log_loss`) | Probability-based error. More appropriate when calibrated class probabilities matter. | \(L = -\sum_i y_i \log \hat{p}_i\) |
Gini Impurity Formula:

\[
G = 1 - \sum_{i=1}^{C} p_i^2
\]

where \(p_i\) = proportion of class \(i\) in the node.

Gini = 0 → node is pure (all samples belong to the same class)
Gini = max → classes are evenly mixed
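A minimal sketch of this computation (the `gini` helper here is illustrative, not sklearn's internal implementation):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 0, 0]))  # pure node -> 0.0
print(gini([0, 1, 0, 1]))  # two classes evenly mixed -> 0.5 (the max for 2 classes)
```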
Entropy Formula:

\[
H = -\sum_{i=1}^{C} p_i \log_2 p_i
\]

Entropy measures uncertainty; the split that reduces entropy the most is preferred.
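Entropy can be sketched the same way (again an illustrative helper, not sklearn's internals):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)) over observed classes."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

pure = entropy([0, 0, 0, 0])    # all one class: no uncertainty (0 bits)
mixed = entropy([0, 1, 0, 1])   # evenly mixed 2 classes: maximal uncertainty (1 bit)
print(pure, mixed)
```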
4. Intuition Behind Cost Functions#
Regression (Variance Reduction):
Split the node so that children are as homogeneous as possible in target values.
Classification (Gini/Entropy):
Split the node so that children are as pure as possible, meaning samples in a child node mostly belong to one class.
Random Forest Aggregates Trees:
Even if one tree chooses a suboptimal split (high cost), averaging across many trees reduces the impact of bad splits.
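A toy illustration of this averaging effect, using i.i.d. noisy predictions as stand-ins for individual trees (real trees are correlated, so this is a simplification):

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 5.0
n_trees = 100

# Each simulated "tree" predicts the true value plus independent noise.
tree_predictions = true_value + rng.normal(0.0, 1.0, size=n_trees)

single_tree_error = abs(tree_predictions[0] - true_value)
forest_error = abs(tree_predictions.mean() - true_value)

print(f"Single tree error:     {single_tree_error:.3f}")
print(f"Averaged forest error: {forest_error:.3f}")
```

Under the i.i.d. assumption, the variance of the averaged prediction shrinks roughly as \(1/n\) in the number of trees, which is why a few bad splits in one tree rarely hurt the ensemble.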
Key Points
Random Forest does not have a global cost function; each tree optimizes locally at each node.
Aggregating predictions across trees reduces variance, making RF robust.
Choice of criterion affects:
Model performance (often only small differences)
Sensitivity to outliers (`squared_error` vs `absolute_error`)
Training speed (`gini` is faster to compute than `entropy`)
Classification Example (Gini vs Entropy)#
# Import Libraries
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Random Forest with Gini
rf_gini = RandomForestClassifier(n_estimators=100, criterion='gini', random_state=42)
rf_gini.fit(X_train, y_train)
y_pred_gini = rf_gini.predict(X_test)
print("Classification with Gini:", accuracy_score(y_test, y_pred_gini))
# Random Forest with Entropy
rf_entropy = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=42)
rf_entropy.fit(X_train, y_train)
y_pred_entropy = rf_entropy.predict(X_test)
print("Classification with Entropy:", accuracy_score(y_test, y_pred_entropy))
Classification with Gini: 1.0
Classification with Entropy: 1.0
Regression Example (Squared Error vs Absolute Error)#
# Import Libraries
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
# Create Sample Regression Data
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.3, X.shape[0])
# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Random Forest with Squared Error (Variance Reduction)
rf_squared = RandomForestRegressor(n_estimators=100, criterion='squared_error', random_state=42)
rf_squared.fit(X_train, y_train)
y_pred_squared = rf_squared.predict(X_test)
# Random Forest with Absolute Error (L1)
rf_absolute = RandomForestRegressor(n_estimators=100, criterion='absolute_error', random_state=42)
rf_absolute.fit(X_train, y_train)
y_pred_absolute = rf_absolute.predict(X_test)
# Metrics
def print_metrics(name, y_true, y_pred):
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
print(f"{name} -> R²: {r2:.2f}, RMSE: {rmse:.2f}")
print_metrics("Squared Error", y_test, y_pred_squared)
print_metrics("Absolute Error", y_test, y_pred_absolute)
Squared Error -> R²: 0.78, RMSE: 0.31
Absolute Error -> R²: 0.79, RMSE: 0.30