Cost Functions
A cost function measures how well a regression line fits the data by quantifying the error, i.e. the difference between predicted and actual values, aggregated over the training set. The goal is to find parameters (\(\theta_0, \theta_1, ...\)) that minimize this cost.
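In general form, for a hypothesis \(h_\theta\) and a per-point error \(\ell\), the cost over \(m\) training examples can be written as

\[ J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \ell\bigl(h_\theta(x^{(i)}),\ y^{(i)}\bigr) \]

The loss functions below differ only in the choice of \(\ell\).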
Mean Squared Error (MSE)
Most common cost function for linear regression.
Squares the errors → penalizes large errors heavily.
Smooth and differentiable → great for gradient descent.
✅ Pros:
Convex function → guarantees global minimum.
Emphasizes large errors, so the fit works hard to reduce big misses (useful when large deviations are especially costly).
❌ Cons:
Sensitive to outliers → a single extreme point can distort results.
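For reference, with \(m\) data points, predictions \(\hat{y}^{(i)}\) and targets \(y^{(i)}\):

\[ \mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \bigl(\hat{y}^{(i)} - y^{(i)}\bigr)^2 \]

(Linear-regression texts often use a \(\tfrac{1}{2m}\) scaling instead, as the code at the end of this page does; it only rescales the cost.) A minimal NumPy sketch, where the function name mse is purely illustrative:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of squared residuals: large errors dominate because they are squared.
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)
```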
Mean Absolute Error (MAE)
Takes absolute differences instead of squaring.
Errors are treated equally (linear penalty).
✅ Pros:
Robust to outliers (doesn’t blow up error).
Simple interpretation (average error in same units as data).
❌ Cons:
Not differentiable at zero (gradient descent is trickier, though solvable with subgradients).
Optimization can be slower.
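For reference:

\[ \mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \bigl|\hat{y}^{(i)} - y^{(i)}\bigr| \]

A minimal NumPy sketch (the name mae is illustrative):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean of absolute residuals: every unit of error is penalized the same.
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))
```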
Huber Loss (Combination of MSE and MAE)
Uses MSE for small errors (smooth) and MAE for large errors (robust to outliers).
✅ Pros:
Best of both worlds: smooth optimization + outlier resistance.
Widely used in robust regression.
❌ Cons:
Requires choosing the hyperparameter \(\delta\), the threshold where the penalty switches from quadratic to linear.
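With residual \(r = \hat{y} - y\), the Huber loss is quadratic near zero and linear beyond \(\delta\):

\[ L_\delta(r) = \begin{cases} \tfrac{1}{2} r^2 & \text{if } |r| \le \delta \\ \delta \bigl(|r| - \tfrac{1}{2}\delta\bigr) & \text{otherwise} \end{cases} \]

A minimal NumPy sketch; the function name and the default \(\delta = 1\) are illustrative choices:

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    # Quadratic penalty for small residuals, linear penalty for large ones.
    r = np.asarray(y_pred) - np.asarray(y_true)
    quadratic = 0.5 * r ** 2
    linear = delta * (np.abs(r) - 0.5 * delta)
    return np.mean(np.where(np.abs(r) <= delta, quadratic, linear))
```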
Log-Cosh Loss
Behaves like MSE for small errors and like MAE for large errors (but smoother).
✅ Pros:
Smooth everywhere (better for gradient descent).
Less sensitive to outliers than MSE.
❌ Cons:
Computationally more expensive than MSE.
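For a residual \(r = \hat{y} - y\), the loss is \(\log(\cosh(r))\), which behaves like \(\tfrac{1}{2}r^2\) for small \(r\) and like \(|r| - \log 2\) for large \(r\). A minimal NumPy sketch (log_cosh is an illustrative name; np.logaddexp is used to avoid overflowing cosh for large residuals):

```python
import numpy as np

def log_cosh(y_true, y_pred):
    r = np.asarray(y_pred) - np.asarray(y_true)
    # log(cosh(r)) computed as log(e^r + e^(-r)) - log(2) for numerical stability.
    return np.mean(np.logaddexp(r, -r) - np.log(2.0))
```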
Quantile Loss (Pinball Loss)
Used in quantile regression (instead of predicting mean, predicts quantiles like median, 90th percentile).
For \(q=0.5\), it is proportional to MAE (median regression).
✅ Pros:
Useful for risk-sensitive domains (finance, forecasting).
❌ Cons:
More complex interpretation.
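For a target quantile \(q\) and residual \(r = y - \hat{y}\), the pinball loss is

\[ L_q(r) = \max\bigl(q\,r,\ (q - 1)\,r\bigr) \]

so under-predictions are weighted by \(q\) and over-predictions by \(1 - q\); at \(q = 0.5\) both sides are weighted equally, which is why it targets the median. A minimal NumPy sketch (the name pinball is illustrative):

```python
import numpy as np

def pinball(y_true, y_pred, q=0.5):
    # Asymmetric absolute error: weight q on under-prediction, (1 - q) on over-prediction.
    r = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.maximum(q * r, (q - 1) * r))
```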
Impact of Cost Functions
MSE → great for general regression, but heavily penalizes outliers.
MAE → robust to outliers, but harder to optimize.
Huber / Log-Cosh → balance between robustness and efficiency.
Quantile Loss → gives flexibility beyond “average prediction”.
👉 In standard linear regression, we almost always use MSE because:
It’s mathematically elegant (setting the derivative to zero gives a closed-form solution).
Convex → guarantees global minimum.
Easy to optimize with gradient descent or normal equation.
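As a concrete illustration of that closed-form solution, the normal equation \(\theta = (X^\top X)^{-1} X^\top y\) minimizes the MSE directly. The sketch below uses the same toy data as the plotting cell that follows; the variable names are only for illustration:

```python
import numpy as np

# Toy data: y = x exactly, so the best fit is theta0 = 0, theta1 = 1.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])

# Prepend a column of ones so theta[0] plays the role of the intercept theta0.
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: solve (X^T X) theta = X^T y rather than inverting explicitly.
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
print(theta)  # -> approximately [0. 1.]
```

The cell below instead fixes \(\theta_0 = 0\) and visualizes how the MSE cost changes as the slope \(\theta_1\) varies.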
import numpy as np
import matplotlib.pyplot as plt
# Sample dataset (simple linear data)
X = np.array([1, 2, 3])
y = np.array([1, 2, 3])
m = len(y)
# Hypothesis function h(x) = theta1 * x (assuming theta0 = 0 for simplicity)
def hypothesis(theta1, X):
    return theta1 * X

# Cost function J(theta1)
def cost(theta1, X, y):
    return (1/(2*m)) * np.sum((hypothesis(theta1, X) - y) ** 2)
# Generate values of theta1 to test
theta1_vals = np.linspace(-1, 3, 100)
J_vals = [cost(t, X, y) for t in theta1_vals]
# Plot the cost function curve
plt.figure(figsize=(10,5))
plt.plot(theta1_vals, J_vals, color='blue')
plt.xlabel(r"$\theta_1$ (Slope)")
plt.ylabel(r"$J(\theta_1)$ (Cost Function)")
plt.title("Cost Function Curve (MSE vs Slope)")
plt.axvline(x=1, color='red', linestyle="--", label="Optimal slope θ1=1")
plt.legend()
plt.grid(True)
plt.show()