Cost Function
Unlike linear regression, logistic regression cannot use Mean Squared Error (MSE) directly. Because the prediction passes through the non-linear sigmoid, the MSE cost becomes non-convex in \(\theta\), so gradient descent is no longer guaranteed to find the global minimum.
Here’s a breakdown:
Sigmoid Function
First, logistic regression maps a linear score \(\theta^T x\) to a probability using the sigmoid function:
\[\hat{y} = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}\]
Where:
\(\hat{y}\) = predicted probability that \(y = 1\)
\(\theta\) = model parameters
\(x\) = input features
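A minimal Python sketch of the sigmoid and the prediction \(\hat{y} = \sigma(\theta^T x)\). It assumes, as an illustrative convention not stated above, that each feature vector `x` starts with a constant 1.0 so that `theta[0]` acts as the bias; the names `sigmoid` and `predict_proba` are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    # Branch on the sign of z so the argument passed to math.exp is never positive,
    # which avoids overflow for large-magnitude inputs.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta: list[float], x: list[float]) -> float:
    """y_hat = sigmoid(theta^T x); x[0] is assumed to be 1.0 (bias term)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

print(sigmoid(0.0))                             # 0.5
print(predict_proba([0.5, -1.0], [1.0, 2.0]))   # sigmoid(-1.5), about 0.1824
```

Both branches compute the same mathematical value; the split only keeps `math.exp` from overflowing when \(|z|\) is large.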
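Likelihood Function
Logistic regression fits its parameters by Maximum Likelihood Estimation (MLE): it chooses the \(\theta\) that makes the observed labels most probable.
The likelihood is the probability of observing the labels of all \(m\) training examples, treated as independent, given parameters \(\theta\):
\[L(\theta) = \prod_{i=1}^{m} P\left(y^{(i)} \mid x^{(i)}; \theta\right)\]
For binary labels \(y \in \{0,1\}\), \(P(y = 1 \mid x; \theta) = \hat{y}\) and \(P(y = 0 \mid x; \theta) = 1 - \hat{y}\), which can be written compactly as:
\[P(y \mid x; \theta) = \hat{y}^{\,y} (1 - \hat{y})^{1 - y}\]
So the likelihood becomes:
\[L(\theta) = \prod_{i=1}^{m} \left(\hat{y}^{(i)}\right)^{y^{(i)}} \left(1 - \hat{y}^{(i)}\right)^{1 - y^{(i)}}\]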
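The compact Bernoulli form and the product over examples can be checked directly. This is a sketch with hypothetical helper names (`bernoulli_prob`, `likelihood`) and made-up predicted probabilities; it assumes the examples are independent, as the product form requires.

```python
import math

def bernoulli_prob(y_hat: float, y: int) -> float:
    """P(y | x; theta) = y_hat**y * (1 - y_hat)**(1 - y) for y in {0, 1}."""
    return y_hat ** y * (1.0 - y_hat) ** (1 - y)

def likelihood(y_hats: list[float], ys: list[int]) -> float:
    """L(theta): product of per-example probabilities (independent examples assumed)."""
    return math.prod(bernoulli_prob(p, y) for p, y in zip(y_hats, ys))

# The compact formula selects y_hat when y = 1 and 1 - y_hat when y = 0.
print(bernoulli_prob(0.8, 1))                      # 0.8
print(bernoulli_prob(0.8, 0))                      # about 0.2
print(likelihood([0.9, 0.2, 0.7], [1, 0, 1]))      # 0.9 * 0.8 * 0.7, about 0.504
```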
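Log-Likelihood
We take the log of the likelihood because it turns the product into a sum, which is easier to differentiate and avoids numerical underflow when many probabilities are multiplied. Since the log is monotonic, the same \(\theta\) maximizes both:
\[\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]\]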
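Cost Function (Negative Log-Likelihood / Log Loss)
Gradient-based optimizers minimize rather than maximize, so we negate the log-likelihood and average it over the \(m\) training examples:
\[J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]\]
This log loss (also called binary cross-entropy) is the standard cost function for logistic regression, and unlike MSE it is convex in \(\theta\).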
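A minimal sketch of the log loss, the negated average log-likelihood, computed from predicted probabilities. The `eps` clipping is a common numerical safeguard rather than part of the formula; the value `1e-15` and the name `log_loss` are illustrative choices.

```python
import math

def log_loss(y_hats: list[float], ys: list[int], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy: -(1/m) * sum[y log y_hat + (1 - y) log(1 - y_hat)]."""
    m = len(ys)
    total = 0.0
    for p, y in zip(y_hats, ys):
        # Clip so log(0) is never evaluated when a prediction is exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / m

print(log_loss([0.9, 0.2, 0.7], [1, 0, 1]))   # -(log 0.9 + log 0.8 + log 0.7) / 3, about 0.2284
```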
Intuition:
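If the predicted probability \(\hat{y}\) is close to the true label \(y\), the log loss is small, approaching 0 as \(\hat{y} \to y\).
If the model is confident but wrong, the log loss is very large: for \(y = 1\) the per-example loss is \(-\log \hat{y}\), which grows without bound as \(\hat{y} \to 0\).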
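The per-example loss for a positive example (\(y = 1\)) makes this intuition concrete. The helper name `example_loss` is hypothetical.

```python
import math

def example_loss(y_hat: float, y: int) -> float:
    """Per-example log loss: -log(y_hat) if y == 1, else -log(1 - y_hat)."""
    return -math.log(y_hat) if y == 1 else -math.log(1.0 - y_hat)

# True label y = 1, predictions ranging from confidently right to confidently wrong.
for y_hat in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(f"y_hat={y_hat:<5} loss={example_loss(y_hat, 1):.3f}")
```

This prints losses of about 0.010, 0.105, 0.693, 2.303 and 4.605: the loss changes little among near-correct predictions but grows quickly as \(\hat{y}\) approaches the wrong label.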
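Variants / Regularized Cost Functions
To reduce overfitting, we add a penalty on the size of the weights to the log loss:
L2 Regularization (Ridge)
\[J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2\]
L1 Regularization (Lasso)
\[J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right] + \frac{\lambda}{m} \sum_{j=1}^{n} \lvert \theta_j \rvert\]
\(\lambda\) = regularization strength (a larger \(\lambda\) means a stronger penalty)
\(n\) = number of features; the bias \(\theta_0\) is conventionally not penalized
L2 penalizes large weights, shrinking them toward 0
L1 encourages sparsity (many weights become exactly 0)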
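A minimal sketch of the L2- and L1-regularized costs in plain Python. The function name `regularized_cost`, the `penalty` argument, the `1e-15` clipping, and the convention that `x[0] == 1.0` with `theta[0]` as the unpenalized bias are illustrative assumptions; the toy data below is made up.

```python
import math

def sigmoid(z: float) -> float:
    # Stable form: the argument to math.exp is never positive.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def regularized_cost(theta: list[float], X: list[list[float]], ys: list[int],
                     lam: float, penalty: str = "l2") -> float:
    """Log loss plus an L1 or L2 penalty on theta[1:] (theta[0] is the unpenalized bias)."""
    m = len(ys)
    data_loss = 0.0
    for x, y in zip(X, ys):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        p = min(max(p, 1e-15), 1.0 - 1e-15)   # guard against log(0)
        data_loss += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    data_loss = -data_loss / m
    weights = theta[1:]                        # skip the bias term
    if penalty == "l2":
        reg = lam / (2 * m) * sum(w * w for w in weights)
    elif penalty == "l1":
        reg = lam / m * sum(abs(w) for w in weights)
    else:
        raise ValueError(f"unknown penalty: {penalty!r}")
    return data_loss + reg

# Toy data: a bias column of 1.0 plus one feature.
X = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]]
ys = [1, 0, 1]
theta = [0.1, 2.0]
print(regularized_cost(theta, X, ys, lam=0.0))               # plain log loss
print(regularized_cost(theta, X, ys, lam=1.0, penalty="l2")) # + 1/(2*3) * 2.0**2
print(regularized_cost(theta, X, ys, lam=1.0, penalty="l1")) # + 1/3 * |2.0|
```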
Alternative (Less Common) Cost Functions
Mean Squared Error (MSE): Sometimes used, but not preferred because, combined with the sigmoid, it makes the cost non-convex in \(\theta\).
Hinge Loss: Used in SVMs, not typical for logistic regression.
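A small numerical check of the convexity claim, using a single hypothetical example with \(x = 1\) and \(y = 1\) so that the cost depends on one weight \(w\) through \(\hat{y} = \sigma(w)\). It estimates the second derivative with a central finite difference; a negative value means the cost curves downward at that point, which cannot happen for a convex function. The \(\frac{1}{2}\) factor on MSE matches the \(\frac{1}{2m}\) form with \(m = 1\).

```python
import math

def sigmoid(z: float) -> float:
    # Stable form: the argument to math.exp is never positive.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

# One hypothetical training example with x = 1 and y = 1, so y_hat = sigmoid(w).
def mse_cost(w: float) -> float:
    return 0.5 * (sigmoid(w) - 1.0) ** 2

def log_loss_cost(w: float) -> float:
    return -math.log(sigmoid(w))

def second_difference(f, w: float, h: float = 1e-3) -> float:
    """Central finite-difference estimate of f''(w)."""
    return (f(w + h) - 2.0 * f(w) + f(w - h)) / (h * h)

for w in (-3.0, 0.0, 3.0):
    print(f"w={w:+.1f}  MSE f''={second_difference(mse_cost, w):+.4f}  "
          f"log loss f''={second_difference(log_loss_cost, w):+.4f}")
```

The MSE second difference is negative at \(w = -3\) and positive at \(w = 0\), so the MSE cost is not convex in \(w\); the log-loss second difference is positive at all three points, consistent with its exact second derivative \(\sigma(w)(1 - \sigma(w)) > 0\).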
✅ Summary Table
| Cost Function | Formula | Notes |
|---|---|---|
| Log Loss (Binary) | \(-\frac{1}{m} \sum [y \log \hat{y} + (1-y) \log (1-\hat{y})]\) | Standard cost for logistic regression |
| L2 Regularized | Log Loss + \(\frac{\lambda}{2m} \sum \theta_j^2\) | Penalizes large weights, reduces overfitting |
| L1 Regularized | Log Loss + \(\frac{\lambda}{m} \sum \lvert \theta_j \rvert\) | Encourages sparse models |
| MSE (Not Recommended) | \(\frac{1}{2m} \sum (\hat{y}-y)^2\) | Non-convex for logistic regression, rarely used |