# Cost Functions
## 1. Maximum Likelihood Estimation (MLE) – Training Objective
Naïve Bayes learns the class priors \(P(y)\) and feature likelihoods \(P(x_i|y)\) from data by maximizing the likelihood of the training set:

\[
\mathcal{L}(\theta) = \prod_{j=1}^{m} P\big(y^{(j)}\big) \prod_{i=1}^{n} P\big(x_i^{(j)} \mid y^{(j)}\big)
\]

where:

- \(m\) = number of training samples
- \(n\) = number of features
- \(y^{(j)}\) = class label of sample \(j\)
- \(x^{(j)}\) = features of sample \(j\)
- \(\theta\) = parameters (priors + likelihoods).
⚡ In practice, we maximize the log-likelihood, which avoids floating-point underflow and turns the product into a sum:

\[
\ell(\theta) = \sum_{j=1}^{m} \left[ \log P\big(y^{(j)}\big) + \sum_{i=1}^{n} \log P\big(x_i^{(j)} \mid y^{(j)}\big) \right]
\]
👉 So the implicit cost function is the negated log-likelihood:

\[
J(\theta) = -\ell(\theta) = -\sum_{j=1}^{m} \left[ \log P\big(y^{(j)}\big) + \sum_{i=1}^{n} \log P\big(x_i^{(j)} \mid y^{(j)}\big) \right]
\]

This is the negative log-likelihood (NLL), closely related to the cross-entropy loss.
## 2. Cross-Entropy / Log Loss – Evaluation
When evaluating probabilistic classifiers like Naïve Bayes, we often use log loss:

\[
\text{LogLoss} = -\frac{1}{m} \sum_{j=1}^{m} \sum_{c=1}^{k} \mathbf{1}\{y^{(j)} = c\} \, \log \hat{P}\big(y = c \mid x^{(j)}\big)
\]

where:

- \(k\) = number of classes
- \(\hat{P}(y = c \mid x^{(j)})\) = predicted probability of class \(c\) for sample \(j\)
- \(\mathbf{1}\{y^{(j)} = c\}\) = indicator function (1 if the true class is \(c\), else 0).
👉 Log loss penalizes confident mistakes most heavily: predicting a probability near 0 for the true class drives the loss toward infinity.
## 3. Zero-One Loss – Simpler Alternative
Sometimes for classification we also look at the 0-1 loss, which is not probabilistic, just accuracy-based:

\[
\text{0-1 Loss} = \frac{1}{m} \sum_{j=1}^{m} \mathbf{1}\{\hat{y}^{(j)} \neq y^{(j)}\}
\]

where \(\hat{y}^{(j)}\) is the predicted class for sample \(j\). This is simply the misclassification rate, i.e. \(1 - \text{accuracy}\).
## Summary

- Training → Naïve Bayes parameters are estimated via maximum likelihood, which implicitly minimizes the negative log-likelihood (NLL).
- Evaluation → common cost functions:
  - Log Loss (cross-entropy) → best for probabilistic performance.
  - 0-1 Loss (error rate) → best for accuracy comparison.