Performance Metrics#
1. Accuracy#
Measures overall correctness.
Works well when classes are balanced.
Misleading for imbalanced datasets.
Example: If 95% of emails are “ham”, predicting “ham” always gives 95% accuracy but is useless.
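This pitfall is easy to demonstrate with scikit-learn; the 95/5 "ham"/"spam" split below is just the illustrative figure from above:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 95% "ham" (0), 5% "spam" (1)
y_true = np.array([0] * 95 + [1] * 5)

# A useless classifier that always predicts "ham"
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.95, despite catching zero spam
```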
2. Confusion Matrix#
A table comparing predicted vs actual classes.
For binary classification:
|  | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
From this, we compute other metrics.
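As a small illustration (with made-up labels), scikit-learn's `confusion_matrix` can be unpacked into the four cells above; note that its rows and columns are ordered [negative, positive] by default, so `ravel()` yields TN, FP, FN, TP:

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)  # 3 1 1 3
```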
3. Precision#
Of all items predicted positive, how many are truly positive?
Good when false positives are costly (e.g., classifying ham as spam).
4. Recall (Sensitivity, True Positive Rate)#
Of all actual positive items, how many did we correctly identify?
Good when false negatives are costly (e.g., missing a cancer diagnosis).
5. F1 Score#
Harmonic mean of precision and recall.
Useful for imbalanced data.
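All three metrics can be computed with scikit-learn. The toy labels below are invented for illustration; they give TP = 3, FP = 1, FN = 1, so precision and recall (and hence F1) all come out to 0.75:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision = TP / (TP + FP) = 3 / 4
# Recall    = TP / (TP + FN) = 3 / 4
# F1        = 2 * P * R / (P + R)
print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # 0.75
```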
6. ROC Curve & AUC#
ROC Curve → plots True Positive Rate (Recall) vs False Positive Rate (FP / (FP+TN)) for different probability thresholds.
AUC (Area Under Curve) → measures how well the model separates classes.
AUC = 1 → perfect.
AUC = 0.5 → random guessing.
Naïve Bayes outputs probabilities (\(P(y|x)\)), so you can directly use ROC-AUC.
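A minimal sketch with invented scores, passing the predicted positive-class probabilities straight to scikit-learn's `roc_auc_score`:

```python
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # e.g. P(y=1|x) from predict_proba

# AUC = fraction of (positive, negative) pairs ranked correctly: 3 of 4
print(roc_auc_score(y_true, y_score))  # 0.75
```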
7. Log Loss (Cross-Entropy Loss)#
Evaluates the probabilistic predictions, not just labels.
Penalizes confident but wrong predictions.
Useful when probability calibration matters (e.g., medical risk prediction).
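A toy example (made-up probabilities) showing how a single confident wrong prediction inflates log loss:

```python
from sklearn.metrics import log_loss

y_true = [1, 1, 0]
# Confident and correct vs. one confident miss on the second sample
p_good = [0.9, 0.9, 0.1]
p_bad  = [0.9, 0.1, 0.1]

print(log_loss(y_true, p_good))  # ~0.105
print(log_loss(y_true, p_bad))   # ~0.838 -- the confident miss dominates
```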
8. Calibration Metrics#
Naïve Bayes often produces poorly calibrated probabilities (too extreme, close to 0 or 1).
Tools like calibration curves or Brier score check if predicted probabilities match actual outcomes.
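A minimal Brier-score sketch with invented probabilities; the Brier score is simply the mean squared error between predicted probabilities and the 0/1 outcomes, so lower is better:

```python
from sklearn.metrics import brier_score_loss

y_true = [0, 1, 1, 0, 1]
y_prob = [0.1, 0.9, 0.8, 0.3, 0.6]

# mean((p - y)^2) = (0.01 + 0.01 + 0.04 + 0.09 + 0.16) / 5
print(brier_score_loss(y_true, y_prob))  # 0.062
```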
Summary#
For Naïve Bayes classification, use:
Accuracy → if classes balanced.
Precision, Recall, F1 → if data imbalanced.
ROC-AUC → for probability-based evaluation.
Log Loss → if probability quality matters.
Calibration → if decision thresholds rely on well-calibrated probabilities.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (
    accuracy_score, confusion_matrix, classification_report,
    roc_curve, auc, log_loss
)
from sklearn.datasets import make_classification

# Generate synthetic binary classification dataset
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=2,
    n_classes=2, weights=[0.7, 0.3], random_state=42
)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train Naive Bayes
nb = GaussianNB()
nb.fit(X_train, y_train)

# Predictions
y_pred = nb.predict(X_test)
y_proba = nb.predict_proba(X_test)[:, 1]

# Metrics
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=["Class 0", "Class 1"])
logloss_val = log_loss(y_test, y_proba)

# ROC-AUC
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
roc_auc = auc(fpr, tpr)

(acc, cm, report, logloss_val, roc_auc)
```
```
(0.8866666666666667,
 array([[100,   4],
        [ 13,  33]]),
 ' precision recall f1-score support\n\n Class 0 0.88 0.96 0.92 104\n Class 1 0.89 0.72 0.80 46\n\n accuracy 0.89 150\n macro avg 0.89 0.84 0.86 150\nweighted avg 0.89 0.89 0.88 150\n',
 0.4033059439714829,
 0.8760451505016723)
```
Results#
Accuracy: 0.887 (~89%)

Confusion Matrix:

[[100   4]
 [ 13  33]]

True Negatives = 100
False Positives = 4
False Negatives = 13
True Positives = 33
Classification Report:

```
              precision    recall  f1-score   support

     Class 0       0.88      0.96      0.92       104
     Class 1       0.89      0.72      0.80        46

    accuracy                           0.89       150
   macro avg       0.89      0.84      0.86       150
weighted avg       0.89      0.89      0.88       150
```
- **Log Loss**: `0.403` (lower is better; penalizes wrong confident predictions)
- **ROC-AUC**: `0.876` (good separation; 1.0 = perfect, 0.5 = random)
---
These metrics show:
- Model is strong overall (~89% accuracy).
- Slight imbalance in recall → Class 1 (minority) has lower recall (0.72), meaning some positives are missed.
- ROC-AUC confirms good probability separation.
---
Macro Average (macro_avg)#
Definition: Takes the arithmetic mean of the metric across all classes without considering class imbalance.
Formula for precision (example):
\[ \text{Precision}_{macro} = \frac{1}{C} \sum_{i=1}^{C} \text{Precision}_i \]
where \(C\) = number of classes.
Effect:
Treats all classes equally.
Useful when you want to evaluate performance per class fairly, even if one class has fewer samples.
In the Naïve Bayes example above:
macro avg precision = 0.89
macro avg recall = 0.84
This shows the average performance across Class 0 and Class 1, equally weighted.
Weighted Average (weighted_avg)#
Definition: Takes the support (number of true samples per class) into account while averaging.
Formula for precision (example):
\[ \text{Precision}_{weighted} = \frac{\sum_{i=1}^{C} ( \text{Support}_i \times \text{Precision}_i )}{\sum_{i=1}^{C} \text{Support}_i} \]
Effect:
Gives more importance to larger classes.
If dataset is imbalanced, the metric will be skewed toward majority class.
In the Naïve Bayes example above:
weighted avg precision = 0.89
weighted avg recall = 0.89
Since Class 0 has 104 samples and Class 1 has 46, Class 0 has more influence on the weighted averages.
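Both averages can be verified by hand from the per-class recall values and supports in the classification report above:

```python
# Per-class recall and support, taken from the report above
recalls  = [0.96, 0.72]   # Class 0, Class 1
supports = [104, 46]

# Macro: plain arithmetic mean over classes
macro = sum(recalls) / len(recalls)

# Weighted: mean weighted by each class's support
weighted = sum(r * s for r, s in zip(recalls, supports)) / sum(supports)

print(round(macro, 2))     # 0.84
print(round(weighted, 2))  # 0.89
```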
Summary:
Macro Avg → Equal weight to each class (good for imbalanced dataset evaluation).
Weighted Avg → Weighted by class size (good for overall performance reflection).