# Evaluation Metrics

## Confusion Matrix (Foundation of Metrics)
For a binary classifier, the confusion matrix is:

|  | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
From this table, all evaluation metrics are derived.
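As a minimal sketch of how to obtain these counts with scikit-learn (the dataset, and names like `clf`, `X_test`, `y_test`, are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Illustrative imbalanced binary dataset; any labelled data works the same way.
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = SVC().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows = actual class, columns = predicted class.
# With labels [0, 1] the layout is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TP={tp}  FN={fn}  FP={fp}  TN={tn}")
```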
## Accuracy
Accuracy measures the overall correctness of predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

- Works well when classes are balanced.
- Can be misleading when the dataset is imbalanced.
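A quick sketch, reusing `y_test` and `y_pred` from the confusion-matrix example above:

```python
from sklearn.metrics import accuracy_score

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print("Accuracy:", accuracy_score(y_test, y_pred))

# With ~90% of samples in the majority class (the weights above), a model
# that always predicts that class already scores roughly 0.9 accuracy,
# which is why accuracy alone can mislead on imbalanced data.
```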
## Precision, Recall, and F1-Score

### Precision
Out of all predicted positives, how many are actually positive?

Precision = TP / (TP + FP)

High precision → few false alarms.
### Recall (Sensitivity / TPR)
Out of all actual positives, how many did we catch?

Recall = TP / (TP + FN)

High recall → few missed detections.
### F1-Score
The harmonic mean of precision and recall:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Useful when the dataset is imbalanced and we need a balance between the two.
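A minimal sketch of all three metrics, again reusing `y_test` and `y_pred` from the confusion-matrix example above:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Precision = TP / (TP + FP): how trustworthy the positive predictions are.
print("Precision:", precision_score(y_test, y_pred))

# Recall = TP / (TP + FN): how many real positives were caught.
print("Recall:   ", recall_score(y_test, y_pred))

# F1 = harmonic mean of precision and recall.
print("F1-score: ", f1_score(y_test, y_pred))
```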
## ROC Curve & AUC
The ROC curve plots Recall (TPR) against the False Positive Rate (FPR) as the decision threshold varies.

AUC (Area Under the Curve):

- Closer to 1 → better classifier.
- 0.5 → random guessing.

For SVC, use decision_function, or predict_proba (which requires probability=True), to obtain the continuous scores these metrics need.
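As a sketch, reusing the `clf` and test split fitted above (`SVC` exposes `decision_function` by default; `predict_proba` only exists if the model was constructed with `probability=True`):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Continuous scores (not hard labels) are needed to sweep the threshold.
y_scores = clf.decision_function(X_test)

fpr, tpr, thresholds = roc_curve(y_test, y_scores)
print("ROC-AUC:", roc_auc_score(y_test, y_scores))
```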
## Metrics for Multi-class SVC
Since SVC handles multi-class problems through One-vs-Rest (OvR) or One-vs-One (OvO) decompositions, per-class metrics must be averaged:

- Macro average → averages the metric across all classes equally.
- Weighted average → averages the metric weighted by class frequency (support).

Both averages are reported by classification_report in scikit-learn.
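A sketch on a three-class problem (a fresh, illustrative dataset and classifier, since the averaging only matters with more than two classes):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, f1_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

multi_clf = SVC().fit(X_tr, y_tr)   # SVC handles the multi-class decomposition internally
y_hat = multi_clf.predict(X_te)

# Per-class precision/recall/F1 plus macro and weighted averages.
print(classification_report(y_te, y_hat))

# The same averages can also be requested directly:
print("Macro F1:   ", f1_score(y_te, y_hat, average="macro"))
print("Weighted F1:", f1_score(y_te, y_hat, average="weighted"))
```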
## Other Advanced Metrics
- Balanced Accuracy → adjusts accuracy for imbalanced datasets (the mean of per-class recall).
- Cohen's Kappa → measures agreement beyond chance.
- Matthews Correlation Coefficient (MCC) → robust single-number summary for imbalanced data.
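All three are available in scikit-learn; a sketch reusing `y_test` and `y_pred` from the binary example above:

```python
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             matthews_corrcoef)

# Balanced accuracy: average recall over classes.
print("Balanced accuracy:", balanced_accuracy_score(y_test, y_pred))

# Cohen's kappa: agreement corrected for chance (1 = perfect, 0 = chance level).
print("Cohen's kappa:    ", cohen_kappa_score(y_test, y_pred))

# MCC: correlation between predictions and truth, in [-1, 1].
print("MCC:              ", matthews_corrcoef(y_test, y_pred))
```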
## Summary Table

| Metric | Meaning | Best Use |
|---|---|---|
| Accuracy | Overall correctness | Balanced data |
| Precision | Correct positive predictions / all predicted positives | When false alarms are costly |
| Recall | Correct positive predictions / all actual positives | When missing positives is costly |
| F1-Score | Balance of precision & recall | Imbalanced data |
| ROC-AUC | Ranking ability | Threshold selection |
| MCC | Correlation between predictions & truth | Imbalanced data |