Evaluation Metrics#
accuracy_score#
Definition:
Measures the overall proportion of correct predictions.
Example:

```python
from sklearn.metrics import accuracy_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]
accuracy_score(y_true, y_pred)  # Output: 0.8
Interpretation:
80% of predictions are correct.
Limitation: For imbalanced datasets, accuracy can be misleading.
Example: If 90% of samples are class 0, predicting everything as 0 gives 90% accuracy, but the minority class is completely ignored.
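To make this concrete, here is a minimal sketch of the majority-class baseline (the 90/10 split is a hypothetical dataset, not from the examples above):

```python
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1.
y_true = [0] * 90 + [1] * 10

# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

# Accuracy looks high even though class 1 is never detected.
print(accuracy_score(y_true, y_pred))  # 0.9
```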
confusion_matrix#
Definition:
Shows the count of true vs predicted labels.
For binary classification:
| | Predicted 0 | Predicted 1 |
|---|---|---|
| True 0 | True Negative (TN) | False Positive (FP) |
| True 1 | False Negative (FN) | True Positive (TP) |
Example:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]
confusion_matrix(y_true, y_pred)
# Output:
# [[2 0]
#  [1 2]]
```
Interpretation:
- TN = 2 → two class-0 samples predicted correctly
- FP = 0 → no class-0 samples misclassified as class 1
- FN = 1 → one class-1 sample misclassified as class 0
- TP = 2 → two class-1 samples predicted correctly

Why it matters:
- Helps visualize errors by class.
- Essential for imbalanced datasets, as accuracy alone may be misleading.
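As a supplement, the four cells can be unpacked directly: the matrix is a NumPy array, and `ravel()` flattens the 2×2 result in row-major order, which for binary labels gives TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

# Flatten [[TN, FP], [FN, TP]] into four scalars.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 0 1 2
```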
classification_report#
Definition:
Provides precision, recall, F1-score, and support for each class.
Example:

```python
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]
print(classification_report(y_true, y_pred))
```

Output:

```
              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       1.00      0.67      0.80         3

    accuracy                           0.80         5
   macro avg       0.83      0.83      0.80         5
weighted avg       0.87      0.80      0.80         5
```
Interpretation:
| Metric | Meaning |
|---|---|
| Precision | Out of all samples predicted as class X, how many were actually X. |
| Recall | Out of all actual class X samples, how many were correctly predicted. |
| F1-score | Harmonic mean of precision and recall; balances the two. |
| Support | Number of true samples for each class. |
For imbalanced datasets:
- F1-score is more informative than accuracy.
- Macro or weighted averages help summarize overall performance.
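As a sanity check, the formulas above can be applied by hand for class 1 and compared with sklearn; the TP/FP/FN counts are read from the confusion matrix `[[2 0] [1 2]]` in the previous section.

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

tp, fp, fn = 2, 0, 1  # class-1 counts from the confusion matrix

precision = tp / (tp + fp)  # 2/2 = 1.00
recall = tp / (tp + fn)     # 2/3 ≈ 0.67
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.80

# sklearn agrees (default pos_label=1 scores class 1):
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```

These match the class-1 row of the report: precision 1.00, recall 0.67, F1 0.80.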
Summary Table for Quick Reference#
| Metric | Best Used For | Interpretation in Imbalanced Data |
|---|---|---|
| `accuracy_score` | Overall correctness | Can be misleading if classes are imbalanced |
| `confusion_matrix` | Counts of TP, TN, FP, FN | Shows where the model is failing |
| `classification_report` | Precision, Recall, F1-score per class | Gives balanced evaluation across classes |