Evaluation Metrics#

accuracy_score#

Definition:

  • Measures the overall proportion of correct predictions.

\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total predictions}} \]

Example:

from sklearn.metrics import accuracy_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

accuracy_score(y_true, y_pred)  # Output: 0.8

Interpretation:

  • 80% of predictions are correct.

  • Limitation: For imbalanced datasets, accuracy can be misleading.

    • Example: If 90% of samples are class 0, predicting everything as 0 gives 90% accuracy, but the minority class is completely ignored.
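
For instance, a minimal sketch of this pitfall with hypothetical labels (nine samples of class 0 and one of class 1):

from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: nine 0s and one 1
y_true_imb = [0] * 9 + [1]
# A "model" that always predicts the majority class
y_pred_imb = [0] * 10

accuracy_score(y_true_imb, y_pred_imb)  # Output: 0.9, yet class 1 is never detected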


confusion_matrix#

Definition:

  • Shows the counts of true vs. predicted labels for each class.

  • For binary classification:

              Predicted 0            Predicted 1
True 0        True Negative (TN)     False Positive (FP)
True 1        False Negative (FN)    True Positive (TP)

Example:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

confusion_matrix(y_true, y_pred)
# Output:
# [[2 0]
#  [1 2]]

Interpretation:

  • TN = 2 → two actual 0s correctly predicted as 0

  • FP = 0 → no actual 0 incorrectly predicted as 1

  • FN = 1 → one actual 1 incorrectly predicted as 0

  • TP = 2 → two actual 1s correctly predicted as 1

Why it matters:

  • Helps visualize errors by class

  • Essential for imbalanced datasets, as accuracy alone may be misleading.
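
For the binary case, the four counts can also be unpacked directly from the matrix; a small sketch using the same labels:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # Output: 2 0 1 2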


classification_report#

Definition:

  • Provides precision, recall, F1-score, and support for each class.

Example:

from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

print(classification_report(y_true, y_pred))

Output:

              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       1.00      0.67      0.80         3

    accuracy                           0.80         5
   macro avg       0.83      0.83      0.80         5
weighted avg       0.87      0.80      0.80         5

Interpretation:

  • Precision: Out of all samples predicted as class X, how many were actually X. High precision → few false positives.

  • Recall: Out of all actual class X samples, how many were correctly predicted. High recall → few false negatives.

  • F1-score: Harmonic mean of precision and recall; balances the two.

  • Support: Number of true samples of each class.
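
If you need these numbers individually rather than as a printed report, a minimal sketch with the same labels (precision_score, recall_score, and f1_score score the positive class, class 1, by default):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

precision_score(y_true, y_pred)  # Output: 1.0, every predicted 1 was actually a 1
recall_score(y_true, y_pred)     # Output: ~0.67, 2 of the 3 actual 1s were found
f1_score(y_true, y_pred)         # Output: 0.8, harmonic mean of precision and recall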

Imbalanced datasets:

  • F1-score is more informative than accuracy.

  • Weighted or macro averages help summarize overall performance.
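
A short sketch of the averaging options (average="macro" weights every class equally; average="weighted" weights each class by its support):

from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

# Macro average: unweighted mean of the per-class F1-scores
f1_score(y_true, y_pred, average="macro")     # Output: 0.8

# Weighted average: per-class F1-scores weighted by class support
f1_score(y_true, y_pred, average="weighted")  # Output: 0.8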


Summary Table for Quick Reference#

Metric                  Best Used For                             Interpretation in Imbalanced Data
accuracy_score          Overall correctness                       Can be misleading if classes are imbalanced
confusion_matrix        Counts of TP, TN, FP, FN                  Shows where the model is failing
classification_report   Precision, recall, F1-score per class    Gives a balanced evaluation across classes
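
To tie the three together, a minimal sketch on a hypothetical imbalanced example (eight negatives, two positives):

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Hypothetical imbalanced labels: eight 0s and two 1s
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))         # 0.8, looks decent on its own
print(confusion_matrix(y_true, y_pred))       # [[7 1]
                                              #  [1 1]]
print(classification_report(y_true, y_pred))  # recall for class 1 is only 0.50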