Evaluation Metrics#

accuracy_score#

Definition:

  • Measures the overall proportion of correct predictions.

\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total predictions}} \]

Example:

from sklearn.metrics import accuracy_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

accuracy_score(y_true, y_pred)  # Output: 0.8

Interpretation:

  • 80% of predictions are correct.

  • Limitation: For imbalanced datasets, accuracy can be misleading.

    • Example: If 90% of samples are class 0, predicting everything as 0 gives 90% accuracy, but the minority class is completely ignored.
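
For instance, a minimal sketch of this pitfall with hypothetical labels (nine samples of class 0 and one of class 1):

from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: nine 0s and one 1
y_true_imb = [0] * 9 + [1]
# A "model" that always predicts the majority class
y_pred_imb = [0] * 10

accuracy_score(y_true_imb, y_pred_imb)  # Output: 0.9, yet class 1 is never detected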


confusion_matrix#

Definition:

  • Shows the counts of true vs. predicted labels for each class.

  • For binary classification:

              Predicted 0            Predicted 1
True 0        True Negative (TN)     False Positive (FP)
True 1        False Negative (FN)    True Positive (TP)

Example:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

confusion_matrix(y_true, y_pred)
# Output:
# [[2 0]
#  [1 2]]

Interpretation:

  • TN = 2 → two actual 0s correctly predicted as 0

  • FP = 0 → no actual 0 incorrectly predicted as 1

  • FN = 1 → one actual 1 incorrectly predicted as 0

  • TP = 2 → two actual 1s correctly predicted as 1

Why it matters:

  • Helps visualize errors by class

  • Essential for imbalanced datasets, as accuracy alone may be misleading.
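
For the binary case, the four counts can also be unpacked directly from the matrix; a small sketch using the same labels:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # Output: 2 0 1 2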


classification_report#

Definition:

  • Provides precision, recall, F1-score, and support for each class.

Example:

from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

print(classification_report(y_true, y_pred))

Output:

              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       1.00      0.67      0.80         3

    accuracy                           0.80         5
   macro avg       0.83      0.83      0.80         5
weighted avg       0.87      0.80      0.80         5

Interpretation:

  • Precision: Out of all samples predicted as class X, how many were actually X. High precision → few false positives.

  • Recall: Out of all actual class X samples, how many were correctly predicted. High recall → few false negatives.

  • F1-score: Harmonic mean of precision and recall; balances the two.

  • Support: Number of true samples of each class.
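
If you need these numbers individually rather than as a printed report, a minimal sketch with the same labels (precision_score, recall_score, and f1_score score the positive class, class 1, by default):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

precision_score(y_true, y_pred)  # Output: 1.0, every predicted 1 was actually a 1
recall_score(y_true, y_pred)     # Output: ~0.67, 2 of the 3 actual 1s were found
f1_score(y_true, y_pred)         # Output: 0.8, harmonic mean of precision and recall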

Imbalanced datasets:

  • F1-score is more informative than accuracy.

  • Weighted or macro averages help summarize overall performance.
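
A short sketch of the averaging options (average="macro" weights every class equally; average="weighted" weights each class by its support):

from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

# Macro average: unweighted mean of the per-class F1-scores
f1_score(y_true, y_pred, average="macro")     # Output: 0.8

# Weighted average: per-class F1-scores weighted by class support
f1_score(y_true, y_pred, average="weighted")  # Output: 0.8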


Summary Table for Quick Reference#

Metric                  Best Used For                             Interpretation in Imbalanced Data
accuracy_score          Overall correctness                       Can be misleading if classes are imbalanced
confusion_matrix        Counts of TP, TN, FP, FN                  Shows where the model is failing
classification_report   Precision, recall, F1-score per class    Gives a balanced evaluation across classes
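
To tie the three together, a minimal sketch on a hypothetical imbalanced example (eight negatives, two positives):

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Hypothetical imbalanced labels: eight 0s and two 1s
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))         # 0.8, looks decent on its own
print(confusion_matrix(y_true, y_pred))       # [[7 1]
                                              #  [1 1]]
print(classification_report(y_true, y_pred))  # recall for class 1 is only 0.50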