Performance Metrics#

AdaBoost Classification#

When using AdaBoost as a classifier, you evaluate how well the model separates classes. The most common metrics are:

✅ Accuracy#

  • Proportion of correctly classified samples.

  • Good for balanced datasets, but misleading for imbalanced classes.
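
As a quick illustration of the imbalanced-class caveat, here is a minimal sketch (the 95/5 label split and the always-negative "model" are assumptions made purely for illustration):

import numpy as np
from sklearn.metrics import accuracy_score

# Assumed toy labels: 95% negatives, 5% positives (illustrative only)
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # degenerate model: always predicts class 0

# Accuracy is 0.95 even though every positive sample is missed
print("Accuracy:", accuracy_score(y_true, y_pred))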

✅ Precision, Recall, and F1-Score#

  • Precision: Among the predicted positives, how many are correct?

  • Recall (Sensitivity): Among the actual positives, how many are captured?

  • F1-score: Harmonic mean of precision & recall → balances both.

  • Useful when dealing with imbalanced datasets.
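
A minimal sketch of computing these three metrics with scikit-learn; the label vectors below are invented for illustration, not taken from the AdaBoost experiment later in this section:

from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical ground truth and predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}")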

✅ ROC-AUC (Receiver Operating Characteristic – Area Under Curve)#

  • The ROC curve plots the True Positive Rate against the False Positive Rate across decision thresholds; AUC is the area under that curve.

  • AUC close to 1 → strong classifier.

  • Threshold-independent metric.
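
To make the threshold-independence point concrete, this sketch computes ROC-AUC directly from predicted probabilities (both arrays are invented for illustration):

from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.7]

# AUC summarizes ranking quality across all possible thresholds
print("ROC-AUC:", roc_auc_score(y_true, y_scores))

# fpr/tpr pairs trace out the ROC curve itself
fpr, tpr, thresholds = roc_curve(y_true, y_scores)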

✅ Log Loss (Cross-Entropy Loss)#

  • Measures the probabilistic confidence of predictions.

  • Lower log loss = better probability calibration.

  • More informative than accuracy because it penalizes “overconfident wrong predictions.”
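
The sketch below, with made-up probabilities, shows how log loss punishes a confidently wrong prediction far more than a cautious one:

from sklearn.metrics import log_loss

y_true = [1, 1]  # two positive samples (illustrative)

# Both predict the second sample wrongly; one is cautious, one is overconfident
cautious = [0.9, 0.45]
overconfident = [0.9, 0.01]

print("Cautious log loss:     ", log_loss(y_true, cautious, labels=[0, 1]))
print("Overconfident log loss:", log_loss(y_true, overconfident, labels=[0, 1]))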


AdaBoost Regression#

When AdaBoost is used with regression trees, you measure how well it predicts continuous values:

Mean Squared Error (MSE)#

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
  • Penalizes large errors more heavily.
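
A small sketch with invented residuals showing the quadratic penalty, and confirming that the manual computation matches sklearn's mean_squared_error:

import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical targets and predictions; the last error is deliberately large
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 13.0])

mse_manual = np.mean((y_true - y_pred) ** 2)  # the single 4.0 error contributes 16
print(mse_manual, mean_squared_error(y_true, y_pred))  # identical values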

Root Mean Squared Error (RMSE)#

\[ RMSE = \sqrt{MSE} \]
  • Same units as the target variable → more interpretable.
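
A short sketch (reusing the hypothetical values above) showing that RMSE is simply the square root of MSE, reported in the units of the target:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical targets
y_pred = [2.5, 5.5, 7.0, 13.0]  # hypothetical predictions

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print("RMSE:", rmse)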

Mean Absolute Error (MAE)#

\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]
  • Less sensitive to outliers than MSE.
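
To see the outlier-sensitivity difference, this sketch compares MAE and MSE on the same invented data with and without one extreme prediction error:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
clean = np.array([10.5, 11.5, 11.0, 13.5, 12.5])    # small errors everywhere
outlier = np.array([10.5, 11.5, 11.0, 13.5, 32.0])  # one wild prediction

# MAE grows roughly linearly with the outlier; MSE grows quadratically
for name, pred in [("clean", clean), ("with outlier", outlier)]:
    print(name,
          "MAE:", mean_absolute_error(y_true, pred),
          "MSE:", mean_squared_error(y_true, pred))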

R² Score (Coefficient of Determination)#

\[ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \]
  • Measures proportion of variance explained by the model.

  • \(R^2 = 1\): perfect predictions.

  • \(R^2 = 0\): no better than always predicting the mean (negative values are possible when the model is worse than the mean).
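
A sketch, with made-up values, computing R² from its definition and checking it against sklearn's r2_score:

import numpy as np
from sklearn.metrics import r2_score

# Hypothetical targets and predictions
y_true = np.array([2.0, 4.0, 6.0, 8.0])
y_pred = np.array([2.5, 3.5, 6.5, 7.5])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # both give the same value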

Median Absolute Error#

  • Median of absolute residuals.

  • Very robust against outliers compared to MSE/RMSE.
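
The sketch below (again with invented residuals) shows the median absolute error barely registering a single wildly wrong prediction, while RMSE jumps:

import numpy as np
from sklearn.metrics import median_absolute_error, mean_squared_error

y_true = np.array([100.0, 102.0, 98.0, 101.0, 99.0])
y_pred = np.array([101.0, 101.0, 99.0, 100.0, 59.0])  # last prediction misses by 40

medae = median_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# The single 40-unit miss dominates RMSE but leaves the median untouched
print("Median AE:", medae, "RMSE:", rmse)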


Key Insights#

  • For classification, report accuracy together with precision, recall, F1, and ROC-AUC; the latter metrics matter most when classes are imbalanced.

  • For regression, use MSE, RMSE, and MAE to quantify error magnitude and R² for explanatory power.

  • Since AdaBoost can overfit noisy data, monitor several complementary metrics rather than relying on any single one.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, log_loss, confusion_matrix, classification_report,
    mean_squared_error, mean_absolute_error, r2_score, median_absolute_error
)
from sklearn.model_selection import train_test_split

# -------------------------
# PART 1: CLASSIFICATION
# -------------------------
# Create synthetic classification dataset
X_cls, y_cls = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=2,
    n_classes=2, weights=[0.6, 0.4], random_state=42
)

# Split data
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_cls, y_cls, test_size=0.3, random_state=42)

# Train AdaBoost Classifier
ada_cls = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100, learning_rate=0.5, random_state=42
)
ada_cls.fit(Xc_train, yc_train)

# Predictions: hard class labels and positive-class probabilities
y_pred_cls = ada_cls.predict(Xc_test)
y_proba_cls = ada_cls.predict_proba(Xc_test)[:, 1]

# Classification metrics
cls_metrics = {
    "Accuracy": accuracy_score(yc_test, y_pred_cls),
    "Precision": precision_score(yc_test, y_pred_cls),
    "Recall": recall_score(yc_test, y_pred_cls),
    "F1-Score": f1_score(yc_test, y_pred_cls),
    "ROC-AUC": roc_auc_score(yc_test, y_proba_cls),
    "Log Loss": log_loss(yc_test, y_proba_cls)
}

# -------------------------
# PART 2: REGRESSION
# -------------------------
# Create synthetic regression dataset
X_reg, y_reg = make_regression(
    n_samples=500, n_features=10, noise=10.0, random_state=42
)

# Split data
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, test_size=0.3, random_state=42)

# Train AdaBoost Regressor
ada_reg = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),
    n_estimators=100, learning_rate=0.5, random_state=42
)
ada_reg.fit(Xr_train, yr_train)

# Predictions
y_pred_reg = ada_reg.predict(Xr_test)

# Regression metrics
reg_metrics = {
    "MSE": mean_squared_error(yr_test, y_pred_reg),
    "RMSE": np.sqrt(mean_squared_error(yr_test, y_pred_reg)),
    "MAE": mean_absolute_error(yr_test, y_pred_reg),
    "R2 Score": r2_score(yr_test, y_pred_reg),
    "Median Absolute Error": median_absolute_error(yr_test, y_pred_reg)
}

# Summarize both sets of metrics in DataFrames for side-by-side comparison
import pandas as pd

cls_df = pd.DataFrame([cls_metrics], index=["AdaBoost Classification"])
reg_df = pd.DataFrame([reg_metrics], index=["AdaBoost Regression"])
(cls_df, reg_df)
                         Accuracy  Precision    Recall  F1-Score   ROC-AUC  Log Loss
AdaBoost Classification      0.86   0.907407  0.753846  0.823529  0.956018  0.499989

                              MSE       RMSE       MAE  R2 Score  Median Absolute Error
AdaBoost Regression   5842.940439  76.439129  57.97347  0.713638              46.093553