Performance Metrics#
AdaBoost Classification#
When using AdaBoost as a classifier, you evaluate how well the model separates classes. The most common metrics are:
✅ Accuracy#
Proportion of correctly classified samples.
Good for balanced datasets, but misleading for imbalanced classes.
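To see why accuracy alone can mislead, here is a small sketch with made-up class counts (not from the AdaBoost run below): a classifier that always predicts the majority class still scores 95% accuracy while catching zero positives.

```python
# Illustration: high accuracy on an imbalanced dataset despite a useless model.
from sklearn.metrics import accuracy_score

y_true = [0] * 95 + [1] * 5    # 95% negatives, 5% positives
y_pred = [0] * 100             # always predict the majority class

print(accuracy_score(y_true, y_pred))  # 0.95 -- yet every positive is missed
```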
✅ Precision, Recall, and F1-Score#
Precision: Among the predicted positives, how many are correct?
Recall (Sensitivity): Among the actual positives, how many are captured?
F1-score: Harmonic mean of precision & recall → balances both.
Useful when dealing with imbalanced datasets.
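All three metrics follow directly from the confusion-matrix counts. A minimal hand-worked example (toy labels, not from the AdaBoost run below), cross-checked against scikit-learn:

```python
# Precision/recall/F1 computed from TP/FP/FN counts and verified with sklearn.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]   # TP=2, FN=2, FP=1, TN=3

precision = 2 / (2 + 1)                              # TP / (TP + FP) = 2/3
recall = 2 / (2 + 2)                                 # TP / (TP + FN) = 0.5
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean = 4/7

assert abs(precision_score(y_true, y_pred) - precision) < 1e-12
assert abs(recall_score(y_true, y_pred) - recall) < 1e-12
assert abs(f1_score(y_true, y_pred) - f1) < 1e-12
```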
✅ ROC-AUC (Receiver Operating Characteristic – Area Under Curve)#
Plots the True Positive Rate against the False Positive Rate across decision thresholds.
AUC close to 1 → strong classifier.
Threshold-independent metric.
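Because AUC depends only on how the scores rank the samples, any monotone rescaling of the scores leaves it unchanged. A quick sketch with invented scores:

```python
# ROC-AUC is rank-based: rescaling the scores does not change it.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.90, 0.20])

auc_raw = roc_auc_score(y_true, scores)
auc_scaled = roc_auc_score(y_true, scores / 10)  # same ranking, same AUC

print(auc_raw, auc_scaled)
```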
✅ Log Loss (Cross-Entropy Loss)#
Measures the probabilistic confidence of predictions.
Lower log loss = better probability calibration.
More informative than accuracy because it penalizes “overconfident wrong predictions.”
AdaBoost Regression#
When AdaBoost is used with regression trees, you measure how well it predicts continuous values:
Mean Squared Error (MSE)#
Penalizes large errors more heavily.
Root Mean Squared Error (RMSE)#
Same units as the target variable → more interpretable.
Mean Absolute Error (MAE)#
Less sensitive to outliers than MSE.
R² Score (Coefficient of Determination)#
Measures proportion of variance explained by the model.
\(R^2 = 1\): perfect predictions,
\(R^2 = 0\): no better than predicting the mean,
\(R^2 < 0\): worse than the mean baseline.
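The definition \(R^2 = 1 - SS_{res}/SS_{tot}\) can be verified by hand on a few points (toy numbers chosen for the example):

```python
# R^2 from its definition, cross-checked against sklearn.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.5])

ss_res = np.sum((y_true - y_pred) ** 2)         # sum of squared residuals = 0.75
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variance around the mean = 20.0
r2_manual = 1 - ss_res / ss_tot                 # 0.9625

assert np.isclose(r2_manual, r2_score(y_true, y_pred))
```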
Median Absolute Error#
Median of absolute residuals.
Very robust against outliers compared to MSE/RMSE.
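That robustness is easy to demonstrate: corrupting a single prediction blows up MSE while leaving the median absolute error untouched (values invented for the sketch):

```python
# One outlier prediction inflates MSE dramatically; the median absolute error is unmoved.
import numpy as np
from sklearn.metrics import mean_squared_error, median_absolute_error

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_clean = np.array([10.5, 11.5, 11.0, 13.5, 12.0])
y_outlier = y_clean.copy()
y_outlier[0] = 60.0   # a single wild prediction

print(mean_squared_error(y_true, y_clean),
      median_absolute_error(y_true, y_clean))    # 0.15, 0.5
print(mean_squared_error(y_true, y_outlier),
      median_absolute_error(y_true, y_outlier))  # 500.1, 0.5 (median unchanged)
```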
Key Insights#
For classification, report Accuracy alongside Precision/Recall/F1 and ROC-AUC; weight the latter more heavily when classes are imbalanced.
For regression, rely on MSE, MAE, RMSE, and R² for error magnitude and explanatory power.
Since AdaBoost can overfit noisy datasets, monitoring multiple metrics is crucial.
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, log_loss,
    mean_squared_error, mean_absolute_error, r2_score, median_absolute_error
)
from sklearn.model_selection import train_test_split
# -------------------------
# PART 1: CLASSIFICATION
# -------------------------
# Create synthetic classification dataset
X_cls, y_cls = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=2,
    n_classes=2, weights=[0.6, 0.4], random_state=42
)
# Split data
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_cls, y_cls, test_size=0.3, random_state=42)
# Train AdaBoost Classifier
ada_cls = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100, learning_rate=0.5, random_state=42
)
ada_cls.fit(Xc_train, yc_train)
# Predictions
y_pred_cls = ada_cls.predict(Xc_test)
y_proba_cls = ada_cls.predict_proba(Xc_test)[:, 1]  # probability of the positive class
# Classification metrics
cls_metrics = {
    "Accuracy": accuracy_score(yc_test, y_pred_cls),
    "Precision": precision_score(yc_test, y_pred_cls),
    "Recall": recall_score(yc_test, y_pred_cls),
    "F1-Score": f1_score(yc_test, y_pred_cls),
    "ROC-AUC": roc_auc_score(yc_test, y_proba_cls),
    "Log Loss": log_loss(yc_test, y_proba_cls),
}
# -------------------------
# PART 2: REGRESSION
# -------------------------
# Create synthetic regression dataset
X_reg, y_reg = make_regression(
    n_samples=500, n_features=10, noise=10.0, random_state=42
)
# Split data
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, test_size=0.3, random_state=42)
# Train AdaBoost Regressor
ada_reg = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),
    n_estimators=100, learning_rate=0.5, random_state=42
)
ada_reg.fit(Xr_train, yr_train)
# Predictions
y_pred_reg = ada_reg.predict(Xr_test)
# Regression metrics
reg_metrics = {
    "MSE": mean_squared_error(yr_test, y_pred_reg),
    "RMSE": np.sqrt(mean_squared_error(yr_test, y_pred_reg)),
    "MAE": mean_absolute_error(yr_test, y_pred_reg),
    "R2 Score": r2_score(yr_test, y_pred_reg),
    "Median Absolute Error": median_absolute_error(yr_test, y_pred_reg),
}
import pandas as pd
cls_df = pd.DataFrame([cls_metrics], index=["AdaBoost Classification"])
reg_df = pd.DataFrame([reg_metrics], index=["AdaBoost Regression"])
(cls_df, reg_df)
                         Accuracy  Precision    Recall  F1-Score   ROC-AUC  Log Loss
AdaBoost Classification      0.86   0.907407  0.753846  0.823529  0.956018  0.499989

                                 MSE       RMSE       MAE  R2 Score  Median Absolute Error
AdaBoost Regression      5842.940439  76.439129  57.97347  0.713638              46.093553