Hyperparameter Tuning#

Gradient Boosting Regressor#

1. Key Hyperparameters#

Hyperparameters control model complexity, learning speed, and generalization. The key ones are summarized below.

| Hyperparameter | Role / Effect |
|---|---|
| n_estimators | Number of weak learners (trees). Too high → overfitting; too low → underfitting. |
| learning_rate | Shrinkage factor applied to each tree. Smaller → slower learning, reduces overfitting. |
| max_depth | Maximum depth of each tree. Higher depth → more complex trees → risk of overfitting. |
| min_samples_split | Minimum samples required to split a node. Higher → simpler trees → prevents overfitting. |
| min_samples_leaf | Minimum samples required at a leaf. Higher → prevents overfitting. |
| subsample | Fraction of data used for each tree (stochastic gradient boosting). Reduces variance. |
| max_features | Max features considered at each split. Reduces correlation between trees, reduces overfitting. |
| loss | Loss function ('squared_error', 'absolute_error', 'huber', 'quantile'). Controls sensitivity to outliers. |
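As a minimal sketch of how these knobs are set in scikit-learn (the specific values below are illustrative, not recommendations):

from sklearn.ensemble import GradientBoostingRegressor

gbr = GradientBoostingRegressor(
    n_estimators=200,       # number of trees
    learning_rate=0.05,     # shrinkage applied to each tree
    max_depth=3,            # depth of each individual tree
    min_samples_split=10,   # samples required to split a node
    min_samples_leaf=5,     # samples required at each leaf
    subsample=0.8,          # stochastic gradient boosting
    max_features='sqrt',    # features considered at each split
    loss='huber',           # robust to outliers
    random_state=0
)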


2. Hyperparameter Tuning Strategies#

  1. Grid Search – exhaustively evaluates every combination in the parameter grid:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [2, 3, 4],
    'subsample': [0.8, 1.0]
}

gbr = GradientBoostingRegressor(random_state=0)
# X_train, y_train are assumed to be an existing training split
grid_search = GridSearchCV(gbr, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
  2. Randomized Search – samples parameter combinations at random instead of trying them all; much faster for large grids:
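A minimal sketch of randomized search, assuming the same X_train, y_train split as above; the distributions and n_iter are illustrative choices:

from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import randint, uniform

# Sample from distributions instead of enumerating a fixed grid
param_distributions = {
    'n_estimators': randint(50, 500),
    'learning_rate': uniform(0.01, 0.19),   # values in [0.01, 0.20]
    'max_depth': randint(2, 6),
    'subsample': uniform(0.6, 0.4)          # values in [0.6, 1.0]
}

random_search = RandomizedSearchCV(GradientBoostingRegressor(random_state=0),
                                   param_distributions, n_iter=20, cv=5,
                                   scoring='neg_mean_squared_error', random_state=0)
random_search.fit(X_train, y_train)
best_params = random_search.best_params_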

  3. Early Stopping – monitor a held-out validation score to stop adding trees automatically:

# Hold out 10% of the training data; stop once the validation score
# fails to improve by tol for 10 consecutive iterations
gbr = GradientBoostingRegressor(n_estimators=1000, validation_fraction=0.1,
                                n_iter_no_change=10, tol=1e-4)
gbr.fit(X_train, y_train)
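After fitting, the n_estimators_ attribute reports how many boosting stages were actually built before early stopping triggered:

print(gbr.n_estimators_)  # trees actually fitted (at most 1000)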

3. Handling Overfitting#

Signs: training error is much lower than validation error.

Strategies:

  1. Reduce n_estimators or max_depth.

  2. Increase min_samples_split / min_samples_leaf.

  3. Reduce learning_rate and increase n_estimators (slower, smoother learning).

  4. Use subsample < 1.0 (stochastic gradient boosting).

  5. Limit max_features to reduce tree correlation.

  6. Use early stopping with a validation set (a monitoring sketch follows this list).
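To see overfitting directly, track training and validation error as trees are added; staged_predict yields predictions after every boosting stage. A minimal sketch on illustrative synthetic data:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=300, max_depth=4,
                                learning_rate=0.1, random_state=0)
gbr.fit(X_train, y_train)

# Training error keeps falling with every stage, while validation error
# bottoms out and then creeps upward: the signature of overfitting
train_err = [mean_squared_error(y_train, p) for p in gbr.staged_predict(X_train)]
val_err = [mean_squared_error(y_val, p) for p in gbr.staged_predict(X_val)]
print('validation error is lowest at', int(np.argmin(val_err)) + 1, 'trees')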


4. Handling Underfitting#

Signs: both training and validation errors are high.

Strategies:

  1. Increase n_estimators (more trees).

  2. Increase max_depth (more complex trees).

  3. Reduce min_samples_split / min_samples_leaf (allows finer splits).

  4. Increase learning_rate to allow faster learning.

  5. Ensure sufficient features are considered by raising max_features (a brief sketch of adding capacity follows this list).
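A minimal sketch of fixing underfitting by adding capacity; the data and parameter values are illustrative:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

weak = GradientBoostingRegressor(n_estimators=10, max_depth=1,
                                 learning_rate=0.05, random_state=0)
stronger = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                     learning_rate=0.1, random_state=0)

# A low training score for the weak model indicates underfitting;
# more trees, deeper trees, and a higher learning rate lift it
for name, model in [('underfit', weak), ('more capacity', stronger)]:
    model.fit(X, y)
    print(name, 'training R^2:', round(model.score(X, y), 3))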


5. Workflow for Tuning GBR#

  1. Start with shallow trees (max_depth of 2–3) and a small learning_rate (0.05–0.1).

  2. Increase n_estimators gradually, monitor validation error.

  3. If overfitting occurs → reduce depth, increase min_samples_leaf, decrease learning_rate, or use subsample.

  4. If underfitting → increase depth, increase learning_rate, increase n_estimators.

  5. Use cross-validation for robust parameter selection (a brief sketch follows).
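A brief sketch of step 5, which also illustrates the learning_rate vs. n_estimators trade-off noted below; the two configurations are illustrative:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# A fast-learning model vs. a slower, smoother one with more trees
fast = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
slow = GradientBoostingRegressor(n_estimators=1000, learning_rate=0.01, random_state=0)

for name, model in [('lr=0.1, 100 trees', fast), ('lr=0.01, 1000 trees', slow)]:
    scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
    print(name, 'mean CV MSE:', round(-scores.mean(), 2))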


Key Intuition

  • Learning rate vs n_estimators: smaller learning rate + more trees → smoother learning, lower risk of overfitting.

  • Tree complexity: deeper trees → fit training data closely → risk of overfitting.

  • Stochastic subsampling: reduces variance, improves generalization.

Demonstration#

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor

# -------------------------------
# Create a synthetic dataset
# -------------------------------
np.random.seed(0)
X = np.linspace(0, 10, 50).reshape(-1,1)
y = np.sin(X).ravel() + np.random.normal(0, 0.2, X.shape[0])  # noisy sine wave
plt.figure(figsize=(15,8))
plt.subplot(2,2,1)
plt.scatter(X, y, color='black', label='Data')
plt.title("Original Data")
plt.legend()

# -------------------------------
# Underfitting example
# -------------------------------
gbr_under = GradientBoostingRegressor(n_estimators=10, max_depth=1, learning_rate=0.05)
gbr_under.fit(X, y)
y_pred_under = gbr_under.predict(X)

plt.subplot(2,2,2)
plt.scatter(X, y, color='black', label='Data')
plt.plot(X, y_pred_under, color='red', label='Underfit Model')
plt.title("Underfitting Example")
plt.legend()


# -------------------------------
# Overfitting example
# -------------------------------
gbr_over = GradientBoostingRegressor(n_estimators=500, max_depth=5, learning_rate=0.2)
gbr_over.fit(X, y)
y_pred_over = gbr_over.predict(X)
plt.subplot(2,2,3)
plt.scatter(X, y, color='black', label='Data')
plt.plot(X, y_pred_over, color='green', label='Overfit Model')
plt.title("Overfitting Example")
plt.legend()


# -------------------------------
# Properly tuned example
# -------------------------------
gbr_tuned = GradientBoostingRegressor(n_estimators=100, max_depth=3, learning_rate=0.1)
gbr_tuned.fit(X, y)
y_pred_tuned = gbr_tuned.predict(X)
plt.subplot(2,2,4)
plt.scatter(X, y, color='black', label='Data')
plt.plot(X, y_pred_tuned, color='blue', label='Properly Tuned Model')
plt.title("Properly Tuned Gradient Boosting")
plt.legend()
plt.show()
[Output figure: four panels — Original Data; Underfitting Example; Overfitting Example; Properly Tuned Gradient Boosting]

Interpretation#

  1. Underfitting (n_estimators=10, max_depth=1, learning_rate=0.05)

    • Model is too simple → cannot capture sine wave pattern.

    • Both training and test errors are high.

  2. Overfitting (n_estimators=500, max_depth=5, learning_rate=0.2)

    • Model is too complex → fits noise in training data.

    • Low training error, high validation error.

  3. Properly tuned (n_estimators=100, max_depth=3, learning_rate=0.1)

    • Balanced complexity → captures main pattern without fitting noise.

    • Best generalization.

Gradient Boosting Classifier#

The same ideas carry over to the Gradient Boosting Classifier (GBC). The notes below cover hyperparameter tuning, overfitting, and underfitting for classification.

1. Key Hyperparameters#


| Hyperparameter | Role / Effect |
|---|---|
| n_estimators | Number of weak learners (trees). Too high → overfitting; too low → underfitting. |
| learning_rate | Shrinkage factor applied to each tree. Smaller → slower learning, reduces overfitting. |
| max_depth | Maximum depth of each tree. Higher depth → more complex trees → risk of overfitting. |
| min_samples_split | Minimum samples required to split a node. Higher → simpler trees → prevents overfitting. |
| min_samples_leaf | Minimum samples required at a leaf. Higher → prevents overfitting. |
| subsample | Fraction of data used for each tree (stochastic gradient boosting). Reduces variance. |
| max_features | Max features considered at each split. Reduces correlation between trees, reduces overfitting. |
| loss | Loss function ('log_loss', the logistic loss; named 'deviance' in older scikit-learn versions). Controls how trees fit class probabilities. |
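Because GBC optimizes the logistic loss, it produces class-probability estimates via predict_proba, which can be scored with log_loss. A minimal sketch on illustrative synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

gbc = GradientBoostingClassifier(random_state=0)
gbc.fit(X_train, y_train)

# predict_proba returns one column per class; log_loss scores those probabilities
proba = gbc.predict_proba(X_val)
print('validation log-loss:', round(log_loss(y_val, proba), 3))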


2. Hyperparameter Tuning Strategies#

  1. Grid Search – exhaustively evaluates every combination in the parameter grid:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [2, 3, 4],
    'subsample': [0.8, 1.0]
}

gbc = GradientBoostingClassifier(random_state=0)
# X_train, y_train are assumed to be an existing training split
grid_search = GridSearchCV(gbc, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
  2. Randomized Search – samples combinations at random; faster for large grids (the regressor sketch above applies, substituting GradientBoostingClassifier and a classification scoring metric).

  3. Early Stopping – stop adding trees when the validation score stops improving:

# Hold out 10% of the training data; stop once the validation score
# fails to improve by tol for 10 consecutive iterations
gbc = GradientBoostingClassifier(n_estimators=1000, validation_fraction=0.1,
                                 n_iter_no_change=10, tol=1e-4)
gbc.fit(X_train, y_train)

3. Handling Overfitting#

Signs: training accuracy is high, validation accuracy is low.

Strategies:

  1. Reduce max_depth or n_estimators.

  2. Reduce learning_rate and increase n_estimators.

  3. Increase min_samples_split or min_samples_leaf.

  4. Use subsample < 1.0 to train each tree on a subset.

  5. Limit max_features to reduce correlation between trees.

  6. Use early stopping on a validation set (a train-vs-validation accuracy check follows this list).
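A quick way to confirm overfitting is to compare training and validation accuracy directly. A minimal sketch, contrasting a deliberately complex configuration with a regularized one (all values illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

complex_gbc = GradientBoostingClassifier(n_estimators=500, max_depth=5,
                                         learning_rate=0.2, random_state=0)
regular_gbc = GradientBoostingClassifier(n_estimators=200, max_depth=2,
                                         learning_rate=0.05, subsample=0.8,
                                         min_samples_leaf=5, random_state=0)

# A large train/validation gap for the complex model signals overfitting
for name, model in [('complex', complex_gbc), ('regularized', regular_gbc)]:
    model.fit(X_train, y_train)
    print(name, 'train acc:', round(model.score(X_train, y_train), 3),
          'val acc:', round(model.score(X_val, y_val), 3))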


4. Handling Underfitting#

Signs: both training and validation accuracy are low.

Strategies:

  1. Increase n_estimators to allow more boosting rounds.

  2. Increase max_depth for more complex trees.

  3. Decrease min_samples_split / min_samples_leaf to allow finer splits.

  4. Increase learning_rate for faster learning.

  5. Include more features by adjusting max_features.


5. Workflow for Tuning GBC#

  1. Start with shallow trees (max_depth of 2–3) and a small learning_rate (0.05–0.1).

  2. Gradually increase n_estimators, monitoring validation accuracy (see the sketch after this list).

  3. If overfitting → reduce depth, increase min_samples_leaf, decrease learning_rate, use subsample < 1.

  4. If underfitting → increase depth, increase learning_rate, increase n_estimators.

  5. Cross-validation is essential for robust hyperparameter selection.
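To implement step 2, staged_predict yields class predictions after every boosting stage, so validation accuracy can be tracked as trees are added. A minimal sketch on illustrative synthetic data:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                 learning_rate=0.1, random_state=0)
gbc.fit(X_train, y_train)

# Validation accuracy after each stage; the best stage suggests
# roughly how many trees are worth keeping
val_acc = [accuracy_score(y_val, p) for p in gbc.staged_predict(X_val)]
print('best stage:', int(np.argmax(val_acc)) + 1, 'val accuracy:', max(val_acc))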


Key Intuition

  • Learning rate vs n_estimators: smaller learning rate + more trees → slower but smoother learning, reduces overfitting.

  • Tree complexity: deeper trees → fit training data closely → risk of overfitting.

  • Subsampling: reduces variance, improves generalization.

Demonstration#

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification
from matplotlib.colors import ListedColormap
import warnings
warnings.filterwarnings("ignore")

# -------------------------------
# Create a synthetic binary dataset
# -------------------------------
X, y = make_classification(n_samples=200, n_features=2, n_informative=2, 
                           n_redundant=0, n_clusters_per_class=1, random_state=0)



plt.figure(figsize=(15,8))
# Helper to draw a model's decision boundary in one subplot
def plot_decision_boundary(model, X, y, title, axis=(1, 1, 1)):
    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00'])
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.05),
                         np.arange(y_min, y_max, 0.05))
    # Predict over the grid and colour the plane by predicted class
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.subplot(*axis)
    plt.contourf(xx, yy, Z, cmap=cmap_light, alpha=0.4)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k')
    plt.title(title)
    

# -------------------------------
# Underfitting example
# -------------------------------
gbc_under = GradientBoostingClassifier(n_estimators=10, max_depth=1, learning_rate=0.05)
gbc_under.fit(X, y)
plot_decision_boundary(gbc_under, X, y, "Underfitting Example", axis=(1, 3, 1))

# -------------------------------
# Overfitting example
# -------------------------------
gbc_over = GradientBoostingClassifier(n_estimators=500, max_depth=5, learning_rate=0.2)
gbc_over.fit(X, y)
plot_decision_boundary(gbc_over, X, y, "Overfitting Example", axis=(1, 3, 2))

# -------------------------------
# Properly tuned example
# -------------------------------
gbc_tuned = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
gbc_tuned.fit(X, y)
plot_decision_boundary(gbc_tuned, X, y, "Properly Tuned Gradient Boosting Classifier", axis=(1, 3, 3))
plt.show()
[Output figure: three decision-boundary panels — Underfitting Example; Overfitting Example; Properly Tuned Gradient Boosting Classifier]