Hyperparameter Tuning methods#
There are several ways to search for the best hyperparameters when tuning machine learning models. Here are the main types:
Types of Hyperparameter Search#
1. Manual Search#
Try parameters by hand based on intuition or domain knowledge.
Example: test α = 0.1, 1, 10 for Ridge.
✅ Simple, but ❌ inefficient and may miss optimal values.
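A minimal sketch of manual search, assuming a synthetic regression dataset for illustration: each candidate α is tried by hand and compared via cross-validated R²:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data for illustration
X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=42)

# Try a few alpha values "by hand" and compare mean cross-validated R^2
for alpha in [0.1, 1, 10]:
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2").mean()
    print(f"alpha={alpha}: mean CV R2 = {score:.4f}")
```

In practice you would inspect the scores, pick a promising region, and repeat with new values.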
2. Grid Search#
Define a grid of hyperparameter values.
Try all combinations exhaustively with cross-validation.
Example:
alpha = [0.01, 0.1, 1, 10, 100]
l1_ratio = [0.1, 0.5, 0.9]
→ Tests 5 × 3 = 15 combinations.
✅ Systematic, guarantees best within grid.
❌ Expensive if grid is large.
3. Random Search#
Instead of testing all values, randomly sample combinations from given distributions.
Example:
alpha ∼ Uniform(0.001, 100)
✅ More efficient than grid; can cover large spaces.
❌ May miss the exact optimum if unlucky.
4. Bayesian Optimization#
Uses past evaluation results to model performance as a probability distribution.
Chooses new hyperparameters that are most promising.
✅ Finds optimal faster than grid/random.
❌ More complex, needs specialized libraries (optuna, scikit-optimize, hyperopt).
5. Gradient-Based Optimization (advanced)#
Uses gradients of the loss with respect to hyperparameters.
Works mainly for continuous hyperparameters.
Rare in practice because many hyperparameters (like max_depth) are discrete.
6. Evolutionary / Genetic Algorithms#
Treat hyperparameters like genes.
Randomly mutate and crossover values across generations.
✅ Can escape local optima.
❌ Slower, harder to tune.
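A toy genetic algorithm, written from scratch for illustration (the population size, mutation scale, and number of generations are arbitrary choices): each "gene" is log10(α), fitness is cross-validated R², and each generation keeps the best half, averages pairs of parents (crossover), and adds Gaussian noise (mutation):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=42)
rng = np.random.default_rng(42)

def fitness(log_alpha):
    # Cross-validated R^2 of Ridge with alpha = 10^log_alpha
    return cross_val_score(Ridge(alpha=10.0**log_alpha), X, y, cv=3, scoring="r2").mean()

# Population of 8 genes: log10(alpha) in [-3, 3]
pop = rng.uniform(-3, 3, size=8)
for generation in range(5):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-4:]]                            # selection: keep top half
    children = (rng.choice(parents, 4) + rng.choice(parents, 4)) / 2  # crossover: average parents
    children += rng.normal(0, 0.3, size=4)                            # mutation: Gaussian noise
    pop = np.concatenate([parents, children])

best = pop[np.argmax([fitness(g) for g in pop])]
print("Best alpha:", 10.0**best)
```

Real implementations (e.g. DEAP, or sklearn-genetic-opt) add tournament selection, per-parameter encodings, and parallel evaluation, but the select/crossover/mutate loop is the same.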
7. Successive Halving / Hyperband#
Start with many random hyperparameter sets.
Train each briefly.
Discard poorly performing ones early, keep only the best for longer training.
✅ Efficient, reduces wasted computation.
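The steps above are implemented in scikit-learn's `HalvingRandomSearchCV` (still marked experimental, so it must be explicitly enabled). In this sketch the "budget" is the number of training samples: all candidates start on a small subset, and only the top fraction survives to train on more data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (enables the import below)
from sklearn.linear_model import Ridge
from sklearn.model_selection import HalvingRandomSearchCV

X, y = make_regression(n_samples=400, n_features=10, noise=15, random_state=42)

param_dist = {"alpha": np.logspace(-3, 3, 100)}
# factor=3: only the best third of candidates survives each round,
# while the per-candidate resource (here, n_samples) triples
search = HalvingRandomSearchCV(Ridge(), param_dist, factor=3, cv=5,
                               scoring="r2", random_state=42)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV score:", search.best_score_)
```

For models with an iteration count (e.g. gradient boosting), `resource` can instead be set to a parameter like `n_estimators`, so weak configurations are discarded after only a few boosting rounds.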
Summary Table#

| Method | Strategy | Pros | Cons |
|---|---|---|---|
| Manual Search | Trial-and-error | Simple | Not systematic |
| Grid Search | Exhaustive combinations | Guaranteed best in grid | Expensive |
| Random Search | Random sampling | Efficient, scalable | No guarantee |
| Bayesian Optimization | Probabilistic model-guided search | Fast convergence | Complex |
| Gradient-Based | Gradient descent on hyperparams | Precise for continuous vars | Rarely practical |
| Evolutionary Algorithms | Mutation + crossover | Escapes local optima | Slow |
| Hyperband / Successive Halving | Early stopping bad configs | Saves compute | Needs careful setup |
👉 In practice:
For small problems → Grid Search.
For large spaces → Random Search or Hyperband.
For serious optimization → Bayesian Optimization (e.g., Optuna).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.metrics import r2_score
# Generate synthetic regression dataset
X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# ---------------- Grid Search ----------------
ridge = Ridge()
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100, 1000]} # exhaustive list
grid_search = GridSearchCV(ridge, param_grid, cv=5, scoring='r2')
grid_search.fit(X_train, y_train)
print("GridSearchCV best params:", grid_search.best_params_)
print("GridSearchCV best CV score:", grid_search.best_score_)
# Evaluate on test data
y_pred_grid = grid_search.best_estimator_.predict(X_test)
print("GridSearchCV test R2:", r2_score(y_test, y_pred_grid))
# ---------------- Random Search ----------------
param_dist = {'alpha': np.logspace(-3, 3, 100)} # random sampling from wide range
random_search = RandomizedSearchCV(ridge, param_dist, n_iter=10, cv=5, scoring='r2', random_state=42)
random_search.fit(X_train, y_train)
print("\nRandomizedSearchCV best params:", random_search.best_params_)
print("RandomizedSearchCV best CV score:", random_search.best_score_)
# Evaluate on test data
y_pred_rand = random_search.best_estimator_.predict(X_test)
print("RandomizedSearchCV test R2:", r2_score(y_test, y_pred_rand))
GridSearchCV best params: {'alpha': 0.1}
GridSearchCV best CV score: 0.9907142472562647
GridSearchCV test R2: 0.9934316711441261
RandomizedSearchCV best params: {'alpha': 0.021544346900318846}
RandomizedSearchCV best CV score: 0.9907141636217599
RandomizedSearchCV test R2: 0.9934566226827182