Hyperparameter Optimization#
Hyperparameter Optimization (HPO) is the process of finding the best hyperparameter values for a machine learning model to maximize performance.
Hyperparameters are settings chosen before training, such as:
Learning rate
Number of layers
Batch size
Regularization strength
Number of trees (XGBoost, Random Forest)
Goal of HPO: \( \lambda^* = \arg\min_\lambda \; \mathcal{L}_{\text{val}}(\lambda) \)
Where \(\lambda\) = set of hyperparameters.
HPO aims to achieve:
highest accuracy
lowest error
best generalization
Types of Hyperparameter Optimization#
Below are the main methods, from simplest to most advanced.
Manual / Heuristic Search#
Tuning based on experience, intuition, or trial-and-error.
Pros
Simple
Works for small models
Cons
Not systematic
Not scalable
Grid Search#
Evaluates every combination of a predefined set of hyperparameter values.
Example
Learning rate = {0.01, 0.1}
Batch size = {32, 64}
Total = 2 × 2 = 4 combinations.
Pros
Exhaustive
Easy to implement
Cons
Very slow
Not efficient in high dimensions
Many combinations are wasted
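The four-combination example above can be sketched directly. Here `val_loss` is a hypothetical stand-in for training a model and measuring its validation loss:

```python
import itertools

def val_loss(lr, batch_size):
    # Toy stand-in for: train a model with these settings, return validation loss.
    return (lr - 0.1) ** 2 + 0.001 * abs(batch_size - 64)

grid = {"lr": [0.01, 0.1], "batch_size": [32, 64]}

# Grid search: evaluate every combination (2 x 2 = 4 here).
best = min(
    itertools.product(grid["lr"], grid["batch_size"]),
    key=lambda combo: val_loss(*combo),
)
print(best)  # -> (0.1, 64), the pair with the lowest toy loss
```

With k candidate values for each of n hyperparameters, grid search costs k^n evaluations, which is exactly why it breaks down in high dimensions.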
Random Search#
Randomly samples combinations of hyperparameters.
Pros
Much more efficient than grid search
Performs well in high-dimensional spaces
Proven effective for neural networks
Cons
Still blind sampling
No learning from past evaluations
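A minimal sketch, again using a toy `val_loss` stand-in for real training. Sampling the learning rate log-uniformly is common practice, since its useful values span orders of magnitude:

```python
import random

random.seed(0)

def val_loss(lr, batch_size):
    # Toy stand-in for training a model and measuring validation loss.
    return (lr - 0.1) ** 2 + 0.001 * abs(batch_size - 64)

# Random search: sample each hyperparameter independently for each trial.
trials = [
    {"lr": 10 ** random.uniform(-3, 0),          # log-uniform learning rate
     "batch_size": random.choice([16, 32, 64, 128])}
    for _ in range(20)
]
best = min(trials, key=lambda t: val_loss(t["lr"], t["batch_size"]))
print(best)
```

Note that each trial is drawn independently: nothing learned from earlier trials guides later ones, which is the "blind sampling" drawback listed above.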
Bayesian Optimization#
A smart, probabilistic method for HPO.
It uses:
A Gaussian process (GP) or another surrogate model of the validation loss
An acquisition function (e.g., expected improvement) to choose the next hyperparameters to evaluate
Pros
Sample efficient
Good for expensive training (deep models)
Learns from previous trials
Cons
Slower in high dimensions (> 50D)
Requires probabilistic modeling
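A from-scratch sketch of the loop on a toy 1-D objective (standing in for validation loss over one hyperparameter): a GP surrogate with an RBF kernel, plus the expected-improvement acquisition. In practice you would reach for a library such as Optuna or scikit-optimize rather than hand-rolling this:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

def objective(x):
    # Toy "expensive" objective; its true minimum is near x = -0.58.
    return np.sin(3 * x) + 0.5 * x

def rbf(a, b, length=0.5):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian-process posterior mean and std at the query points Xs.
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI acquisition for minimization: favors low mean and high uncertainty.
    z = (best - mu) / sigma
    cdf = np.array([0.5 * (1 + erf(v / np.sqrt(2))) for v in z])
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (best - mu) * cdf + sigma * pdf

# Start from 3 random evaluations, then let the acquisition pick each next point.
X = rng.uniform(-2, 2, 3)
y = objective(X)
candidates = np.linspace(-2, 2, 200)
for _ in range(10):
    mu, sigma = gp_posterior(X, y, candidates)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best_x = X[np.argmin(y)]
```

Each iteration refits the surrogate to all evaluations so far, which is how the method "learns from previous trials"; that refit is also what becomes expensive in high dimensions.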
Evolutionary / Genetic Algorithms#
Use ideas from evolution:
mutation
crossover
selection
Useful for:
Neural architecture search
Very large search spaces
Non-differentiable objectives
Pros
No gradients needed
Good for complex search spaces
Cons
Computationally expensive
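A minimal genetic-algorithm sketch over two hypothetical hyperparameters, showing all three operators. The `fitness` function stands in for validation accuracy:

```python
import random

random.seed(0)

def fitness(cfg):
    # Toy stand-in for validation accuracy: peaks at lr=0.1, dropout=0.3.
    return -((cfg["lr"] - 0.1) ** 2 + (cfg["dropout"] - 0.3) ** 2)

def random_cfg():
    return {"lr": random.uniform(0.001, 1.0), "dropout": random.uniform(0.0, 0.9)}

def crossover(a, b):
    # Each gene (hyperparameter) comes from one parent at random.
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(cfg, rate=0.3, scale=0.05):
    # Perturb each gene with some probability.
    return {k: v + random.gauss(0, scale) if random.random() < rate else v
            for k, v in cfg.items()}

pop = [random_cfg() for _ in range(20)]
for generation in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]                       # selection: keep the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(15)]
    pop = parents + children                # elitism + offspring

best = max(pop, key=fitness)
```

Note that `fitness` is only ever called, never differentiated, which is why this family of methods handles non-differentiable objectives.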
Hyperband and Successive Halving#
Resource-efficient methods.
Successive halving trains many configurations on a small budget, keeps the top fraction, and repeats with a larger budget. Hyperband runs successive halving at several starting budgets to hedge against discarding slow starters. Together they:
Quickly eliminate poor hyperparameters
Allocate more resources to promising ones
Pros
Efficient
Ideal for deep learning
Cons
Requires careful resource budgeting (epochs, steps)
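A successive-halving sketch, the building block of Hyperband. Here `evaluate` is a toy stand-in for partially training a configuration with a given budget (e.g., epochs):

```python
import random

random.seed(0)

def evaluate(cfg, budget):
    # Toy "partial training": score improves with budget; better configs score higher.
    return cfg["quality"] * budget / (budget + 5) + random.gauss(0, 0.01)

configs = [{"id": i, "quality": random.random()} for i in range(16)]

# Successive halving: train everyone briefly, keep the top half, double the budget.
budget = 1
while len(configs) > 1:
    scores = {c["id"]: evaluate(c, budget) for c in configs}
    configs.sort(key=lambda c: scores[c["id"]], reverse=True)
    configs = configs[: len(configs) // 2]
    budget *= 2

print(configs[0])
```

Most of the total budget is spent on the few surviving configurations, which is where the resource efficiency comes from; the halving schedule itself is the "careful resource budgeting" noted above.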
BOHB (Bayesian Optimization + Hyperband)#
Combines:
Bayesian optimization (accuracy)
Hyperband (efficiency)
One of the most effective modern methods.
Population-Based Training (PBT)#
Used in reinforcement learning (DeepMind) and deep learning.
Models train in parallel
Poor performers periodically copy weights and hyperparameters from top performers (exploit), then perturb them (explore)
Hyperparameters evolve during training
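A minimal PBT sketch with toy "training"; all names, scores, and update rules are illustrative:

```python
import random

random.seed(0)

# Each worker trains its own model; here a training step is a toy score update
# whose speed depends on how close the (hypothetical) learning rate is to 0.1.
population = [{"lr": random.uniform(0.001, 1.0), "score": 0.0} for _ in range(8)]

for step in range(40):
    for w in population:
        w["score"] += max(0.0, 1 - abs(w["lr"] - 0.1))    # partial training step
    if step % 10 == 9:                                    # periodic exploit/explore
        population.sort(key=lambda w: w["score"], reverse=True)
        for loser, winner in zip(population[-2:], population[:2]):
            loser["lr"] = winner["lr"] * random.choice([0.8, 1.2])  # perturb (explore)
            loser["score"] = winner["score"]              # copy "weights" (exploit)

best = max(population, key=lambda w: w["score"])
```

Because the exploit/explore step happens mid-training, the hyperparameters effectively follow a schedule that evolves with the run, rather than being fixed up front.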
Gradient-Based Hyperparameter Optimization#
Uses gradients of the validation loss with respect to the hyperparameters to optimize them directly (rare but emerging).
Applied in:
Meta-learning
Neural architecture search
Summary of Methods#
| Method | Technique | Strength |
|---|---|---|
| Manual | Heuristic | Simple |
| Grid Search | Exhaustive search | Small spaces |
| Random Search | Random sampling | Efficient |
| Bayesian Opt. | GP + acquisition | Best for expensive models |
| Evolutionary | Genetic algorithms | Large search spaces |
| Hyperband | Bandit-based | Resource-efficient |
| BOHB | Bayesian + Hyperband | State-of-the-art |
| PBT | Evolving hyperparameters | RL and deep learning |
| Gradient-based | Meta-learning | Advanced research |
Which Method to Use?#
| Situation | Best HPO Method |
|---|---|
| Small dataset | Grid / Random |
| Medium models | Random Search |
| Expensive deep learning training | Bayesian Optimization |
| Very large search spaces | Evolutionary Algorithms |
| Limited compute budget | Hyperband / BOHB |
| RL or evolving training | Population-Based Training |
Summary#
Hyperparameter Optimization is the process of systematically searching for the best hyperparameters to maximize model performance, using methods such as grid search, random search, Bayesian optimization, evolutionary algorithms, and Hyperband.