Hyperparameter Optimization#
Hyperparameter Optimization (HPO) is the process of finding the best hyperparameter values for a machine learning model to maximize performance.
Hyperparameters are settings chosen before training, such as:
Learning rate
Number of layers
Batch size
Regularization strength
Number of trees (XGBoost, Random Forest)
Goal of HPO: \( \lambda^* = \arg\min_\lambda \; \mathcal{L}_{\text{val}}(\lambda) \)
Where \(\lambda\) = set of hyperparameters.
HPO aims to achieve:
highest accuracy
lowest error
best generalization
Types of Hyperparameter Optimization#
Below are the main methods, from simplest to most advanced.
Manual / Heuristic Search#
Tuning based on experience, intuition, or trial-and-error.
Pros
Simple
Works for small models
Cons
Not systematic
Not scalable
Grid Search#
Evaluates every combination of a predefined set of hyperparameter values.
Example
Learning rate = {0.01, 0.1}
Batch size = {32, 64}
Total = 2 × 2 = 4 combinations.
Pros
Exhaustive
Easy to implement
Cons
Very slow
Not efficient in high dimensions
Many combinations are wasted
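The four-combination example above can be sketched directly. Here `val_loss` is a hypothetical stand-in for training a model and measuring its validation loss:

```python
import itertools

def val_loss(lr, batch_size):
    # Toy stand-in for: train a model with these settings, return validation loss.
    return (lr - 0.1) ** 2 + 0.001 * abs(batch_size - 64)

grid = {"lr": [0.01, 0.1], "batch_size": [32, 64]}

# Grid search: evaluate every combination (2 x 2 = 4 here).
best = min(
    itertools.product(grid["lr"], grid["batch_size"]),
    key=lambda combo: val_loss(*combo),
)
print(best)  # -> (0.1, 64), the pair with the lowest toy loss
```

With k candidate values for each of n hyperparameters, grid search costs k^n evaluations, which is exactly why it breaks down in high dimensions.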
Random Search#
Randomly samples combinations of hyperparameters.
Pros
Much more efficient than grid search
Performs well in high-dimensional spaces
Proven effective for neural networks
Cons
Still blind sampling
No learning from past evaluations
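A minimal sketch, again using a toy `val_loss` stand-in for real training. Sampling the learning rate log-uniformly is common practice, since its useful values span orders of magnitude:

```python
import random

random.seed(0)

def val_loss(lr, batch_size):
    # Toy stand-in for training a model and measuring validation loss.
    return (lr - 0.1) ** 2 + 0.001 * abs(batch_size - 64)

# Random search: sample each hyperparameter independently for each trial.
trials = [
    {"lr": 10 ** random.uniform(-3, 0),          # log-uniform learning rate
     "batch_size": random.choice([16, 32, 64, 128])}
    for _ in range(20)
]
best = min(trials, key=lambda t: val_loss(t["lr"], t["batch_size"]))
print(best)
```

Note that each trial is drawn independently: nothing learned from earlier trials guides later ones, which is the "blind sampling" drawback listed above.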
Bayesian Optimization#
A smart, probabilistic method for HPO.
It uses:
A Gaussian process (GP) or another surrogate model of the validation loss
An acquisition function (e.g., expected improvement) to choose the next hyperparameters to evaluate
Pros
Sample efficient
Good for expensive training (deep models)
Learns from previous trials
Cons
Slower in high dimensions (> 50D)
Requires probabilistic modeling
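A from-scratch sketch of the loop on a toy 1-D objective (standing in for validation loss over one hyperparameter): a GP surrogate with an RBF kernel, plus the expected-improvement acquisition. In practice you would reach for a library such as Optuna or scikit-optimize rather than hand-rolling this:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

def objective(x):
    # Toy "expensive" objective; its true minimum is near x = -0.58.
    return np.sin(3 * x) + 0.5 * x

def rbf(a, b, length=0.5):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian-process posterior mean and std at the query points Xs.
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI acquisition for minimization: favors low mean and high uncertainty.
    z = (best - mu) / sigma
    cdf = np.array([0.5 * (1 + erf(v / np.sqrt(2))) for v in z])
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (best - mu) * cdf + sigma * pdf

# Start from 3 random evaluations, then let the acquisition pick each next point.
X = rng.uniform(-2, 2, 3)
y = objective(X)
candidates = np.linspace(-2, 2, 200)
for _ in range(10):
    mu, sigma = gp_posterior(X, y, candidates)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best_x = X[np.argmin(y)]
```

Each iteration refits the surrogate to all evaluations so far, which is how the method "learns from previous trials"; that refit is also what becomes expensive in high dimensions.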
Evolutionary / Genetic Algorithms#
Use ideas from evolution:
mutation
crossover
selection
Useful for:
Neural architecture search
Very large search spaces
Non-differentiable objectives
Pros
No gradients needed
Good for complex search spaces
Cons
Computationally expensive
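A minimal genetic-algorithm sketch over two hypothetical hyperparameters, showing all three operators. The `fitness` function stands in for validation accuracy:

```python
import random

random.seed(0)

def fitness(cfg):
    # Toy stand-in for validation accuracy: peaks at lr=0.1, dropout=0.3.
    return -((cfg["lr"] - 0.1) ** 2 + (cfg["dropout"] - 0.3) ** 2)

def random_cfg():
    return {"lr": random.uniform(0.001, 1.0), "dropout": random.uniform(0.0, 0.9)}

def crossover(a, b):
    # Each gene (hyperparameter) comes from one parent at random.
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(cfg, rate=0.3, scale=0.05):
    # Perturb each gene with some probability.
    return {k: v + random.gauss(0, scale) if random.random() < rate else v
            for k, v in cfg.items()}

pop = [random_cfg() for _ in range(20)]
for generation in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]                       # selection: keep the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(15)]
    pop = parents + children                # elitism + offspring

best = max(pop, key=fitness)
```

Note that `fitness` is only ever called, never differentiated, which is why this family of methods handles non-differentiable objectives.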
Hyperband and Successive Halving#
Resource-efficient methods.
Successive halving trains many configurations on a small budget, keeps the top fraction, and repeats with a larger budget. Hyperband runs successive halving at several starting budgets to hedge against discarding slow starters. Together they:
Quickly eliminate poor hyperparameters
Allocate more resources to promising ones
Pros
Efficient
Ideal for deep learning
Cons
Requires careful resource budgeting (epochs, steps)
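A successive-halving sketch, the building block of Hyperband. Here `evaluate` is a toy stand-in for partially training a configuration with a given budget (e.g., epochs):

```python
import random

random.seed(0)

def evaluate(cfg, budget):
    # Toy "partial training": score improves with budget; better configs score higher.
    return cfg["quality"] * budget / (budget + 5) + random.gauss(0, 0.01)

configs = [{"id": i, "quality": random.random()} for i in range(16)]

# Successive halving: train everyone briefly, keep the top half, double the budget.
budget = 1
while len(configs) > 1:
    scores = {c["id"]: evaluate(c, budget) for c in configs}
    configs.sort(key=lambda c: scores[c["id"]], reverse=True)
    configs = configs[: len(configs) // 2]
    budget *= 2

print(configs[0])
```

Most of the total budget is spent on the few surviving configurations, which is where the resource efficiency comes from; the halving schedule itself is the "careful resource budgeting" noted above.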
BOHB (Bayesian Optimization + Hyperband)#
Combines:
Bayesian optimization (accuracy)
Hyperband (efficiency)
One of the most effective modern methods.
Population-Based Training (PBT)#
Used in reinforcement learning (DeepMind) and deep learning.
Models train in parallel
Poor performers periodically copy weights and hyperparameters from top performers (exploit), then perturb them (explore)
Hyperparameters evolve during training
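A minimal PBT sketch with toy "training"; all names, scores, and update rules are illustrative:

```python
import random

random.seed(0)

# Each worker trains its own model; here a training step is a toy score update
# whose speed depends on how close the (hypothetical) learning rate is to 0.1.
population = [{"lr": random.uniform(0.001, 1.0), "score": 0.0} for _ in range(8)]

for step in range(40):
    for w in population:
        w["score"] += max(0.0, 1 - abs(w["lr"] - 0.1))    # partial training step
    if step % 10 == 9:                                    # periodic exploit/explore
        population.sort(key=lambda w: w["score"], reverse=True)
        for loser, winner in zip(population[-2:], population[:2]):
            loser["lr"] = winner["lr"] * random.choice([0.8, 1.2])  # perturb (explore)
            loser["score"] = winner["score"]              # copy "weights" (exploit)

best = max(population, key=lambda w: w["score"])
```

Because the exploit/explore step happens mid-training, the hyperparameters effectively follow a schedule that evolves with the run, rather than being fixed up front.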
Gradient-Based Hyperparameter Optimization#
Uses gradients of the validation loss with respect to the hyperparameters to optimize them directly (rare but emerging).
Applied in:
Meta-learning
Neural architecture search
Summary of Methods#
| Method | Technique | Strength |
|---|---|---|
| Manual | Heuristic | Simple |
| Grid Search | Exhaustive search | Small spaces |
| Random Search | Random sampling | Efficient |
| Bayesian Opt. | GP + acquisition | Best for expensive models |
| Evolutionary | Genetic algorithms | Large search spaces |
| Hyperband | Bandit-based | Resource-efficient |
| BOHB | Bayesian + Hyperband | State-of-the-art |
| PBT | Evolving hyperparameters | RL and deep learning |
| Gradient-based | Meta-learning | Advanced research |
Which Method to Use?#
| Situation | Best HPO Method |
|---|---|
| Small dataset | Grid / Random |
| Medium models | Random Search |
| Expensive deep learning training | Bayesian Optimization |
| Very large search spaces | Evolutionary Algorithms |
| Limited compute budget | Hyperband / BOHB |
| RL or evolving training | Population-Based Training |
Summary#
Hyperparameter Optimization is the process of systematically searching for the best hyperparameters to maximize model performance, using methods such as grid search, random search, Bayesian optimization, evolutionary algorithms, and Hyperband.