Hyperparameter Optimization#

Hyperparameter Optimization (HPO) is the process of finding the best hyperparameter values for a machine learning model to maximize performance.

Hyperparameters are settings chosen before training, such as:

  • Learning rate

  • Number of layers

  • Batch size

  • Regularization strength

  • Number of trees (XGBoost, Random Forest)

Goal of HPO: \( \lambda^* = \arg\min_\lambda \; \mathcal{L}_{\text{val}}(\lambda) \)

where \(\lambda\) denotes the set of hyperparameters and \(\mathcal{L}_{\text{val}}\) the validation loss.

A good HPO procedure helps the model achieve:

  • highest accuracy

  • lowest error

  • best generalization
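The objective above can be illustrated with a toy grid search over two hyperparameters. The quadratic "validation loss" and its optimum at (lr=0.1, reg=0.01) are illustrative assumptions, not a real model:

```python
import itertools

# Toy "validation loss" for a hypothetical model; the quadratic form and
# its optimum (lr=0.1, reg=0.01) are illustrative assumptions.
def val_loss(lr, reg):
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

# Candidate values for each hyperparameter (the grid is an assumption).
grid = {"lr": [0.001, 0.01, 0.1, 1.0], "reg": [0.0, 0.01, 0.1]}

# lambda* = argmin over all grid combinations of the validation loss.
best = min(
    itertools.product(grid["lr"], grid["reg"]),
    key=lambda cfg: val_loss(*cfg),
)
print(best)  # -> (0.1, 0.01)
```

In a real setting, `val_loss` would retrain and validate the model for each configuration, which is why the number of evaluations matters so much.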


Types of Hyperparameter Optimization#

Below are the main methods, from simplest to most advanced.


Manual Search#

Hand-tuning hyperparameters based on intuition and experience. Simple, but slow and hard to reproduce.


Grid Search#

Exhaustively evaluates every combination in a predefined grid of values.

Pros

  • Simple and exhaustive

Cons

  • Cost grows exponentially with the number of hyperparameters


Random Search#

Samples hyperparameter configurations at random from given ranges. Often beats grid search because it tries more distinct values of each hyperparameter.

Pros

  • Efficient, especially in high dimensions

Cons

  • Does not learn from previous trials

Bayesian Optimization#

A smart, probabilistic method for HPO.

It uses:

  • A surrogate model, typically a Gaussian Process (GP), that predicts performance from past trials

  • An acquisition function that selects the most promising hyperparameters to try next

Pros

  • Sample efficient

  • Good for expensive training (deep models)

  • Learns from previous trials

Cons

  • Slower in high dimensions (> 50D)

  • Requires probabilistic modeling
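A minimal sketch of the Bayesian-optimization loop: fit a surrogate to past trials, optimize an acquisition function, evaluate the chosen point, repeat. To stay self-contained it uses a kernel-regression surrogate in place of a full GP and a lower-confidence-bound acquisition; the toy objective, search range, bandwidth, and LCB weight are all illustrative assumptions:

```python
import math, random

random.seed(0)

# Toy stand-in for an expensive validation loss over a learning rate;
# the function and its minimum at lr = 0.1 are illustrative assumptions.
def val_loss(lr):
    return (math.log10(lr) + 1.0) ** 2  # minimized at lr = 0.1

candidates = [10 ** (-4 + 4 * i / 200) for i in range(201)]  # 1e-4 .. 1

# A few random warm-up trials.
X, y = [], []
for lr in random.sample(candidates, 3):
    X.append(lr)
    y.append(val_loss(lr))

def surrogate(lr):
    """Kernel-regression surrogate (a cheap stand-in for a GP): an
    RBF-weighted mean of past trials plus a distance-based uncertainty."""
    w = [math.exp(-((math.log10(lr) - math.log10(x)) ** 2) / 0.5) for x in X]
    mean = sum(wi * yi for wi, yi in zip(w, y)) / (sum(w) + 1e-12)
    return mean, 1.0 - max(w)  # far from all data => high uncertainty

def acquisition(lr):
    mean, unc = surrogate(lr)
    return mean - 2.0 * unc  # lower confidence bound (we are minimizing)

for _ in range(15):  # BO loop: choose, evaluate, update the trial history
    nxt = min((c for c in candidates if c not in X), key=acquisition)
    X.append(nxt)
    y.append(val_loss(nxt))

best = X[y.index(min(y))]
print(best)
```

Production libraries replace the surrogate with a proper GP or tree-based model and use acquisitions such as expected improvement, but the choose-evaluate-update loop is the same.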


Evolutionary / Genetic Algorithms#

Use ideas from evolution:

  • mutation

  • crossover

  • selection

Useful for:

  • Neural architecture search

  • Very large search spaces

  • Non-differentiable objectives

Pros

  • No gradients needed

  • Good for complex search spaces

Cons

  • Computationally expensive
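The three operators above can be sketched on a toy objective. The quadratic fitness, mutation scale, and population sizes are illustrative assumptions:

```python
import random

random.seed(1)

# Toy fitness: negative validation loss over (learning rate, regularization);
# the quadratic objective and its optimum (0.1, 0.01) are assumptions.
def fitness(ind):
    lr, reg = ind
    return -((lr - 0.1) ** 2 + (reg - 0.01) ** 2)

def mutate(ind):
    # Mutation: small random perturbation of each gene (clamped at 0).
    return tuple(max(0.0, g + random.gauss(0, 0.02)) for g in ind)

def crossover(a, b):
    # Crossover: each gene comes from one of the two parents.
    return tuple(random.choice(pair) for pair in zip(a, b))

pop = [(random.uniform(0, 1), random.uniform(0, 0.1)) for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # selection: keep the fittest half
    pop = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(10)
    ]

best = max(pop, key=fitness)
print(best)
```

Because only relative fitness matters, the same loop works for non-differentiable objectives such as architecture choices.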


Hyperband and Successive Halving#

Resource-efficient, bandit-based methods.

Successive halving trains many configurations on a small budget, keeps the best-performing fraction, and repeats with a larger budget. Hyperband runs successive halving several times with different trade-offs between the number of configurations and the budget per configuration.

Hyperband:

  • Quickly eliminates poor hyperparameters

  • Allocates more resources to promising ones

Pros

  • Efficient

  • Ideal for deep learning

Cons

  • Requires careful resource budgeting (epochs, steps)
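Successive halving, the core of Hyperband, can be sketched as follows. The per-configuration "learning curve" is an illustrative assumption standing in for real training:

```python
import random

random.seed(2)

# Hypothetical learning curve: loss shrinks with budget at a config-specific
# rate (an illustrative assumption standing in for real training).
def train(rate, budget):
    return 1.0 / (1.0 + rate * budget)  # lower loss is better

candidates = [random.uniform(0.1, 2.0) for _ in range(16)]

configs, budget = list(candidates), 1
while len(configs) > 1:
    scores = {c: train(c, budget) for c in configs}                 # evaluate survivors
    configs = sorted(configs, key=scores.get)[: len(configs) // 2]  # keep best half
    budget *= 2                                                     # double their budget

print(round(configs[0], 3))  # the config that received the most resources
```

Note the budgeting decision mentioned above: the initial budget and halving rate determine how quickly slow starters are eliminated, which is exactly where careless settings can discard good configurations.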


BOHB (Bayesian Optimization + Hyperband)#

Combines:

  • Bayesian optimization (accuracy)

  • Hyperband (efficiency)

One of the most effective modern methods.


Population-Based Training (PBT)#

Introduced by DeepMind; used in reinforcement learning and deep learning.

  • Models train in parallel

  • Periodically exchange weights and hyperparameters

  • Hyperparameters evolve during training
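The steps above can be sketched as a single-process simulation of the PBT loop. The toy training dynamics, population size, and perturbation factors are illustrative assumptions:

```python
import random

random.seed(3)

# Each "worker" trains one model copy; the score dynamics below are an
# illustrative assumption (progress is fastest when lr is near 0.1).
workers = [{"lr": random.uniform(0.001, 1.0), "score": 0.0} for _ in range(8)]

def train_step(w):
    w["score"] += 1.0 - min(1.0, abs(w["lr"] - 0.1))

for step in range(20):
    for w in workers:
        train_step(w)
    if step % 5 == 4:  # periodically: exploit + explore
        workers.sort(key=lambda w: w["score"], reverse=True)
        for loser, winner in zip(workers[4:], workers[:4]):
            loser["score"] = winner["score"]                        # copy "weights" (exploit)
            loser["lr"] = winner["lr"] * random.choice([0.8, 1.2])  # perturb (explore)

best = max(workers, key=lambda w: w["score"])
print(round(best["lr"], 3))
```

In a real system the workers run in parallel and "copy weights" means checkpoint restoration, but the exploit/explore schedule is the defining idea.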


Gradient-Based Hyperparameter Optimization#

Uses gradients of the validation loss with respect to the hyperparameters to optimize them directly (rare but emerging).

Applied in:

  • Meta-learning

  • Neural architecture search
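A minimal sketch of the idea: because 1-D ridge regression has a closed-form solution w(lam), the gradient of the validation loss with respect to the penalty lam is available via the chain rule, so lam itself can be tuned by gradient descent. The tiny datasets and step size are illustrative assumptions:

```python
# Gradient-based HPO sketch: descend on the *validation* loss w.r.t. the
# ridge penalty lam, using the closed-form inner solution w(lam).
xt, yt = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]   # training split (assumed)
xv, yv = [1.5, 2.5], [1.4, 2.6]             # validation split (assumed)

sxx = sum(x * x for x in xt)                # sufficient statistics
sxy = sum(x * y for x, y in zip(xt, yt))

def w_of(lam):
    # Inner problem: w(lam) = argmin_w sum (w x - y)^2 + lam w^2
    return sxy / (sxx + lam)

def val_loss(lam):
    w = w_of(lam)
    return sum((w * x - y) ** 2 for x, y in zip(xv, yv))

lam = 5.0
for _ in range(200):                        # outer problem: descend on lam
    w = w_of(lam)
    dL_dw = sum(2 * (w * x - y) * x for x, y in zip(xv, yv))
    dw_dlam = -sxy / (sxx + lam) ** 2       # chain rule through w(lam)
    lam = max(0.0, lam - 5.0 * dL_dw * dw_dlam)

print(round(lam, 3))  # settles near the lam that minimizes val_loss
```

Real systems obtain the same hypergradient without a closed form, by differentiating through the training procedure (as in meta-learning) or via implicit differentiation.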


Summary of Methods#

| Method | Technique | Strength |
| --- | --- | --- |
| Manual | Heuristic | Simple |
| Grid Search | Exhaustive search | Small spaces |
| Random Search | Random sampling | Efficient |
| Bayesian Opt. | GP + acquisition | Best for expensive models |
| Evolutionary | Genetic algorithms | Large search spaces |
| Hyperband | Bandit-based | Resource-efficient |
| BOHB | Bayesian + Hyperband | State-of-the-art |
| PBT | Evolving hyperparameters | RL and deep learning |
| Gradient-based | Meta-learning | Advanced research |


Which Method to Use?#

| Situation | Best HPO Method |
| --- | --- |
| Small dataset | Grid / Random |
| Medium models | Random Search |
| Expensive deep learning training | Bayesian Optimization |
| Very large search spaces | Evolutionary Algorithms |
| Limited compute budget | Hyperband / BOHB |
| RL or evolving training | Population-Based Training |


Summary#

Hyperparameter Optimization is the process of systematically searching for the best hyperparameters to maximize model performance, using methods such as grid search, random search, Bayesian optimization, evolutionary algorithms, and Hyperband.