Gradient Descent#

Gradient Descent is an optimization algorithm that minimizes a cost function by iteratively updating model parameters (θ) in the direction of the negative gradient. Its variants differ mainly in how much data they use at each update step.


Types of Gradient Descent#

Batch Gradient Descent#

  • Definition: Uses the entire training dataset to compute the gradient of the cost function in each iteration.

  • Formula:

    \[ \theta := \theta - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta J(\theta; x^{(i)}, y^{(i)}) \]

    where \(m\) = number of training examples.

  • Pros:

    • Converges to the global minimum for convex functions (like linear regression).

    • Stable updates.

  • Cons:

    • Very slow for large datasets.

    • Requires huge memory since it must load all data at once.

  • ✅ Best suited for small to medium datasets.
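The batch update rule above can be sketched with NumPy for a simple linear-regression cost. This is a minimal illustration, not a reference implementation: the synthetic data, the learning rate `alpha`, and the iteration count are arbitrary choices for demonstration.

```python
import numpy as np

# Batch gradient descent on a least-squares cost (illustrative sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=100)  # true params: bias 2, slope 3

Xb = np.c_[np.ones(len(X)), X]   # prepend a bias column
theta = np.zeros(2)
alpha = 0.1                      # learning rate (arbitrary)
m = len(y)                       # number of training examples

for _ in range(500):
    # Average gradient over ALL m examples -- the defining feature of Batch GD.
    grad = Xb.T @ (Xb @ theta - y) / m
    theta -= alpha * grad

print(theta)  # converges to roughly [2.0, 3.0] (bias, slope)
```

Note that every iteration touches the full dataset, which is exactly why this variant is stable but slow on large data.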


Stochastic Gradient Descent (SGD)#

  • Definition: Updates parameters for each training example one at a time.

  • Formula:

    \[ \theta := \theta - \alpha \cdot \nabla_\theta J(\theta; x^{(i)}, y^{(i)}) \]

  • Pros:

    • Much faster (frequent updates).

    • Can escape local minima due to noisy updates.

  • Cons:

    • Updates are noisy → cost function fluctuates rather than smoothly converging.

    • Harder to reach exact global minimum (oscillates around it).

  • ✅ Best for very large datasets or online learning.
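The single-example update can be sketched the same way. Again, the data, learning rate, and epoch count are illustrative assumptions; the key difference from the batch version is that each parameter update uses exactly one training example.

```python
import numpy as np

# Stochastic gradient descent: one example per update (illustrative sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=200)  # true params: bias 2, slope 3

Xb = np.c_[np.ones(len(X)), X]   # prepend a bias column
theta = np.zeros(2)
alpha = 0.01                     # smaller step to tame the noisy updates

for epoch in range(20):
    for i in rng.permutation(len(y)):          # shuffle the data each epoch
        # Gradient from a SINGLE example -- fast but noisy.
        grad = (Xb[i] @ theta - y[i]) * Xb[i]
        theta -= alpha * grad

print(theta)  # oscillates near [2.0, 3.0] rather than settling exactly
```

Shuffling each epoch is standard practice: visiting examples in a fixed order can bias the trajectory of the updates.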


Mini-Batch Gradient Descent#

  • Definition: A compromise between Batch and SGD. Uses small random subsets (mini-batches) of the data to update parameters.

  • Formula:

    \[ \theta := \theta - \alpha \cdot \frac{1}{b} \sum_{i=1}^{b} \nabla_\theta J(\theta; x^{(i)}, y^{(i)}) \]

    where \(b\) = mini-batch size (e.g., 32, 64, 128).

  • Pros:

    • Faster and more efficient than pure Batch.

    • Less noisy than SGD.

    • Can leverage vectorization (parallel processing on GPUs).

  • Cons:

    • Choosing batch size is tricky (too small → noisy, too large → slow).

  • ✅ Best for deep learning and neural networks.
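The mini-batch variant follows the same pattern, averaging the gradient over `b` examples at a time. As before, this is a sketch with arbitrary choices for the data, `alpha`, and batch size `b`.

```python
import numpy as np

# Mini-batch gradient descent: average the gradient over b examples per update.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=256)  # true params: bias 2, slope 3

Xb = np.c_[np.ones(len(X)), X]   # prepend a bias column
theta = np.zeros(2)
alpha, b = 0.05, 32              # learning rate and mini-batch size (arbitrary)

for epoch in range(50):
    idx = rng.permutation(len(y))            # shuffle, then slice into batches
    for start in range(0, len(y), b):
        batch = idx[start:start + b]
        # Vectorized gradient over the mini-batch -- one matrix product per update,
        # which is what lets GPUs process this efficiently.
        grad = Xb[batch].T @ (Xb[batch] @ theta - y[batch]) / len(batch)
        theta -= alpha * grad

print(theta)  # converges to roughly [2.0, 3.0] (bias, slope)
```

The inner gradient is a single matrix product over the batch, which is the vectorization advantage the pros list refers to.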

Comparison Summary#

| Type | Update Frequency | Speed | Stability | Use Case |
| --- | --- | --- | --- | --- |
| Batch GD | After full dataset | Slow | Very stable | Small datasets |
| Stochastic GD | After each data point | Fast per step | Noisy | Large datasets, online learning |
| Mini-Batch GD | After subset (batch) | Fast + efficient | Balanced | Deep learning |