Gradient Descent#
Gradient Descent is an optimization algorithm that minimizes a cost function by iteratively updating the model parameters (θ) in the direction of the negative gradient. Its variants differ mainly in how much data they use at each update step.
Types of Gradient Descent#
Batch Gradient Descent#
Definition: Uses the entire training dataset to compute the gradient of the cost function in each iteration.
Formula:
\[ \theta := \theta - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta J(\theta; x^{(i)}, y^{(i)}) \]

where \(m\) = number of training examples.
Pros:

- Converges to the global minimum for convex cost functions (e.g., linear regression), given a suitable learning rate.
- Stable, low-variance updates.

Cons:

- Very slow for large datasets.
- Memory-intensive, since the entire dataset must be processed for every single update.
✅ Best suited for small to medium datasets.
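As a concrete illustration, here is a minimal NumPy sketch of Batch GD on a linear-regression cost. The synthetic data, the learning rate `alpha`, and the helper name `batch_gradient_descent` are illustrative assumptions, not part of the definition above.

```python
import numpy as np

# Illustrative synthetic data (an assumption for this sketch): m = 100 examples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=100)

def batch_gradient_descent(X, y, alpha=0.1, n_iters=500):
    """One update per pass over the *entire* dataset."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        # Average gradient of the squared-error cost over all m examples:
        # grad = (1/m) * X^T (X @ theta - y)
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad  # theta := theta - alpha * grad
    return theta

print(batch_gradient_descent(X, y))  # approaches true_theta
```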
Stochastic Gradient Descent (SGD)#
Definition: Updates the parameters using one training example at a time, rather than the whole dataset.
Formula:
\[ \theta := \theta - \alpha \cdot \nabla_\theta J(\theta; x^{(i)}, y^{(i)}) \]

Pros:

- Much faster per step (frequent, cheap updates).
- Can escape shallow local minima thanks to its noisy updates.

Cons:

- Updates are noisy → the cost fluctuates rather than decreasing smoothly.
- Rarely reaches the exact global minimum; it oscillates around it.
✅ Best for very large datasets or online learning.
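For contrast, here is a minimal sketch of SGD on the same kind of synthetic problem; the data setup and the helper name `sgd` are again illustrative assumptions. Note that each update now comes from a single example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # same illustrative data as above
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

def sgd(X, y, alpha=0.01, n_epochs=20):
    """One parameter update per training example."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):               # visit examples in random order
            # Gradient from the single example (x_i, y_i): cheap but noisy
            grad = (X[i] @ theta - y[i]) * X[i]
            theta -= alpha * grad
    return theta

print(sgd(X, y))  # fluctuates around the minimum rather than settling exactly
```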
Mini-Batch Gradient Descent#
Definition: A compromise between Batch and SGD. Uses small random subsets (mini-batches) of the data to update parameters.
Formula:
\[ \theta := \theta - \alpha \cdot \frac{1}{b} \sum_{i=1}^{b} \nabla_\theta J(\theta; x^{(i)}, y^{(i)}) \]where \(b\) = mini-batch size (e.g., 32, 64, 128).
Pros:

- Faster and more memory-efficient than full Batch GD.
- Less noisy than SGD.
- Can leverage vectorization (parallel processing on GPUs).

Cons:

- Choosing the batch size is tricky (too small → noisy updates; too large → approaches Batch GD and loses the speed benefit).
✅ Best for deep learning and neural networks.
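Finally, a minimal sketch of Mini-Batch GD under the same illustrative assumptions (the data and the helper name `minibatch_gd` are hypothetical); the key difference is one vectorized update per batch of `b` examples.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # same illustrative data as above
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

def minibatch_gd(X, y, alpha=0.05, b=32, n_epochs=50):
    """One vectorized update per mini-batch of b examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        idx = rng.permutation(m)                   # reshuffle every epoch
        for start in range(0, m, b):
            batch = idx[start:start + b]           # the last batch may be smaller than b
            Xb, yb = X[batch], y[batch]
            # Average gradient over the mini-batch: a single vectorized matrix product
            grad = Xb.T @ (Xb @ theta - yb) / len(batch)
            theta -= alpha * grad
    return theta

print(minibatch_gd(X, y))
```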
Comparison Summary#
| Type | Update Frequency | Speed | Stability | Use Case |
|---|---|---|---|---|
| Batch GD | After full dataset | Slow | Very stable | Small datasets |
| Stochastic GD | After each data point | Fast per step | Noisy | Large datasets, online learning |
| Mini-Batch GD | After subset (mini-batch) | Fast + efficient | Balanced | Deep learning |