Workflow of Regularized Regression#

Define the Linear Regression Model#

We start with the standard linear regression equation:

\[ y = X\beta + \epsilon \]

where:

  • \(y\) = target values

  • \(X\) = matrix of input features

  • \(\beta\) = vector of coefficients

  • \(\epsilon\) = error (noise) term
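As a quick sketch, the model above can be simulated in NumPy (the coefficient values here are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))               # input features
beta = np.array([1.5, -2.0, 0.5])         # hypothetical true coefficients
epsilon = rng.normal(scale=0.1, size=n)   # error term
y = X @ beta + epsilon                    # observed targets

print(y.shape)  # (100,)
```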


Define the Cost Function#

  • Standard regression uses Mean Squared Error (MSE):

\[ J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \]
  • Regularization adds a penalty term to control complexity:

  1. Ridge (L2 Regularization):

    \[ J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \beta_j^2 \]
    • Shrinks coefficients but never makes them exactly zero.

    • Helps with multicollinearity.

  2. Lasso (L1 Regularization):

    \[ J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |\beta_j| \]
    • Can shrink some coefficients exactly to zero → feature selection.

  3. Elastic Net (Combination of L1 & L2):

    \[ J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \Big(\alpha \sum_{j=1}^p |\beta_j| + (1-\alpha) \sum_{j=1}^p \beta_j^2 \Big) \]
    • Balances Ridge and Lasso.

    • Good for high-dimensional datasets (\(p \gg n\)).
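The three penalized cost functions translate directly into NumPy. A minimal sketch, with our own helper names:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: (1/n) * sum of squared residuals."""
    return np.mean((y - y_hat) ** 2)

def ridge_cost(y, y_hat, beta, lam):
    """MSE plus the L2 penalty: lam * sum(beta_j^2)."""
    return mse(y, y_hat) + lam * np.sum(beta ** 2)

def lasso_cost(y, y_hat, beta, lam):
    """MSE plus the L1 penalty: lam * sum(|beta_j|)."""
    return mse(y, y_hat) + lam * np.sum(np.abs(beta))

def elastic_net_cost(y, y_hat, beta, lam, alpha):
    """MSE plus a weighted mix of the L1 and L2 penalties."""
    l1 = np.sum(np.abs(beta))
    l2 = np.sum(beta ** 2)
    return mse(y, y_hat) + lam * (alpha * l1 + (1 - alpha) * l2)

# With a perfect fit, only the penalty term remains:
beta = np.array([1.0, -2.0])
y = y_hat = np.array([1.0, 2.0])
print(ridge_cost(y, y_hat, beta, lam=0.1))        # 0.1 * (1 + 4)
print(lasso_cost(y, y_hat, beta, lam=0.1))        # 0.1 * (1 + 2)
```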


Choose Hyperparameters#

  • \(\lambda\) (regularization strength): Controls how heavily large coefficients are penalized; \(\lambda = 0\) recovers ordinary least squares.

  • \(\alpha\) (Elastic Net only): Balances the L1 vs. L2 penalty (\(\alpha = 1\) is pure Lasso, \(\alpha = 0\) is pure Ridge).
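A note on naming if you use scikit-learn: its estimators call the penalty strength `alpha` (our \(\lambda\)), and Elastic Net's mixing weight `l1_ratio` (our \(\alpha\)); up to constant scaling factors, its objectives match the costs above:

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# scikit-learn's `alpha` is the lambda above (penalty strength);
# ElasticNet's `l1_ratio` plays the role of alpha in our notation.
ridge = Ridge(alpha=1.0)                    # pure L2
lasso = Lasso(alpha=0.1)                    # pure L1
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # 50/50 L1-L2 mix
```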


Optimization#

  • Use Gradient Descent (or specialized solvers like Coordinate Descent for Lasso).

  • Iteratively update coefficients:

\[ \beta_j \leftarrow \beta_j - \eta \cdot \frac{\partial J}{\partial \beta_j} \]

where \(\eta\) is the learning rate.
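For Ridge, the gradient of the cost above is \(-\frac{2}{n}X^\top(y - X\beta) + 2\lambda\beta\), so a bare-bones gradient descent loop looks like this (a sketch that omits the intercept and feature scaling):

```python
import numpy as np

def ridge_gradient_descent(X, y, lam=0.1, eta=0.01, n_iters=5000):
    """Minimize (1/n)||y - X beta||^2 + lam * ||beta||^2 by gradient descent."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iters):
        grad = -(2 / n) * X.T @ (y - X @ beta) + 2 * lam * beta
        beta -= eta * grad           # beta_j <- beta_j - eta * dJ/dbeta_j
    return beta

# Sanity check against the closed-form Ridge solution
# (setting the gradient to zero gives (X'X + n*lam*I) beta = X'y):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
beta_gd = ridge_gradient_descent(X, y, lam=0.1)
beta_cf = np.linalg.solve(X.T @ X + 100 * 0.1 * np.eye(3), X.T @ y)
print(np.allclose(beta_gd, beta_cf, atol=1e-6))  # True
```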


Model Training#

  • Fit model on training data.

  • Coefficients shrink depending on the regularization type:

    • Ridge → small but nonzero.

    • Lasso → some zero.

    • Elastic Net → mix.
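This shrinkage behavior is easy to see empirically. In the hypothetical setup below, only 3 of 10 features carry signal; Lasso zeroes out most of the noise features, while Ridge keeps every coefficient nonzero:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: 10 features, only the first 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta_true = np.zeros(10)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print(np.sum(ridge.coef_ == 0))  # Ridge: no exact zeros
print(np.sum(lasso.coef_ == 0))  # Lasso: most noise features zeroed out
```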


Model Validation (Cross-Validation)#

  • Use k-Fold CV to tune \(\lambda\) (and \(\alpha\) for Elastic Net).

  • Select the value that minimizes validation error.
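scikit-learn's `ElasticNetCV` performs exactly this k-fold search over \(\lambda\) (its `alphas`) and the L1/L2 mix (its `l1_ratio`). A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=200)

# 5-fold CV over a grid of penalty strengths and L1/L2 mixes
model = ElasticNetCV(alphas=np.logspace(-3, 1, 20),
                     l1_ratio=[0.2, 0.5, 0.8, 1.0],
                     cv=5).fit(X, y)
print(model.alpha_, model.l1_ratio_)  # best (lambda, mix) found by CV
```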


Prediction#

  • Use the final model to make predictions:

\[ \hat{y}_{\text{test}} = X_{\text{test}}\hat{\beta} \]
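Prediction is then a single matrix product with the learned coefficients; with scikit-learn this is just `predict` (synthetic data, for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical training data
rng = np.random.default_rng(2)
X_train = rng.normal(size=(100, 4))
y_train = X_train @ np.array([1.0, 0.5, -1.0, 2.0])

model = Ridge(alpha=0.1).fit(X_train, y_train)

X_test = rng.normal(size=(10, 4))
y_pred = model.predict(X_test)  # computes X_test @ coef_ + intercept_
print(y_pred.shape)  # (10,)
```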

Summary Workflow#

  1. Define regression model.

  2. Add regularization term (L1, L2, or both).

  3. Choose hyperparameters (\(\lambda\), \(\alpha\)).

  4. Optimize using gradient descent/coordinate descent.

  5. Train model → shrink/zero coefficients.

  6. Validate via CV and tune parameters.

  7. Predict on new data.