Workflow of Regularized Regression#
Define the Linear Regression Model#
We start with the standard linear regression model:
\[ y = X\beta + \epsilon \]
where:
\(y\) = target vector
\(X\) = input features
\(\beta\) = coefficients
\(\epsilon\) = error term
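As a quick illustration, the model \(y = X\beta + \epsilon\) can be simulated with NumPy; the shapes, coefficient values, and noise scale below are arbitrary choices for a toy dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))                   # input features
beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5])   # true coefficients (two are zero)
eps = rng.normal(scale=0.1, size=n)           # error term
y = X @ beta + eps                            # y = X beta + eps
```

This toy dataset, with two truly-zero coefficients, is useful later for seeing which penalties can recover the sparsity.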
Define the Cost Function#
Standard regression uses the Mean Squared Error (MSE):
\[ J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \]
Regularization adds a penalty term to control complexity:
Ridge (L2 Regularization):
\[ J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \beta_j^2 \]
Shrinks coefficients but never makes them exactly zero.
Helps with multicollinearity.
Lasso (L1 Regularization):
\[ J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |\beta_j| \]
Can shrink some coefficients exactly to zero → feature selection.
Elastic Net (Combination of L1 & L2):
\[ J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \Big(\alpha \sum_{j=1}^p |\beta_j| + (1-\alpha) \sum_{j=1}^p \beta_j^2 \Big) \]
Balances Ridge and Lasso.
Good for high-dimensional datasets (\(p \gg n\)).
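The three penalty terms above are simple sums over the coefficient vector, so they can be computed directly. The coefficient values and the choices \(\lambda = 0.1\), \(\alpha = 0.5\) below are arbitrary, just to make the arithmetic concrete:

```python
import numpy as np

beta = np.array([3.0, -2.0, 0.0, 1.5])
lam, alpha = 0.1, 0.5   # regularization strength and Elastic Net mixing weight

l2_penalty = lam * np.sum(beta ** 2)        # Ridge term: lambda * sum(beta_j^2)
l1_penalty = lam * np.sum(np.abs(beta))     # Lasso term: lambda * sum(|beta_j|)
enet_penalty = lam * (alpha * np.sum(np.abs(beta))
                      + (1 - alpha) * np.sum(beta ** 2))  # Elastic Net term

print(l2_penalty, l1_penalty, enet_penalty)  # 1.525 0.65 1.0875
```

Note that the L2 term punishes large coefficients much more heavily (quadratically), while the L1 term grows only linearly, which is what lets L1 drive small coefficients all the way to zero.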
Choose Hyperparameters#
\(\lambda\) (regularization strength): Controls penalty size.
\(\alpha\) (for Elastic Net only): Balances L1 vs L2 penalty.
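The effect of \(\lambda\) can be seen by fitting Ridge at several strengths and watching the coefficient norm shrink. A minimal sketch using scikit-learn, where the `alpha` constructor parameter plays the role of \(\lambda\) (the data here is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.2, size=50)

# Larger lambda -> stronger penalty -> smaller coefficient norm ||beta||
norms = [np.linalg.norm(Ridge(alpha=lam).fit(X, y).coef_)
         for lam in (0.01, 1.0, 100.0)]
print(norms)  # strictly decreasing
```

(Confusingly, scikit-learn's `alpha` is the document's \(\lambda\); the document's Elastic Net mixing \(\alpha\) is scikit-learn's `l1_ratio`.)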
Optimization#
Use Gradient Descent (or specialized solvers like Coordinate Descent for Lasso).
Iteratively update the coefficients:
\[ \beta \leftarrow \beta - \eta \, \nabla J(\beta) \]
where \(\eta\) is the learning rate.
Model Training#
Fit model on training data.
Coefficients shrink depending on the regularization.
Ridge → small but nonzero.
Lasso → some zero.
Elastic Net → mix.
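These three behaviors are easy to confirm empirically. A sketch with scikit-learn on synthetic data whose last three setup lines are assumptions of this example (two of the five true coefficients are exactly zero):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ true_beta + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

print(np.sum(ridge.coef_ == 0))   # 0 -- Ridge shrinks but never zeroes
print(np.sum(lasso.coef_ == 0))   # Lasso zeroes the irrelevant features
print(np.sum(enet.coef_ == 0))    # Elastic Net can also produce zeros
```

With enough shrinkage, Lasso recovers the sparsity of `true_beta`, while Ridge keeps every coefficient small but nonzero.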
Model Validation (Cross-Validation)#
Use k-Fold CV to tune \(\lambda\) (and \(\alpha\) for Elastic Net).
Select the value that minimizes validation error.
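scikit-learn wraps this tuning loop in `LassoCV`, which runs k-fold CV over a grid of \(\lambda\) values and keeps the one with the lowest mean validation error. The data and the particular grid below are illustrative:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.5]) + rng.normal(scale=0.1, size=100)

# 5-fold CV over a log-spaced grid of lambda values (sklearn's `alphas`)
model = LassoCV(alphas=np.logspace(-3, 1, 30), cv=5).fit(X, y)
print(model.alpha_)   # the lambda that minimized validation error
```

`RidgeCV` and `ElasticNetCV` provide the same workflow for the other two penalties; for Elastic Net, `l1_ratio` (the document's \(\alpha\)) can be cross-validated as well.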
Prediction#
Use the final model to make predictions on new data:
\[ \hat{y} = X_{\text{new}} \hat{\beta} \]
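In scikit-learn this is just `predict`, which applies the fitted coefficients (plus the intercept, fitted by default) to the new feature matrix. The data below is synthetic:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0).fit(X, y)
X_new = rng.normal(size=(5, 3))
y_pred = model.predict(X_new)   # y_hat = X_new @ coef_ + intercept_
print(y_pred)
```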
Summary Workflow#
Define regression model.
Add regularization term (L1, L2, or both).
Choose hyperparameters (\(\lambda\), \(\alpha\)).
Optimize using gradient descent/coordinate descent.
Train model → shrink/zero coefficients.
Validate via CV and tune parameters.
Predict on new data.