Elastic Net Regression

Elastic Net Regression is a regularized linear regression technique that combines the penalties of:

  • Lasso (L1) → drives some coefficients to zero (feature selection).

  • Ridge (L2) → shrinks coefficients (handles multicollinearity).

It is especially useful when you have many correlated features.


Elastic Net Loss Function

For a regression model:

\[ y = X\beta + \epsilon \]

The Elastic Net objective function is:

\[ L(\beta) = \frac{1}{2n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \left( \alpha \sum_{j=1}^p |\beta_j| + (1-\alpha)\sum_{j=1}^p \beta_j^2 \right) \]

Where:

  • \(y_i\) → actual value

  • \(\hat{y}_i\) → predicted value

  • \(\beta_j\) → regression coefficients

  • \(\lambda\) → overall regularization strength

  • \(\alpha\) → mixing parameter between L1 and L2

    • If \(\alpha = 1\): becomes Lasso

    • If \(\alpha = 0\): becomes Ridge
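
As a concrete (if minimal) sketch, this objective maps onto scikit-learn's `ElasticNet` estimator: its `alpha` argument corresponds to \(\lambda\) above, and its `l1_ratio` corresponds to \(\alpha\) (scikit-learn additionally scales its L2 term by \(1/2\), so the correspondence holds up to that constant). The data below is synthetic, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Toy data: y = X @ beta + noise, with some truly-zero coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_beta = np.array([3.0, 0.0, -2.0, 0.0, 1.5])
y = X @ true_beta + rng.normal(scale=0.5, size=200)

# scikit-learn naming: `alpha` plays the role of lambda (overall strength),
# `l1_ratio` plays the role of alpha (the L1/L2 mix) in the formula above.
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)

print("coefficients:", model.coef_)   # some shrunk, some driven to zero
print("intercept:   ", model.intercept_)
```

Setting `l1_ratio=1` recovers Lasso and `l1_ratio=0` recovers Ridge, mirroring the \(\alpha = 1\) and \(\alpha = 0\) cases above.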


Why Use Elastic Net?

  • Lasso issue: with highly correlated features, Lasso tends to pick one of them arbitrarily and zero out the rest, and which one it picks can change with small changes in the data → unstable.

  • Ridge issue: Keeps all features but doesn’t perform feature selection.

  • Elastic Net: combines both strengths →

    ✅ Keeps the model stable with correlated features.

    ✅ Performs feature selection.


Example Intuition

Suppose you’re predicting house price using:

  • square_feet

  • bedrooms

  • bathrooms

  • location_score

Since square_feet and bedrooms are highly correlated:

  • Lasso may drop bedrooms entirely.

  • Ridge will keep both but shrink their weights.

  • Elastic Net → keeps both but controls their weights → a better balance (as the sketch below shows).
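
To make this concrete, here is a small simulation (the data-generating numbers are invented for illustration, using the feature names above) comparing the three penalties when two predictors are strongly correlated by construction:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(42)
n = 300

# square_feet and bedrooms are strongly correlated by construction.
square_feet = rng.normal(1500.0, 400.0, n)
bedrooms = square_feet / 500.0 + rng.normal(0.0, 0.3, n)
location_score = rng.uniform(0.0, 10.0, n)

# Standardize features so the penalties treat them on an equal footing.
X = np.column_stack([square_feet, bedrooms, location_score])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Both correlated features genuinely contribute to the price.
price = 2.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(0.0, 0.5, n)

for name, model in [
    ("Lasso", Lasso(alpha=0.5)),
    ("Ridge", Ridge(alpha=0.5)),
    ("ElasticNet", ElasticNet(alpha=0.5, l1_ratio=0.5)),
]:
    model.fit(X, price)
    # Lasso typically zeroes one of the correlated pair; Ridge keeps both;
    # Elastic Net keeps both but shrinks them.
    print(f"{name:10s} coef: {np.round(model.coef_, 2)}")
```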


Pros & Cons

  ✅ Handles multicollinearity

  ✅ Performs feature selection

  ✅ Works well when \(p > n\) (more features than samples)

  ❌ Needs tuning of two hyperparameters (\(\lambda\), \(\alpha\)) (see the cross-validation sketch below)
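
On that last point, scikit-learn's `ElasticNetCV` can cross-validate both hyperparameters in one pass. A rough sketch on synthetic data (the `l1_ratio` grid values here are arbitrary illustrative choices, not recommendations):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data where only the first few of many features matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 20))
beta = np.zeros(20)
beta[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ beta + rng.normal(scale=0.3, size=150)

# Cross-validate the L1/L2 mix over an explicit grid, and let the model
# build its own path of 100 candidate overall strengths (its `alpha`).
cv_model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.99],
                        n_alphas=100, cv=5)
cv_model.fit(X, y)

print("best l1_ratio (alpha in the text): ", cv_model.l1_ratio_)
print("best alpha    (lambda in the text):", cv_model.alpha_)
```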