Elastic Net Regression

Elastic Net Regression is a regularized linear regression technique that combines the penalties of:

  • Lasso (L1) → drives some coefficients to zero (feature selection).

  • Ridge (L2) → shrinks coefficients (handles multicollinearity).

It is especially useful when you have many correlated features.


Elastic Net Loss Function

For a regression model:

\[ y = X\beta + \epsilon \]

The Elastic Net objective function is:

\[ L(\beta) = \frac{1}{2n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \left( \alpha \sum_{j=1}^p |\beta_j| + (1-\alpha)\sum_{j=1}^p \beta_j^2 \right) \]

Where:

  • \(y_i\) → actual value

  • \(\hat{y}_i\) → predicted value

  • \(\beta_j\) → regression coefficients

  • \(\lambda\) → overall regularization strength

  • \(\alpha\) → mixing parameter between L1 and L2

    • If \(\alpha = 1\): becomes Lasso

    • If \(\alpha = 0\): becomes Ridge
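
As a concrete (if minimal) sketch, this objective maps onto scikit-learn's `ElasticNet` estimator: its `alpha` argument corresponds to \(\lambda\) above, and its `l1_ratio` corresponds to \(\alpha\) (scikit-learn additionally scales its L2 term by \(1/2\), so the correspondence holds up to that constant). The data below is synthetic, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Toy data: y = X @ beta + noise, with some truly-zero coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_beta = np.array([3.0, 0.0, -2.0, 0.0, 1.5])
y = X @ true_beta + rng.normal(scale=0.5, size=200)

# scikit-learn naming: `alpha` plays the role of lambda (overall strength),
# `l1_ratio` plays the role of alpha (the L1/L2 mix) in the formula above.
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)

print("coefficients:", model.coef_)   # some shrunk, some driven to zero
print("intercept:   ", model.intercept_)
```

Setting `l1_ratio=1` recovers Lasso and `l1_ratio=0` recovers Ridge, mirroring the \(\alpha = 1\) and \(\alpha = 0\) cases above.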


Why Use Elastic Net?

  • Lasso issue: with highly correlated features, Lasso tends to pick one of them arbitrarily and zero out the rest, and which one it picks can change with small changes in the data → unstable.

  • Ridge issue: Keeps all features but doesn’t perform feature selection.

  • Elastic Net: combines both strengths →

    ✅ Keeps the model stable with correlated features.

    ✅ Performs feature selection.


Example Intuition

Suppose you’re predicting house price using:

  • square_feet

  • bedrooms

  • bathrooms

  • location_score

Since square_feet and bedrooms are highly correlated:

  • Lasso may drop bedrooms entirely.

  • Ridge will keep both but shrink their weights.

  • Elastic Net → keeps both but controls their weights → a better balance (as the sketch below shows).
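
To make this concrete, here is a small simulation (the data-generating numbers are invented for illustration, using the feature names above) comparing the three penalties when two predictors are strongly correlated by construction:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(42)
n = 300

# square_feet and bedrooms are strongly correlated by construction.
square_feet = rng.normal(1500.0, 400.0, n)
bedrooms = square_feet / 500.0 + rng.normal(0.0, 0.3, n)
location_score = rng.uniform(0.0, 10.0, n)

# Standardize features so the penalties treat them on an equal footing.
X = np.column_stack([square_feet, bedrooms, location_score])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Both correlated features genuinely contribute to the price.
price = 2.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(0.0, 0.5, n)

for name, model in [
    ("Lasso", Lasso(alpha=0.5)),
    ("Ridge", Ridge(alpha=0.5)),
    ("ElasticNet", ElasticNet(alpha=0.5, l1_ratio=0.5)),
]:
    model.fit(X, price)
    # Lasso typically zeroes one of the correlated pair; Ridge keeps both;
    # Elastic Net keeps both but shrinks them.
    print(f"{name:10s} coef: {np.round(model.coef_, 2)}")
```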


Pros & Cons

  ✅ Handles multicollinearity

  ✅ Performs feature selection

  ✅ Works well when \(p > n\) (more features than samples)

  ❌ Needs tuning of two hyperparameters (\(\lambda\), \(\alpha\)) (see the cross-validation sketch below)
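
On that last point, scikit-learn's `ElasticNetCV` can cross-validate both hyperparameters in one pass. A rough sketch on synthetic data (the `l1_ratio` grid values here are arbitrary illustrative choices, not recommendations):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data where only the first few of many features matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 20))
beta = np.zeros(20)
beta[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ beta + rng.normal(scale=0.3, size=150)

# Cross-validate the L1/L2 mix over an explicit grid, and let the model
# build its own path of 100 candidate overall strengths (its `alpha`).
cv_model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.99],
                        n_alphas=100, cv=5)
cv_model.fit(X, y)

print("best l1_ratio (alpha in the text): ", cv_model.l1_ratio_)
print("best alpha    (lambda in the text):", cv_model.alpha_)
```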