Hyperparameter Tuning in Linear Regression#

In ordinary least squares (OLS) linear regression there are no hyperparameters to tune: the coefficients are computed directly by minimizing the sum of squared errors.

But when we apply regularization techniques (Ridge, Lasso, ElasticNet), hyperparameters come into play.
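
For reference, and ignoring scaling constants that differ between implementations, the regularized variants add a penalty term to the OLS objective (β denotes the coefficient vector; these forms follow the usual scikit-learn-style conventions):

  • OLS: minimize ‖y − Xβ‖²

  • Ridge: minimize ‖y − Xβ‖² + α‖β‖₂²

  • Lasso: minimize ‖y − Xβ‖² + α‖β‖₁

  • ElasticNet: minimize ‖y − Xβ‖² + α(l1_ratio·‖β‖₁ + (1 − l1_ratio)·‖β‖₂²)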


Hyperparameters in Linear Regression Variants#

Ridge Regression (L2 Regularization)#

  • Hyperparameter: α (sometimes called λ).

  • Controls the penalty on large coefficients.

    • α = 0 → ordinary least squares (no penalty).

    • Large α → coefficients shrink towards zero but never exactly zero.

  • Effect: reduces variance and helps prevent overfitting (see the sketch below).
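
A minimal sketch of this shrinkage effect, assuming a synthetic dataset from scikit-learn's make_regression and illustrative α values (none of which come from the text above):

# Illustrative sketch: larger alpha shrinks Ridge coefficients towards zero.
# The synthetic dataset and alpha values are assumptions for demonstration only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=42)

for alpha in [0.01, 1, 100]:
    model = Ridge(alpha=alpha).fit(X, y)
    # Magnitudes decrease as alpha grows, but none become exactly zero.
    print(f"alpha={alpha}: coefficients = {np.round(model.coef_, 2)}")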


Lasso Regression (L1 Regularization)#

  • Hyperparameter: α.

  • Penalizes absolute values of coefficients.

  • Large α → many coefficients become exactly zero → feature selection.

  • Effect: a simpler, more interpretable model (see the sketch below).
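
A minimal sketch of the feature-selection effect, again on an assumed synthetic dataset (only 3 of 10 features are informative) with illustrative α values:

# Illustrative sketch: increasing alpha drives more Lasso coefficients to exactly zero.
# The synthetic dataset and alpha values are assumptions for demonstration only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

for alpha in [0.1, 1, 10]:
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of {model.coef_.size} coefficients are exactly zero")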


ElasticNet (Combination of L1 & L2)#

  • Hyperparameters:

    • α → overall penalty strength.

    • l1_ratio → balance between L1 (Lasso) and L2 (Ridge).

      • l1_ratio = 0 → pure Ridge.

      • l1_ratio = 1 → pure Lasso.

      • 0 < l1_ratio < 1 → a mixture of the two (see the sketch below).
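
A minimal sketch of how l1_ratio shifts ElasticNet between the two penalties; the dataset and parameter values are assumptions for illustration:

# Illustrative sketch: a higher l1_ratio makes ElasticNet behave more like Lasso
# (more coefficients set exactly to zero); a lower l1_ratio behaves more like Ridge.
# Dataset and parameter values are assumptions for demonstration only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for l1_ratio in [0.1, 0.5, 0.9]:
    model = ElasticNet(alpha=1.0, l1_ratio=l1_ratio).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"l1_ratio={l1_ratio}: {n_zero} coefficients are exactly zero")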


Why Is Hyperparameter Tuning Needed?#

  • If α is too small → model behaves like OLS, may overfit.

  • If α is too large → coefficients shrink too much, model may underfit.

  • Proper tuning finds a balance between the two, as the sketch below illustrates.
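
A minimal sketch of this trade-off, comparing cross-validated R² for a too-small, moderate, and too-large α; the noisy dataset (more features than is comfortable for the sample size) and the α values are assumptions:

# Illustrative sketch: very small and very large alpha both tend to score worse
# in cross-validation than a moderate alpha. All values here are assumptions.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=60, n_features=40, noise=20.0, random_state=1)

for alpha in [1e-4, 1.0, 1e4]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring='r2')
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")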


How to Tune Hyperparameters#

We use Cross-Validation (CV) to find the best values:

  1. Grid Search CV

    • Try different values of α (and l1_ratio for ElasticNet).

    • Example: test α = [0.01, 0.1, 1, 10, 100].

    • Train the model on the CV folds for each candidate value and pick the value with the best average CV score.

  2. Randomized Search CV

    • Randomly sample hyperparameters from distributions.

    • More efficient for large search spaces (a sketch follows the Ridge example below).

  3. Bayesian Optimization (advanced)

    • Uses past evaluation results to choose next hyperparameter values intelligently.


Example (Python, Scikit-learn)#

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data for demonstration; replace with your own X_train, y_train.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Example: Ridge (Lasso and ElasticNet can be tuned the same way)
ridge = Ridge()

# Candidate values of alpha to try
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100]}

# 5-fold cross-validation, scored by R^2
grid_search = GridSearchCV(ridge, param_grid, cv=5, scoring='r2')
grid_search.fit(X_train, y_train)

print("Best alpha:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)

Key takeaway:

  • OLS → no hyperparameters.

  • Ridge, Lasso, ElasticNet → hyperparameters (α, l1_ratio).

  • Tune them using cross-validation to balance bias and variance.
