# Hyperparameter Tuning in Linear Regression
In ordinary least squares (OLS) linear regression, there are actually no hyperparameters to tune — the coefficients are directly computed by minimizing the sum of squared errors.
But when we apply regularization techniques (Ridge, Lasso, ElasticNet), hyperparameters come into play.
## Hyperparameters in Linear Regression Variants

### Ridge Regression (L2 Regularization)
- Hyperparameter: α (sometimes called λ). Controls the penalty on large coefficients.
- α = 0 → ordinary least squares (no penalty).
- Large α → coefficients shrink towards zero but never become exactly zero.
- Effect: reduces variance, prevents overfitting.
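The shrinkage effect is easy to verify empirically. A minimal sketch (the `make_regression` dataset and the α values here are purely illustrative): as α grows, the norm of the Ridge coefficient vector falls towards zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data purely for illustration
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# As alpha grows, the L2 penalty pulls the coefficient norm towards zero
for alpha in [0.01, 1, 100, 10000]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>7}: ||coef|| = {np.linalg.norm(ridge.coef_):.2f}")
```

Note that the coefficients shrink smoothly; none of them is forced to exactly zero, which is the key contrast with Lasso below.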
### Lasso Regression (L1 Regularization)
- Hyperparameter: α. Penalizes the absolute values of coefficients.
- Large α → many coefficients become exactly zero → built-in feature selection.
- Effect: simpler, more interpretable model.
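The zeroing-out behaviour shows up clearly on synthetic data where only a few features carry signal (a minimal sketch; the dataset and α values are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: only 3 of the 10 features are truly informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# A larger alpha drives more coefficients to exactly zero
for alpha in [0.1, 1, 10]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(lasso.coef_ == 0))
    print(f"alpha={alpha:>4}: {n_zero} of 10 coefficients are exactly zero")
```

The surviving non-zero coefficients tell you which features the model considers relevant — that is the feature-selection effect.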
### ElasticNet (Combination of L1 & L2)
- Hyperparameters:
  - α → overall penalty strength.
  - l1_ratio → balance between L1 (Lasso) and L2 (Ridge):
    - l1_ratio = 0 → pure Ridge.
    - l1_ratio = 1 → pure Lasso.
    - 0 < l1_ratio < 1 → a mixture of both.
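A quick sketch of how the mix affects sparsity, holding α fixed (synthetic data and values are illustrative; scikit-learn recommends plain Ridge over l1_ratio = 0, which is used here only to show the endpoint):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Same overall penalty strength alpha, different L1/L2 mix
for l1_ratio in [0.0, 0.5, 1.0]:  # 0 → pure Ridge, 1 → pure Lasso
    enet = ElasticNet(alpha=1.0, l1_ratio=l1_ratio, max_iter=10000).fit(X, y)
    n_zero = int(np.sum(enet.coef_ == 0))
    print(f"l1_ratio={l1_ratio}: {n_zero} zero coefficients")
```

As l1_ratio moves towards 1, the L1 component dominates and more coefficients hit exactly zero.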
## Why Is Hyperparameter Tuning Needed?
- If α is too small → the model behaves like OLS and may overfit.
- If α is too large → coefficients shrink too much and the model may underfit.
- Proper tuning finds the balance between the two.
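Both failure modes can be seen by comparing training R² against cross-validated R² across α (a sketch on illustrative synthetic data: few samples and many features, a setting that invites overfitting):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Few samples, many features: a setting where an OLS-like fit overfits
X, y = make_regression(n_samples=40, n_features=30, n_informative=5,
                       noise=20.0, random_state=0)

for alpha in [1e-4, 1.0, 1e4]:
    ridge = Ridge(alpha=alpha)
    train_r2 = ridge.fit(X, y).score(X, y)             # score on training data
    cv_r2 = cross_val_score(ridge, X, y, cv=5).mean()  # held-out estimate
    print(f"alpha={alpha:>8}: train R^2 = {train_r2:.2f}, CV R^2 = {cv_r2:.2f}")
```

A tiny α gives a near-perfect training score with a weaker CV score (overfitting); a huge α drags both scores down (underfitting); the sweet spot sits in between.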
## How to Tune Hyperparameters
We use cross-validation (CV) to find the best values:

1. Grid Search CV
   - Try different values of α (and l1_ratio for ElasticNet), e.g. α = [0.01, 0.1, 1, 10, 100].
   - Train the model on the folds and pick the value with the best average CV score.
2. Randomized Search CV
   - Randomly sample hyperparameters from distributions.
   - More efficient for large search spaces.
3. Bayesian Optimization (advanced)
   - Uses past evaluation results to choose the next hyperparameter values intelligently.
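For the randomized approach, scikit-learn's RandomizedSearchCV can draw α from a log-uniform distribution instead of a fixed grid (a sketch on illustrative synthetic data; the search range is an assumption):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# Draw 20 candidate alphas from a log-uniform distribution over [1e-3, 1e3]
param_dist = {'alpha': loguniform(1e-3, 1e3)}
search = RandomizedSearchCV(Ridge(), param_dist, n_iter=20, cv=5,
                            scoring='r2', random_state=0)
search.fit(X, y)
print("Best alpha:", search.best_params_['alpha'])
```

A log-uniform distribution suits penalty strengths, since useful α values typically span several orders of magnitude.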
## Example (Python, Scikit-learn)
```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import GridSearchCV

# Example: tune Ridge's alpha with 5-fold cross-validation
ridge = Ridge()
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(ridge, param_grid, cv=5, scoring='r2')
grid_search.fit(X_train, y_train)

print("Best alpha:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)
```
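The same pattern extends to ElasticNet, where the grid covers both hyperparameters at once and GridSearchCV tries their full cross product (a sketch; the synthetic data below stands in for X_train, y_train):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train, y_train
X_train, y_train = make_regression(n_samples=100, n_features=10,
                                   n_informative=3, noise=5.0, random_state=0)

# Tune alpha and l1_ratio jointly: 4 x 3 = 12 candidate combinations
param_grid = {'alpha': [0.01, 0.1, 1, 10],
              'l1_ratio': [0.2, 0.5, 0.8]}
grid_search = GridSearchCV(ElasticNet(max_iter=10000), param_grid,
                           cv=5, scoring='r2')
grid_search.fit(X_train, y_train)
print("Best params:", grid_search.best_params_)
```

Because the grid grows multiplicatively with each added hyperparameter, RandomizedSearchCV becomes attractive once the space gets large.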
Key takeaway:

- OLS → no hyperparameters.
- Ridge, Lasso, ElasticNet → hyperparameters (α, l1_ratio).
- Tune them with cross-validation to balance bias and variance.