Hyperparameter Tuning in Linear Regression#

In ordinary least squares (OLS) linear regression there are no hyperparameters to tune: the coefficients are computed directly by minimizing the sum of squared errors.

But when we apply regularization techniques (Ridge, Lasso, ElasticNet), hyperparameters come into play.
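
For reference, and ignoring scaling constants that differ between implementations, the regularized variants add a penalty term to the OLS objective (β denotes the coefficient vector; these forms follow the usual scikit-learn-style conventions):

  • OLS: minimize ‖y − Xβ‖²

  • Ridge: minimize ‖y − Xβ‖² + α‖β‖₂²

  • Lasso: minimize ‖y − Xβ‖² + α‖β‖₁

  • ElasticNet: minimize ‖y − Xβ‖² + α(l1_ratio·‖β‖₁ + (1 − l1_ratio)·‖β‖₂²)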


Hyperparameters in Linear Regression Variants#

Ridge Regression (L2 Regularization)#

  • Hyperparameter: α (sometimes called λ).

  • Controls the penalty on large coefficients.

    • α = 0 → ordinary least squares (no penalty).

    • Large α → coefficients shrink towards zero but never exactly zero.

  • Effect: reduces variance and helps prevent overfitting (see the sketch below).
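
A minimal sketch of this shrinkage effect, assuming a synthetic dataset from scikit-learn's make_regression and illustrative α values (none of which come from the text above):

# Illustrative sketch: larger alpha shrinks Ridge coefficients towards zero.
# The synthetic dataset and alpha values are assumptions for demonstration only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=42)

for alpha in [0.01, 1, 100]:
    model = Ridge(alpha=alpha).fit(X, y)
    # Magnitudes decrease as alpha grows, but none become exactly zero.
    print(f"alpha={alpha}: coefficients = {np.round(model.coef_, 2)}")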


Lasso Regression (L1 Regularization)#

  • Hyperparameter: α.

  • Penalizes absolute values of coefficients.

  • Large α → many coefficients become exactly zero → feature selection.

  • Effect: a simpler, more interpretable model (see the sketch below).
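
A minimal sketch of the feature-selection effect, again on an assumed synthetic dataset (only 3 of 10 features are informative) with illustrative α values:

# Illustrative sketch: increasing alpha drives more Lasso coefficients to exactly zero.
# The synthetic dataset and alpha values are assumptions for demonstration only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

for alpha in [0.1, 1, 10]:
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of {model.coef_.size} coefficients are exactly zero")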


ElasticNet (Combination of L1 & L2)#

  • Hyperparameters:

    • α → overall penalty strength.

    • l1_ratio → balance between L1 (Lasso) and L2 (Ridge).

      • l1_ratio = 0 → pure Ridge.

      • l1_ratio = 1 → pure Lasso.

      • 0 < l1_ratio < 1 → a mixture of the two (see the sketch below).
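
A minimal sketch of how l1_ratio shifts ElasticNet between the two penalties; the dataset and parameter values are assumptions for illustration:

# Illustrative sketch: a higher l1_ratio makes ElasticNet behave more like Lasso
# (more coefficients set exactly to zero); a lower l1_ratio behaves more like Ridge.
# Dataset and parameter values are assumptions for demonstration only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for l1_ratio in [0.1, 0.5, 0.9]:
    model = ElasticNet(alpha=1.0, l1_ratio=l1_ratio).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"l1_ratio={l1_ratio}: {n_zero} coefficients are exactly zero")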


Why Is Hyperparameter Tuning Needed?#

  • If α is too small → model behaves like OLS, may overfit.

  • If α is too large → coefficients shrink too much, model may underfit.

  • Proper tuning finds a balance between the two, as the sketch below illustrates.
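
A minimal sketch of this trade-off, comparing cross-validated R² for a too-small, moderate, and too-large α; the noisy dataset (more features than is comfortable for the sample size) and the α values are assumptions:

# Illustrative sketch: very small and very large alpha both tend to score worse
# in cross-validation than a moderate alpha. All values here are assumptions.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=60, n_features=40, noise=20.0, random_state=1)

for alpha in [1e-4, 1.0, 1e4]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring='r2')
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")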


How to Tune Hyperparameters#

We use Cross-Validation (CV) to find the best values:

  1. Grid Search CV

    • Try different values of α (and l1_ratio for ElasticNet).

    • Example: test α = [0.01, 0.1, 1, 10, 100].

    • Train the model on the CV folds for each candidate value and pick the value with the best average CV score.

  2. Randomized Search CV

    • Randomly sample hyperparameters from distributions.

    • More efficient for large search spaces (a sketch follows the Ridge example below).

  3. Bayesian Optimization (advanced)

    • Uses past evaluation results to choose next hyperparameter values intelligently.


Example (Python, Scikit-learn)#

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data for demonstration; replace with your own X_train, y_train.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Example: Ridge (Lasso and ElasticNet can be tuned the same way)
ridge = Ridge()

# Candidate values of alpha to try
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100]}

# 5-fold cross-validation, scored by R^2
grid_search = GridSearchCV(ridge, param_grid, cv=5, scoring='r2')
grid_search.fit(X_train, y_train)

print("Best alpha:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)

Key takeaway:

  • OLS → no hyperparameters.

  • Ridge, Lasso, ElasticNet → hyperparameters (α, l1_ratio).

  • Tune them using cross-validation to balance bias and variance.
