Linear Regression

What is Linear Regression

Linear Regression is one of the most fundamental and widely used supervised learning algorithms in machine learning.

  • It is used when the target variable (Y) is continuous (e.g., predicting salary, house price, temperature).

  • The goal is to model the relationship between one or more independent variables (X) and the dependent variable (Y).


Equation of Linear Regression

  • Simple Linear Regression (one feature):

\[ Y = \theta_0 + \theta_1 X + \epsilon \]
  • Multiple Linear Regression (multiple features):

\[ Y = \theta_0 + \theta_1X_1 + \theta_2X_2 + ... + \theta_nX_n + \epsilon \]

Where:

  • \(\theta_0\) → Intercept (bias term)

  • \(\theta_1, \theta_2, ..., \theta_n\) → Coefficients (slopes/weights)

  • \(\epsilon\) → Error term (captures noise not explained by model)
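The equations above can be sketched as a small prediction function. This is a minimal illustration using NumPy, not a reference implementation: the function name `predict` and the sample coefficient values are assumptions chosen for the example, and the error term \(\epsilon\) is left out because it is unobserved noise rather than part of the prediction.

```python
import numpy as np

def predict(X, theta):
    """Return theta_0 + theta_1*x_1 + ... + theta_n*x_n for each row of X.

    A 1-D X is treated as many samples of a single feature
    (simple linear regression).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Prepend a column of ones so theta[0] acts as the intercept.
    X_b = np.column_stack([np.ones(X.shape[0]), X])
    return X_b @ np.asarray(theta, dtype=float)

# Simple linear regression with hypothetical theta_0 = 1, theta_1 = 2.
simple_pred = predict([0.0, 1.0, 2.0], [1.0, 2.0])

# Multiple linear regression with two features and hypothetical coefficients.
multi_pred = predict([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.0, -1.0])
```

Adding the column of ones lets the intercept be handled by the same matrix product as the slopes, which keeps the simple and multiple cases in one code path.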


Key Concepts

  • Hypothesis Function → Predicts Y from X: \(h_\theta(x) = \theta_0 + \theta_1x_1 + ... + \theta_nx_n\)

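  • Cost Function (MSE) → Measures the error between predictions and actual values over the \(m\) training examples (the extra factor of \(\frac{1}{2}\) is a convention that simplifies the gradient)

\[ J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 \]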
  • Optimization → Finds the \(\theta\) that minimizes the cost, either iteratively (Gradient Descent) or in closed form (Ordinary Least Squares, OLS)
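To make the three concepts above concrete, here is a hedged sketch in NumPy that evaluates the cost \(J(\theta)\), minimizes it with batch gradient descent, and compares the result with a least-squares solve. The helper names, the learning rate `lr=0.05`, the iteration count, and the noise-free toy data are all assumptions made for illustration.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so theta[0] is the intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return np.column_stack([np.ones(X.shape[0]), X])

def cost(X_b, y, theta):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    residuals = X_b @ theta - y
    return (residuals @ residuals) / (2 * len(y))

def gradient_descent(X_b, y, lr=0.05, n_iters=5000):
    """Batch gradient descent on J(theta), starting from theta = 0."""
    m = len(y)
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        # Gradient of J: (1/m) * X^T (X theta - y)
        gradient = X_b.T @ (X_b @ theta - y) / m
        theta -= lr * gradient
    return theta

def ols(X_b, y):
    """Least-squares solution via np.linalg.lstsq."""
    theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
    return theta

# Toy data generated exactly from y = 1 + 2x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
X_b = add_intercept(x)

theta_gd = gradient_descent(X_b, y)
theta_ols = ols(X_b, y)
```

On this toy data both approaches should recover an intercept near 1 and a slope near 2. In practice the learning rate usually needs tuning (and features often need scaling) for gradient descent to converge, whereas the least-squares solve has no such hyperparameters but can become expensive with very many features.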


Types of Linear Regression

  • Simple Linear Regression → One input feature

  • Multiple Linear Regression → More than one input feature

  • Polynomial Regression → Features transformed into polynomial terms (captures a non-linear relationship with a model that is still linear in the parameters)

  • Regularized Regression → Adds a penalty on coefficient size: Ridge (L2 penalty), Lasso (L1 penalty), ElasticNet (combination of L1 and L2)
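As an illustration of the last two types, the sketch below builds polynomial features for a single input and fits them with a closed-form ridge solve. It assumes NumPy; the function names, the choice to leave the intercept unpenalized, the value `alpha=10.0`, and the toy data are assumptions for this example. Lasso and ElasticNet have no comparable closed form and are not shown.

```python
import numpy as np

def polynomial_features(x, degree):
    """Columns [1, x, x^2, ..., x^degree] for a 1-D input x."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

def ridge_fit(X_b, y, alpha):
    """Solve (X^T X + alpha * I) theta = X^T y.

    The first column of X_b is assumed to be the intercept column,
    so its penalty entry is set to zero (intercept is not shrunk).
    With alpha = 0 this reduces to ordinary least squares.
    """
    penalty = alpha * np.eye(X_b.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X_b.T @ X_b + penalty, X_b.T @ y)

# Toy data from a quadratic: y = 1 - 3x + 0.5x^2 (no noise).
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 - 3.0 * x + 0.5 * x**2

X_poly = polynomial_features(x, degree=2)
theta_plain = ridge_fit(X_poly, y, alpha=0.0)   # plain least squares
theta_ridge = ridge_fit(X_poly, y, alpha=10.0)  # coefficients shrunk toward 0
```

The model is non-linear in `x` but linear in the parameters, which is why the same linear solve applies. Increasing `alpha` trades a slightly worse fit on the training data for smaller, more stable coefficients.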


Assumptions

Classical linear regression relies on the following assumptions:

  1. Linearity → Relationship between X and Y is linear

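  2. Independence of errors → Residuals are not correlated with one another

  3. Homoscedasticity → Errors have constant variance across all values of X

  4. Normal distribution of errors → Matters mainly for confidence intervals and hypothesis tests

  5. No multicollinearity (in multiple regression) → Input features are not highly correlated with each other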
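One common way to check the multicollinearity assumption is the variance inflation factor (VIF): regress each feature on the others and compute \(1 / (1 - R^2)\). The sketch below is a minimal NumPy version with a hypothetical function name `vif` and synthetic data in which the third feature is built to be nearly a copy of the first. Thresholds such as "VIF above 5 or 10" are rules of thumb, not strict limits.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    Each feature is regressed (with an intercept) on the remaining
    features; VIF = 1 / (1 - R^2) = SS_tot / SS_res of that regression.
    """
    X = np.asarray(X, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(target)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        residuals = target - A @ coef
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        scores.append(ss_tot / ss_res)
    return np.array(scores)

# Synthetic features: x3 is x1 plus a little noise, x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)

vif_scores = vif(np.column_stack([x1, x2, x3]))
```

Here the first and third features should show very large VIF values while the independent second feature stays close to 1. The other assumptions are usually checked by inspecting residuals, for example plotting them against the predictions to look for patterns or changing spread.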


Performance Metrics

  • MSE (Mean Squared Error)

  • RMSE (Root Mean Squared Error)

  • MAE (Mean Absolute Error)

  • R² (Coefficient of Determination)
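The four metrics listed above can be computed directly from the true and predicted values. The following is a small NumPy sketch; the function name `regression_metrics` and the sample numbers are made up for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 as a dictionary."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mse = np.mean(errors ** 2)           # average squared error
    rmse = np.sqrt(mse)                  # same units as the target
    mae = np.mean(np.abs(errors))        # average absolute error
    ss_res = np.sum(errors ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot           # share of variation explained

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

# Hypothetical actual values and predictions.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
# For these numbers: MSE = 0.5, MAE = 0.5, R^2 = 0.9
```

MSE and RMSE penalize large errors more heavily than MAE because the errors are squared. R² compares the model against always predicting the mean of the targets, so it is undefined when all true values are identical.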


Applications

  • Predicting house prices

  • Estimating sales from advertising budget

  • Predicting student marks from study hours

  • Forecasting demand or stock prices (basic approach)


In summary: Linear Regression is the foundation of many regression algorithms. It fits the “best fit” line (or plane) that minimizes the squared errors between actual and predicted values, under certain assumptions.
