Linear Regression

What is Linear Regression #

Linear Regression is one of the most fundamental and widely used supervised learning algorithms in machine learning.

It is used when the target variable (Y) is continuous (e.g., predicting salary, house price, temperature).
The goal is to model the relationship between one or more independent variables (X) and the dependent variable (Y).

Equation of Linear Regression #

Simple Linear Regression (one feature):

\[ Y = \theta_0 + \theta_1 X + \epsilon \]

Multiple Linear Regression (multiple features):

\[ Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \dots + \theta_n X_n + \epsilon \]

Where:

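\(\theta_0\) → Intercept (bias term): the predicted Y when every feature is 0
\(\theta_1, \theta_2, \dots, \theta_n\) → Coefficients (slopes/weights): the change in Y per unit change in the corresponding feature, with the other features held fixed
\(\epsilon\) → Error term (captures noise not explained by the model)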
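
As a minimal sketch of how these equations turn features into a prediction (the error term is left out, since it is not known at prediction time), the function below evaluates \(\theta_0 + \theta_1 X_1 + \dots + \theta_n X_n\) for one sample. The name `predict` and all numbers are illustrative assumptions, not values from a fitted model.

```python
def predict(theta0, thetas, features):
    """Return theta0 + sum_j thetas[j] * features[j] for a single sample."""
    if len(thetas) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return theta0 + sum(t * x for t, x in zip(thetas, features))


# Simple linear regression: one feature, so one coefficient.
simple = predict(1.0, [2.0], [3.0])                  # 1 + 2*3

# Multiple linear regression: hypothetical coefficients for two features.
multiple = predict(50.0, [3.0, 10.0], [20.0, 2.0])   # 50 + 3*20 + 10*2

print(simple, multiple)
```

The simple case is the multiple case with a single feature, which is why one function covers both equations.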

Key Concepts #

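Hypothesis Function → \(h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n\), the model's prediction of Y from X
Cost Function (MSE) → Measures the average squared error between predictions and actual values over the \(m\) training examples; the factor \(\frac{1}{2}\) is a convention that simplifies the gradient

\[ J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \]

Optimization (Gradient Descent or OLS) → Finds the \(\theta\) that minimizes the cost: gradient descent does so iteratively, while Ordinary Least Squares (OLS) uses a closed-form solution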
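
The three concepts above can be sketched together for the one-feature case. The code below is an illustrative sketch, not a production implementation: the toy data, the learning rate `lr`, the step count `steps`, and the function names are all assumptions chosen for the example. It computes \(J(\theta)\), runs batch gradient descent on it, and compares the result with the closed-form OLS estimate.

```python
def cost(theta0, theta1, xs, ys):
    """J(theta) = 1/(2m) * sum of squared prediction errors."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Batch gradient descent on J for one feature, starting from zeros."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1


def ols(xs, ys):
    """Closed-form least squares for one feature."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1


# Toy data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

gd_theta = gradient_descent(xs, ys)
ols_theta = ols(xs, ys)
print("gradient descent:", gd_theta)
print("OLS:", ols_theta)
print("cost at OLS solution:", cost(*ols_theta, xs, ys))
```

On this noise-free data both approaches should agree on an intercept near 1 and a slope near 2. On other data the learning rate usually needs tuning: too large a value makes gradient descent diverge, and too small a value makes it slow.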

Types of Linear Regression #

Simple Linear Regression → One input feature
Multiple Linear Regression → More than one input feature
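Polynomial Regression → Features are expanded into polynomial terms (x, x², ...), so a non-linear relationship can be fitted while the model stays linear in its coefficients
Regularized Regression → Adds a penalty on coefficient size to the cost: Ridge (L2), Lasso (L1), ElasticNet (combination of both)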
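
To illustrate the polynomial item in the list above, here is a small sketch with made-up, noise-free data following y = 3 + 2x². The helper name `fit_line` and the data are assumptions for the example. Fitting a straight line to x misses the curve, while fitting the same linear model to the transformed feature z = x² recovers it.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one feature."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy data from a quadratic relationship: y = 3 + 2 * x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 + 2.0 * x ** 2 for x in xs]

# Straight line in x: the symmetric curve gives a flat fit.
raw_intercept, raw_slope = fit_line(xs, ys)

# Same linear model, but on the polynomial feature z = x^2.
zs = [x ** 2 for x in xs]
poly_intercept, poly_slope = fit_line(zs, ys)

print("fit on x:  ", raw_intercept, raw_slope)
print("fit on x^2:", poly_intercept, poly_slope)
```

The fitting routine is unchanged between the two calls; only the input feature differs. That is the sense in which polynomial regression is still a linear model.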

Assumptions #

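The classical linear regression model relies on the following assumptions:

Linearity → The relationship between X and Y is linear in the coefficients
Independence of errors → The error of one observation does not depend on the error of another
Homoscedasticity → The errors have constant variance across all values of X
Normal distribution of errors → Matters mainly for confidence intervals and hypothesis tests, less for the fitted coefficients themselves
No multicollinearity (in multiple regression) → Features should not be strongly correlated with each other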
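
As a rough, hedged illustration of the multicollinearity assumption, the sketch below computes the Pearson correlation between pairs of features. The feature names and values are invented for the example. Pairwise correlation is only a first screen; a fuller check such as the variance inflation factor also catches a feature that is explained by a combination of several others.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical house features (values invented for illustration).
size_sqft = [800.0, 1000.0, 1200.0, 1500.0, 1800.0]
size_sqm = [s * 0.0929 for s in size_sqft]   # same information, other unit
age_years = [30.0, 5.0, 18.0, 12.0, 25.0]

r_duplicate = pearson(size_sqft, size_sqm)   # close to 1: redundant feature
r_age = pearson(size_sqft, age_years)        # close to 0: little overlap

print("size_sqft vs size_sqm:", r_duplicate)
print("size_sqft vs age_years:", r_age)
```

A correlation close to 1 or -1 between two features suggests that one of them carries little extra information, which tends to make the fitted coefficients unstable.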

Performance Metrics #

MSE (Mean Squared Error)
RMSE (Root Mean Squared Error)
MAE (Mean Absolute Error)
R² (Coefficient of Determination)
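
The four metrics listed above can be computed directly from actual and predicted values. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the sample numbers are assumptions made for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)    # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up actual values and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MSE and RMSE penalize large errors more heavily than MAE does. RMSE and MAE are in the same units as Y, and R² is the share of the variance in Y that the model explains.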

Applications #

Predicting house prices
Estimating sales from advertising budget
Predicting student marks from study hours
Forecasting demand or stock prices (basic approach)

✅ In summary: Linear Regression is the foundation of regression algorithms. It fits the best-fit line (or plane, when there are several features) that minimizes the squared error between actual and predicted values, and its results are most reliable when the assumptions listed above hold.
