Regression & Classification#
1. KNN for Classification#
Goal: Assign a class label to a new data point based on its neighbors.
Step-by-Step Process:#
1. Choose k – the number of nearest neighbors to consider.
2. Compute distances – calculate the distance between the new point and all points in the training set (commonly Euclidean).
3. Find nearest neighbors – select the k points closest to the new point.
4. Vote for the class – the most frequent class among the neighbors is assigned to the new point.
Optional: use distance weighting, so closer neighbors count more in voting. A from-scratch sketch of these steps follows below.
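To make the steps concrete, here is a minimal from-scratch sketch. The helper knn_predict_class and the toy points are illustrative inventions, not library code; in practice scikit-learn's KNeighborsClassifier does the same job.
import numpy as np
from collections import Counter

def knn_predict_class(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Invented toy data: two class-0 points near (1, 1), two class-1 points near (4, 4)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.0], [4.2, 3.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict_class(X_train, y_train, np.array([1.1, 1.1]), k=3))  # -> 0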
Example Intuition:#
Imagine a 2D plot:
- Red and Blue points represent two classes.
- A new point appears (green).
- If its 3 nearest neighbors are 2 red and 1 blue → predict Red (sketched in code below).
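The same toy scenario with scikit-learn; the coordinates below are invented for illustration.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Red = 0, Blue = 1; invented points
X = np.array([[1.0, 1.0], [1.3, 0.9], [1.5, 1.4], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
# The green point's 3 nearest neighbors are 2 red, 1 blue -> Red (0)
print(clf.predict([[1.2, 1.1]]))  # [0]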
Metrics for Evaluation:#
Accuracy, Precision, Recall, F1-score, Confusion Matrix.
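All of these are one-liners in sklearn.metrics. A quick sketch on invented labels and predictions:
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # invented ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]   # invented model predictions

print(accuracy_score(y_true, y_pred))    # fraction of correct predictions
print(precision_score(y_true, y_pred))   # of predicted 1s, how many are truly 1
print(recall_score(y_true, y_pred))      # of true 1s, how many were found
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted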
2. KNN for Regression#
Goal: Predict a continuous value for a new data point.
Step-by-Step Process:#
1. Choose k – the number of nearest neighbors.
2. Compute distances – measure closeness between the new point and all training points.
3. Find nearest neighbors – select the k closest points.
4. Average the values – the predicted value is the mean (or weighted mean) of the neighbors' target values.
Optional: weight neighbors inversely by distance (see the sketch below).
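A minimal from-scratch sketch of these steps, including the optional inverse-distance weighting. The helper knn_predict_value and the toy numbers are illustrative, not from any library.
import numpy as np

def knn_predict_value(X_train, y_train, x_new, k=3, weighted=False):
    # Steps 2-3: Euclidean distances, then indices of the k closest points
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    if not weighted:
        # Step 4: plain mean of the neighbors' target values
        return y_train[nearest].mean()
    # Optional: weight each neighbor by 1/distance (epsilon avoids division by zero)
    w = 1.0 / (distances[nearest] + 1e-9)
    return np.average(y_train[nearest], weights=w)

# Invented toy data: sizes -> prices
X_train = np.array([[50.0], [60.0], [80.0]])
y_train = np.array([150.0, 180.0, 230.0])
print(knn_predict_value(X_train, y_train, np.array([70.0]), k=2))  # (180 + 230) / 2 = 205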
Example Intuition:#
- Suppose you want to predict house prices based on size.
- A new house appears.
- Take the k nearest houses by size and average their prices → predicted price (sketched in code below).
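In code, with invented house sizes (m²) and prices:
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

sizes = np.array([[50], [60], [80], [100], [120]])                  # invented
prices = np.array([150_000, 180_000, 230_000, 300_000, 350_000])    # invented

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(sizes, prices)
# Neighbors of 85 m^2 are 80, 100 and 60 m^2 -> mean of their prices
print(reg.predict([[85]]))  # ~236,667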
Metrics for Evaluation:#
Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R² score.
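Again one-liners in sklearn.metrics; RMSE is just the square root of MSE. The values below are invented:
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([200_000, 250_000, 310_000])  # invented true prices
y_pred = np.array([210_000, 240_000, 330_000])  # invented predictions

mse = mean_squared_error(y_true, y_pred)
print(mse)                                   # MSE
print(np.sqrt(mse))                          # RMSE
print(mean_absolute_error(y_true, y_pred))   # MAE
print(r2_score(y_true, y_pred))              # R² score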
3. Key Differences Between Classification and Regression#
| Aspect | Classification | Regression |
|---|---|---|
| Output | Discrete class label | Continuous numeric value |
| Prediction Method | Majority vote of neighbors | Mean (or weighted mean) of neighbors |
| Evaluation Metrics | Accuracy, Precision, Recall, F1 | MSE, RMSE, MAE, R² |
| Example | Predict if email is spam or not | Predict house price |
4. Visualization Example (Intuition)#
- Classification: points in 2D space colored by class. A new point takes the class of the majority of nearby points.
- Regression: points in 2D space with numeric values. The new point's value is averaged from nearby points.
KNN is powerful because it's simple, intuitive, and non-parametric, but its performance depends heavily on:
- Choice of k
- Distance metric
- Feature scaling
The code below visualizes both settings; a scaling-and-k-selection sketch follows it.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, make_regression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Classification dataset: 100 points, 2 features, 2 classes
X_clf, y_clf = make_classification(n_samples=100, n_features=2, n_informative=2,
                                   n_redundant=0, n_clusters_per_class=1, random_state=42)
# Regression dataset: 100 points, 1 feature, noisy target
X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)

# KNN Classifier
knn_clf = KNeighborsClassifier(n_neighbors=5)
knn_clf.fit(X_clf, y_clf)
# KNN Regressor
knn_reg = KNeighborsRegressor(n_neighbors=5)
knn_reg.fit(X_reg, y_reg)

# Plotting Classification: shade the k=5 decision regions, then draw the points
plt.figure(figsize=(14, 6))
plt.subplot(1, 2, 1)
xx, yy = np.meshgrid(np.linspace(X_clf[:, 0].min() - 1, X_clf[:, 0].max() + 1, 200),
                     np.linspace(X_clf[:, 1].min() - 1, X_clf[:, 1].max() + 1, 200))
Z = knn_clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.2, cmap='bwr')
plt.scatter(X_clf[:, 0], X_clf[:, 1], c=y_clf, cmap='bwr', edgecolor='k', s=60)
plt.title('KNN Classification (k=5 decision regions)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

# Plotting Regression: scatter the data, overlay the k=5 prediction curve
plt.subplot(1, 2, 2)
plt.scatter(X_reg[:, 0], y_reg, color='green', edgecolor='k', s=60)
# Generate predictions over a dense grid for a smooth line
X_line = np.linspace(X_reg.min(), X_reg.max(), 200).reshape(-1, 1)
y_line = knn_reg.predict(X_line)
plt.plot(X_line, y_line, color='red', linewidth=2)
plt.title('KNN Regression Fit (k=5)')
plt.xlabel('Feature')
plt.ylabel('Target')

plt.tight_layout()
plt.show()
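Because feature scaling and the choice of k dominate KNN's behavior, a common pattern is to standardize features and cross-validate over k. A minimal sketch, reusing X_clf and y_clf from above (the k grid is an arbitrary choice):
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Standardize first so no single feature dominates the Euclidean distance,
# then pick k by 5-fold cross-validation
pipe = Pipeline([('scale', StandardScaler()),
                 ('knn', KNeighborsClassifier())])
grid = GridSearchCV(pipe, {'knn__n_neighbors': [1, 3, 5, 7, 9, 15]}, cv=5)
grid.fit(X_clf, y_clf)
print(grid.best_params_, grid.best_score_)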