K-Nearest Neighbors (KNN)#
A lazy-learning, non-parametric algorithm.
Can solve classification ✅ and regression ✅ problems.
Makes predictions based on the “closeness” (distance) of data points.
No explicit training (model just stores data). Prediction happens at test time.
Workflow of KNN#
Choose K (the number of neighbors)
Example: K=3 → look at the 3 closest points to the test instance.
K is a hyperparameter → chosen via cross-validation.
Calculate Distance
To determine which points are closest to the test instance.
Common distance metrics (both appear in the code sketch after this workflow):
Euclidean Distance (straight-line distance):
\[ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \]
Manhattan Distance (city-block distance):
\[ d(p, q) = \sum_{i=1}^{n} |p_i - q_i| \]
Find the K Nearest Neighbors
Select the K points in the training data that are closest to the test point.
Vote (Classification) or Average (Regression)
Classification: majority vote among the neighbors’ labels.
Regression: take the average (or median) of neighbors’ values.
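The whole workflow above fits in a few lines of NumPy. This is a minimal sketch for intuition, not a production implementation; the function name `knn_predict` and its arguments are made up for this example:

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3, metric="euclidean", task="classification"):
    """Predict for one test point by voting/averaging over its k nearest neighbors."""
    diffs = X_train - x_test                       # broadcast over all training points
    if metric == "euclidean":
        dists = np.sqrt((diffs ** 2).sum(axis=1))  # straight-line distance
    else:                                          # "manhattan"
        dists = np.abs(diffs).sum(axis=1)          # city-block distance
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    if task == "classification":
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]           # majority vote
    return y_train[nearest].mean()                 # regression: average the values

# Toy usage: two features, binary labels
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 7.0], [7.0, 8.0]])
y = np.array([0, 0, 0, 1, 1])
print(knn_predict(X, y, np.array([2.5, 3.0]), k=3))  # -> 0
```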
Example Intuition#
Dataset: features F1, F2, output Y (binary 0/1).
A new test point comes in → find its 5 nearest neighbors (K=5).
If 3 belong to class 1 and 2 belong to class 0, predicted class = 1.
For regression:
Predict house price → take the average price of 5 nearest houses.
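In practice, scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor` implement exactly this vote/average logic. A quick sketch with toy data (the point coordinates, house sizes, and prices are illustrative only):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Classification: 5 neighbors vote on the label
X = np.array([[1, 1], [1, 2], [2, 2], [8, 8], [8, 9], [9, 9]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # "fit" just stores the data
print(clf.predict([[2, 1]]))                         # -> [0] (3 of 5 neighbors are class 0)

# Regression: price = average of the 5 nearest houses' prices
sizes = np.array([[50], [60], [70], [80], [90], [100]])  # m^2
prices = np.array([100, 120, 140, 160, 180, 200])        # in $1000s
reg = KNeighborsRegressor(n_neighbors=5).fit(sizes, prices)
print(reg.predict([[74]]))                               # -> [140.] (mean of the 5 nearest)
```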
Variants for Speed (because KNN is slow)#
Naive KNN complexity = \(O(n \cdot d)\) per query for \(n\) training points with \(d\) features (must compute the distance to every point).
Optimizations:
k-d Tree → partitions space into a binary tree for faster lookup.
Ball Tree → groups points into hyperspheres.
These reduce the number of distance computations.
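scikit-learn exposes both structures: you can query a `KDTree` directly, or pass `algorithm='kd_tree'` (or `'ball_tree'`) to the estimator. A quick sketch with random data (the labels are dummies just for the demo):

```python
import numpy as np
from sklearn.neighbors import KDTree, KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((10_000, 3))                 # 10k points in 3-D
y = (X[:, 0] > 0.5).astype(int)             # dummy labels just for the demo

# Query a k-d tree directly: distances + indices of the 5 nearest points
tree = KDTree(X)
dist, idx = tree.query(X[:1], k=5)

# Or let the estimator use one: algorithm = 'kd_tree', 'ball_tree', 'brute', or 'auto'
clf = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)
```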
Key Insights#
K choice matters:
Low K (e.g., 1) → very sensitive to noise (high variance).
High K (e.g., 20) → oversmoothing, may miss local patterns (high bias).
Distance metric matters:
Euclidean for continuous features.
Manhattan if data has grid-like structure.
Can even use cosine similarity for text.
Feature scaling is crucial:
Since distances are sensitive to scale, normalize/standardize features first.
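Two of these insights (choose K via cross-validation; scale features first) combine naturally in a scikit-learn pipeline. A sketch on the Iris dataset; the grids of K values and metrics are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale first so no single feature dominates the distance, then classify
pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier())])

# Choose K (and the distance metric) via 5-fold cross-validation
grid = GridSearchCV(pipe,
                    {"knn__n_neighbors": [1, 3, 5, 9, 15],
                     "knn__metric": ["euclidean", "manhattan"]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```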
Visual Intuition (Good to Show Students)#
Classification: draw two sets of points (red & blue). Add a new test point, show how its nearest neighbors decide its label.
Regression: plot house size vs price, show how nearest neighbors’ average gives prediction.
Distance metrics: show Euclidean (diagonal straight line) vs Manhattan (L-shaped path).
Effect of K: with K=1 → decision boundary jagged; with K=15 → smooth.
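A rough matplotlib sketch of that last comparison, assuming a two-moons toy dataset (the sample size, noise level, and grid bounds are arbitrary):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Evaluate each classifier on a dense grid to draw its decision regions
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, k in zip(axes, [1, 15]):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    zz = clf.predict(grid).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)    # K=1: jagged regions; K=15: smooth
    ax.scatter(X[:, 0], X[:, 1], c=y, s=15)
    ax.set_title(f"K = {k}")
plt.show()
```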
In Short#
KNN = predict based on neighbors.
Works for classification and regression.
Needs careful choice of K and distance metric.
Pros: simple, interpretable, no training time.
Cons: slow with large datasets, sensitive to irrelevant features, requires feature scaling.
Optimized by: k-d trees, ball trees, approximate nearest neighbors.