K-Nearest Neighbors (KNN)#
A lazy-learning, non-parametric algorithm.
Can solve classification ✅ and regression ✅ problems.
Makes predictions based on the “closeness” (distance) of data points.
No explicit training (model just stores data). Prediction happens at test time.
Workflow of KNN#
Choose K (the number of neighbors)
Example: K=3 → look at the 3 closest points to the test instance.
K is a hyperparameter → chosen via cross-validation.
Calculate Distance
To determine which points are closest to the test instance.
Common distance metrics (both appear in the code sketch after this workflow):
Euclidean Distance (straight-line distance):
\[ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \]
Manhattan Distance (city-block distance):
\[ d(p, q) = \sum_{i=1}^{n} |p_i - q_i| \]
Find the K Nearest Neighbors
Select the K points in the training data that are closest to the test point.
Vote (Classification) or Average (Regression)
Classification: majority vote among the neighbors’ labels.
Regression: take the average (or median) of neighbors’ values.
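The whole workflow above fits in a few lines of NumPy. This is a minimal sketch for intuition, not a production implementation; the function name `knn_predict` and its arguments are made up for this example:

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3, metric="euclidean", task="classification"):
    """Predict for one test point by voting/averaging over its k nearest neighbors."""
    diffs = X_train - x_test                       # broadcast over all training points
    if metric == "euclidean":
        dists = np.sqrt((diffs ** 2).sum(axis=1))  # straight-line distance
    else:                                          # "manhattan"
        dists = np.abs(diffs).sum(axis=1)          # city-block distance
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    if task == "classification":
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]           # majority vote
    return y_train[nearest].mean()                 # regression: average the values

# Toy usage: two features, binary labels
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 7.0], [7.0, 8.0]])
y = np.array([0, 0, 0, 1, 1])
print(knn_predict(X, y, np.array([2.5, 3.0]), k=3))  # -> 0
```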
Example Intuition#
Dataset: features F1, F2, output Y (binary 0/1).
A new test point comes in → find its 5 nearest neighbors (K=5).
If 3 belong to class 1 and 2 belong to class 0, predicted class = 1.
For regression:
Predict house price → take the average price of 5 nearest houses.
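In practice, scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor` implement exactly this vote/average logic. A quick sketch with toy data (the point coordinates, house sizes, and prices are illustrative only):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Classification: 5 neighbors vote on the label
X = np.array([[1, 1], [1, 2], [2, 2], [8, 8], [8, 9], [9, 9]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # "fit" just stores the data
print(clf.predict([[2, 1]]))                         # -> [0] (3 of 5 neighbors are class 0)

# Regression: price = average of the 5 nearest houses' prices
sizes = np.array([[50], [60], [70], [80], [90], [100]])  # m^2
prices = np.array([100, 120, 140, 160, 180, 200])        # in $1000s
reg = KNeighborsRegressor(n_neighbors=5).fit(sizes, prices)
print(reg.predict([[74]]))                               # -> [140.] (mean of the 5 nearest)
```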
Variants for Speed (because KNN is slow)#
Naive KNN complexity = \(O(n \cdot d)\) per query for \(n\) training points with \(d\) features (must compute the distance to every point).
Optimizations:
k-d Tree → partitions space into a binary tree for faster lookup.
Ball Tree → groups points into hyperspheres.
These reduce the number of distance computations.
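scikit-learn exposes both structures: you can query a `KDTree` directly, or pass `algorithm='kd_tree'` (or `'ball_tree'`) to the estimator. A quick sketch with random data (the labels are dummies just for the demo):

```python
import numpy as np
from sklearn.neighbors import KDTree, KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((10_000, 3))                 # 10k points in 3-D
y = (X[:, 0] > 0.5).astype(int)             # dummy labels just for the demo

# Query a k-d tree directly: distances + indices of the 5 nearest points
tree = KDTree(X)
dist, idx = tree.query(X[:1], k=5)

# Or let the estimator use one: algorithm = 'kd_tree', 'ball_tree', 'brute', or 'auto'
clf = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)
```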
Key Insights#
K choice matters:
Low K (e.g., 1) → very sensitive to noise (high variance).
High K (e.g., 20) → oversmoothing, may miss local patterns (high bias).
Distance metric matters:
Euclidean for continuous features.
Manhattan if data has grid-like structure.
Can even use cosine similarity for text.
Feature scaling is crucial:
Since distances are sensitive to scale, normalize/standardize features first.
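Two of these insights (choose K via cross-validation; scale features first) combine naturally in a scikit-learn pipeline. A sketch on the Iris dataset; the grids of K values and metrics are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale first so no single feature dominates the distance, then classify
pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier())])

# Choose K (and the distance metric) via 5-fold cross-validation
grid = GridSearchCV(pipe,
                    {"knn__n_neighbors": [1, 3, 5, 9, 15],
                     "knn__metric": ["euclidean", "manhattan"]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```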
Visual Intuition (Good to Show Students)#
Classification: draw two sets of points (red & blue). Add a new test point, show how its nearest neighbors decide its label.
Regression: plot house size vs price, show how nearest neighbors’ average gives prediction.
Distance metrics: show Euclidean (diagonal straight line) vs Manhattan (L-shaped path).
Effect of K: with K=1 → decision boundary jagged; with K=15 → smooth.
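A rough matplotlib sketch of that last comparison, assuming a two-moons toy dataset (the sample size, noise level, and grid bounds are arbitrary):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Evaluate each classifier on a dense grid to draw its decision regions
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, k in zip(axes, [1, 15]):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    zz = clf.predict(grid).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)    # K=1: jagged regions; K=15: smooth
    ax.scatter(X[:, 0], X[:, 1], c=y, s=15)
    ax.set_title(f"K = {k}")
plt.show()
```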
In Short#
KNN = predict based on neighbors.
Works for classification and regression.
Needs careful choice of K and distance metric.
Pros: simple, interpretable, no training time.
Cons: slow with large datasets, sensitive to irrelevant features, requires feature scaling.
Optimized by: k-d trees, ball trees, approximate nearest neighbors.