K-Nearest Neighbors (KNN)#

  • A lazy learning, non-parametric algorithm.

  • Can solve classification ✅ and regression ✅ problems.

  • Makes predictions based on the “closeness” (distance) of data points.

  • No explicit training (model just stores data). Prediction happens at test time.


Workflow of KNN#

  1. Choose K (the number of neighbors)

    • Example: K=3 → look at the 3 closest points to the test instance.

    • K is a hyperparameter → chosen via cross-validation.

  2. Calculate Distance

    • To determine which points are closest to the test instance.

    • Common distance metrics:

      • Euclidean Distance (straight-line distance) between points \(p = (x_1, y_1)\) and \(q = (x_2, y_2)\):

        \[ d(p, q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]
      • Manhattan Distance (city-block distance):

        \[ d(p, q) = |x_2 - x_1| + |y_2 - y_1| \]
  3. Find the K Nearest Neighbors

    • Select the K points in the training data that are closest to the test point.

  4. Vote (Classification) or Average (Regression)

    • Classification: majority vote among the neighbors’ labels.

    • Regression: take the average (or median) of neighbors’ values.
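
To make the four steps concrete, here is a minimal from-scratch sketch in Python (NumPy only). The `knn_predict` helper and the toy data are illustrative, not taken from any library:

```python
import numpy as np

def euclidean(p, q):
    # Step 2: straight-line distance between two feature vectors
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan(p, q):
    # Step 2 (alternative): city-block distance
    return np.sum(np.abs(p - q))

def knn_predict(X_train, y_train, x_test, k=3, distance=euclidean, task="classification"):
    # Step 2: distance from the test point to every training point
    dists = np.array([distance(x, x_test) for x in X_train])
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(dists)[:k]
    neighbor_labels = y_train[nearest]
    if task == "classification":
        # Step 4: majority vote among the neighbors' labels
        values, counts = np.unique(neighbor_labels, return_counts=True)
        return values[np.argmax(counts)]
    # Step 4 (regression): average of the neighbors' values
    return neighbor_labels.mean()

# Step 1: K is a hyperparameter; k=3 here is just an example choice
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.5, 3.0]), k=3))  # → 0
```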


Example Intuition#

  • Dataset: features F1, F2, output Y (binary 0/1).

  • New test point comes in → find the 5 nearest points.

  • If 3 belong to class 1 and 2 belong to class 0, predicted class = 1.

For regression:

  • Predict house price → take the average price of 5 nearest houses.
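
The same intuition in a few lines, assuming scikit-learn is available (the toy points and house prices below are made up):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Classification: 5 nearest neighbors vote on the class (as in the F1/F2 example)
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit([[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [6, 6], [7, 5]],
        [0, 0, 0, 1, 1, 1, 1])
print(clf.predict([[6, 6]]))  # 4 of the 5 neighbors are class 1 → [1]

# Regression: predict a house price as the mean of the 5 nearest houses
reg = KNeighborsRegressor(n_neighbors=5)
reg.fit([[50], [60], [70], [80], [90], [100]],   # size in m²
        [150, 180, 210, 240, 270, 300])          # price in $1000s
print(reg.predict([[75]]))  # average of the 5 nearest prices
```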


Variants for Speed (because KNN is slow)#

  • Naive KNN complexity = \(O(n \cdot d)\) per query for \(n\) training points with \(d\) features (must compute the distance to every point).

  • Optimizations:

    • k-d Tree → partitions space into a binary tree for faster lookup.

    • Ball Tree → groups points into hyperspheres.

  • These reduce the number of distance computations.
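
A brief sketch of how these structures are used in practice, assuming scikit-learn; the random data is illustrative, and the `algorithm` parameter selects the index structure:

```python
import numpy as np
from sklearn.neighbors import KDTree, KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((10_000, 3))  # 10k random 3-D points

# Build the tree once, then each query avoids scanning every point
tree = KDTree(X)
dist, idx = tree.query(X[:1], k=5)  # 5 nearest neighbors of the first point

# Or let the estimator use a tree internally
clf = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")  # or "ball_tree"
```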


Key Insights#

  • K choice matters:

    • Low K (e.g., 1) → very sensitive to noise (high variance).

    • High K (e.g., 20) → oversmoothing, may miss local patterns (high bias).

  • Distance metric matters:

    • Euclidean for continuous features.

    • Manhattan if data has grid-like structure.

    • Can even use cosine similarity for text.

  • Feature scaling is crucial:

    • Since distances are sensitive to scale, normalize/standardize features first.
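
These three insights combine naturally in one pipeline. A sketch assuming scikit-learn and its bundled iris dataset: scale first, then choose K and the distance metric by cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale inside the pipeline so distances are computed on standardized features
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Cross-validate over K and the distance metric
grid = GridSearchCV(pipe, {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 11, 15],
    "kneighborsclassifier__metric": ["euclidean", "manhattan"],
}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```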


Visual Intuition (Good to Show Students)#

  1. Classification: draw two sets of points (red & blue). Add a new test point, show how its nearest neighbors decide its label.

  2. Regression: plot house size vs price, show how nearest neighbors’ average gives prediction.

  3. Distance metrics: show Euclidean (straight-line path) vs Manhattan (L-shaped, grid-following path).

  4. Effect of K: with K=1 → decision boundary jagged; with K=15 → smooth.
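
One way to generate the K=1 vs K=15 boundary picture, sketched with scikit-learn and matplotlib on synthetic two-moons data (all parameters here are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, k in zip(axes, (1, 15)):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)  # jagged boundary for K=1, smooth for K=15
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=20)
    ax.set_title(f"K = {k}")
plt.show()
```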


In Short#

  • KNN = predict based on neighbors.

  • Works for classification and regression.

  • Needs careful choice of K and distance metric.

  • Pros: simple, interpretable, no training time.

  • Cons: slow with large datasets, sensitive to irrelevant features, requires feature scaling.

  • Optimized by: k-d trees, ball trees, approximate nearest neighbors.
