Hyperparameter Tuning#

KNN has no training phase in the usual sense, but it still has hyperparameters that control how predictions are made:

  1. k (number of neighbors) – how many nearby points influence the prediction

  2. Distance metric – how we measure “closeness” between points

    • Euclidean, Manhattan, Minkowski, Hamming, etc.

  3. Weights of neighbors – whether all neighbors contribute equally or closer ones count more

    • uniform: all neighbors have equal weight

    • distance: closer neighbors have higher influence

Hyperparameter tuning is the process of finding the combination of these hyperparameters that minimizes error (or maximizes accuracy).
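The three hyperparameters above map directly onto constructor arguments of scikit-learn's `KNeighborsClassifier`. A minimal sketch (the iris dataset here is just an illustrative choice, not part of the original text):

```python
# Minimal sketch of the three KNN hyperparameters, assuming scikit-learn.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# n_neighbors = k, metric = distance measure, weights = neighbor weighting
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", weights="distance")
knn.fit(X, y)          # "training" just stores the data
preds = knn.predict(X[:3])
print(preds)
```

Changing any of these three arguments changes which neighbors are consulted and how their votes are combined, which is exactly what tuning explores.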


2. Key Hyperparameters#

| Hyperparameter | Description | Effect on model |
| --- | --- | --- |
| `n_neighbors` (k) | Number of nearest neighbors | Small k → noisy predictions, overfitting; large k → smoother predictions, underfitting |
| `metric` | Distance metric used to measure closeness | Changes which neighbors are selected → affects predictions |
| `weights` | Weighting of neighbors | Can improve performance by prioritizing closer points |


3. Methods for Hyperparameter Tuning#




D. Cross-Validation#

  • Always combine tuning with cross-validation to avoid overfitting to a single validation split.

  • Use k-fold CV (e.g., 5 folds) to evaluate each hyperparameter setting. Note that this k is the number of folds, not the number of neighbors.
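A sketch of scoring one hyperparameter setting with 5-fold cross-validation, assuming scikit-learn's `cross_val_score` (the dataset and k=7 are illustrative choices):

```python
# Evaluate a single KNN configuration with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=7, weights="uniform")
scores = cross_val_score(knn, X, y, cv=5)  # one accuracy score per fold
print(scores.mean())  # average accuracy across the 5 folds
```

Repeating this for each candidate setting and comparing the mean scores is the core of the tuning loop.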


4. Intuition Behind Tuning k#

  • Small k:

    • Captures local patterns

    • Sensitive to noise → may misclassify outliers

  • Large k:

    • Smooths predictions

    • Ignores local patterns → may underfit

The optimal k is usually found by comparing validation (or cross-validation) scores across a range of candidate values; for clustering algorithms such as k-means, silhouette scores play the analogous role.
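This search over k can be sketched as a simple loop, assuming scikit-learn and 5-fold cross-validation (the dataset and the 1–29 range of odd k values are illustrative choices):

```python
# Compare cross-validation scores across candidate k values and pick the best.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
k_values = list(range(1, 31, 2))  # odd k avoids ties in binary voting
cv_means = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in k_values
]
best_k = k_values[int(np.argmax(cv_means))]
print(best_k)
```

Plotting `cv_means` against `k_values` typically shows the small-k/large-k trade-off described above: scores dip at the noisy low end and at the oversmoothed high end.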


5. Weighted vs Uniform Neighbors#

  • Uniform: all neighbors contribute equally

  • Distance: closer neighbors contribute more → often improves accuracy in noisy datasets

Intuition: nearer neighbors are more likely to be similar, so weighting helps KNN “trust” the right points.
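A quick comparison of the two weighting schemes under cross-validation, assuming scikit-learn (dataset and k=7 are illustrative choices):

```python
# Compare 'uniform' vs 'distance' neighbor weighting with 5-fold CV.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
results = {}
for w in ("uniform", "distance"):
    knn = KNeighborsClassifier(n_neighbors=7, weights=w)
    results[w] = cross_val_score(knn, X, y, cv=5).mean()
    print(w, round(results[w], 3))
```

On clean, well-separated data the two often score similarly; the gap in favor of `distance` tends to appear on noisier datasets, as noted above.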


6. Summary Workflow#

  1. Choose a range of hyperparameters (k, metric, weights)

  2. Split data (train/validation or use cross-validation)

  3. Evaluate performance for each combination

  4. Select best combination

  5. Retrain KNN on the full training set with these hyperparameters
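The five steps above can be sketched with scikit-learn's `GridSearchCV`, which cross-validates every combination and, by default, refits the best one on the full training set (the grid values and dataset are illustrative choices):

```python
# End-to-end tuning workflow: grid search over k, metric, and weights with 5-fold CV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
param_grid = {
    "n_neighbors": [3, 5, 7, 9],           # step 1: candidate hyperparameters
    "metric": ["euclidean", "manhattan"],
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)  # steps 2-3
search.fit(X, y)                            # step 4: best combo in best_params_
print(search.best_params_)
# step 5: search.best_estimator_ is already refit on the full training set
```

`search.best_estimator_` can then be used for prediction directly, or the winning parameters can be passed to a fresh `KNeighborsClassifier` and refit manually.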