Cost Function

K-Nearest Neighbors (KNN) does not have an explicit “cost function” the way linear regression or SVMs do, because it is a non-parametric, instance-based learning algorithm. We can still think about a related concept that measures how well KNN is performing.


1. KNN is Lazy Learning

  • KNN does not fit a model during training; it simply stores the training data.

  • There are no parameters like weights to optimize via a cost function.

  • All computation happens at prediction time: distances are measured, neighbors are selected, and votes/averages are computed.

So unlike linear regression (minimizing squared error) or logistic regression (maximizing likelihood), KNN does not have a formal cost function during training.
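
To make the lazy-learning point concrete, here is a minimal NumPy sketch of prediction-time KNN classification. The helper name `knn_predict` and the toy data are just for illustration; the key point is that nothing resembling training happens before the call.

```python
# A minimal sketch of KNN's lazy, prediction-time computation.
# There is no training step: all work (distances, neighbor
# selection, voting) happens inside the predict call itself.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every stored training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.9, 1.0]), k=3))  # -> 1
```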


2. Implicit “Cost” at Prediction

We can think about KNN performance in terms of prediction error:

A. Classification

  • The “cost” is related to misclassification:

\[ \text{Error rate} = \frac{\text{Number of misclassified points}}{\text{Total points}} \]

  • Alternatively, weighted misclassification can be used if neighbors contribute differently (e.g., closer neighbors count more).
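
As a concrete illustration, this sketch measures that error rate for a KNN classifier using scikit-learn’s `KNeighborsClassifier`; the Iris dataset and k = 5 are arbitrary choices for the example.

```python
# A sketch of measuring KNN's misclassification "cost" on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)  # fraction classified correctly
error_rate = 1 - accuracy             # misclassified / total
print(f"Error rate: {error_rate:.3f}")
```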

B. Regression

  • The “cost” is related to the difference between predicted and true values:

\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \]

  • Here, \(\hat{y}_i\) is the KNN prediction (the mean of the \(k\) neighbors’ target values).

  • So for regression, you can think of MSE or MAE as the “implicit cost” that KNN keeps low through good choices of \(k\) and the distance metric.
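
The same idea for regression, sketched with scikit-learn’s `KNeighborsRegressor` (which predicts the mean of the k neighbors’ targets); the synthetic sine data is just for illustration.

```python
# A sketch of the "implicit cost" (MSE) of a KNN regressor.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))            # toy inputs
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # noisy targets

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
print(f"Test MSE: {mean_squared_error(y_test, knn.predict(X_test)):.4f}")
```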


3. Choosing k as Implicit Optimization

  • Selecting \(k\) is a form of hyperparameter optimization: we tune it to minimize prediction error.

  • Small \(k\) → low bias, high variance → sensitive to noise.

  • Large \(k\) → high bias, low variance → smoother prediction.

  • Cross-validation is used to find the \(k\) that minimizes validation error, as in the sketch below.
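
A sketch of that selection loop, using scikit-learn’s `cross_val_score`; the candidate values of k and the dataset are arbitrary choices for the example.

```python
# Choosing k via cross-validation: score several values of k
# and keep the one with the lowest mean validation error.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_error = None, float("inf")
for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    error = 1 - scores.mean()  # mean validation error rate across folds
    if error < best_error:
        best_k, best_error = k, error

print(f"Best k: {best_k} (validation error {best_error:.3f})")
```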


4. Optional Weighted Cost Function

Some KNN variants use distance-weighted predictions:

  • Classification: closer neighbors get more weight in majority vote.

  • Regression: prediction is weighted average:

\[ \hat{y} = \frac{\sum_{i=1}^k w_i y_i}{\sum_{i=1}^k w_i}, \quad w_i = \frac{1}{d(x, x_i)} \]

  • Here, \(d(x, x_i)\) is the distance to neighbor \(i\).

  • This is effectively minimizing a distance-weighted prediction error (see the sketch below).
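
Here is a NumPy sketch of that weighted average with \(w_i = 1/d(x, x_i)\); the helper name `weighted_knn_predict`, the small `eps` guard against zero distances, and the toy data are illustrative. scikit-learn exposes the same idea via `weights="distance"` on its KNN estimators.

```python
# Distance-weighted KNN regression: closer neighbors weigh more.
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k=3, eps=1e-12):
    """Predict one query point as a distance-weighted average of k neighbors."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    w = 1.0 / (distances[nearest] + eps)  # w_i = 1 / d(x, x_i)
    return np.sum(w * y_train[nearest]) / np.sum(w)

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(weighted_knn_predict(X_train, y_train, np.array([1.5]), k=2))  # -> 2.5
```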


Summary

| Aspect | KNN Behavior |
| --- | --- |
| Traditional cost function | None (lazy learner) |
| Implicit “cost” | Misclassification rate (classification); MSE/MAE (regression) |
| Optimization | Choice of k and distance weighting |
| Goal | Minimize prediction error |