Cost Function

K-Nearest Neighbors (KNN) does not have an explicit “cost function” the way linear regression or SVMs do, because it is a non-parametric, instance-based learning algorithm. We can still think about a related concept that measures how well KNN is performing.


1. KNN is Lazy Learning

  • KNN does not fit a model during training; it simply stores the training data.

  • There are no parameters like weights to optimize via a cost function.

  • All computation happens at prediction time: distances are measured, neighbors are selected, and votes/averages are computed.

So unlike linear regression (minimizing squared error) or logistic regression (maximizing likelihood), KNN does not have a formal cost function during training.
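
To make the lazy-learning point concrete, here is a minimal NumPy sketch of prediction-time KNN classification. The helper name `knn_predict` and the toy data are just for illustration; the key point is that nothing resembling training happens before the call.

```python
# A minimal sketch of KNN's lazy, prediction-time computation.
# There is no training step: all work (distances, neighbor
# selection, voting) happens inside the predict call itself.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every stored training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.9, 1.0]), k=3))  # -> 1
```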


2. Implicit “Cost” at Prediction

We can think about KNN performance in terms of prediction error:

A. Classification

  • The “cost” is related to misclassification:

\[ \text{Error rate} = \frac{\text{Number of misclassified points}}{\text{Total points}} \]

  • Alternatively, weighted misclassification can be used if neighbors contribute differently (e.g., closer neighbors count more).
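
As a concrete illustration, this sketch measures that error rate for a KNN classifier using scikit-learn’s `KNeighborsClassifier`; the Iris dataset and k = 5 are arbitrary choices for the example.

```python
# A sketch of measuring KNN's misclassification "cost" on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)  # fraction classified correctly
error_rate = 1 - accuracy             # misclassified / total
print(f"Error rate: {error_rate:.3f}")
```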

B. Regression

  • The “cost” is related to the difference between predicted and true values:

\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \]

  • Here, \(\hat{y}_i\) is the KNN prediction (the mean of the \(k\) neighbors’ target values).

  • So for regression, you can think of MSE or MAE as the “implicit cost” that KNN keeps low through good choices of \(k\) and the distance metric.
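
The same idea for regression, sketched with scikit-learn’s `KNeighborsRegressor` (which predicts the mean of the k neighbors’ targets); the synthetic sine data is just for illustration.

```python
# A sketch of the "implicit cost" (MSE) of a KNN regressor.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))            # toy inputs
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # noisy targets

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
print(f"Test MSE: {mean_squared_error(y_test, knn.predict(X_test)):.4f}")
```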


3. Choosing k as Implicit Optimization

  • Selecting \(k\) is a form of hyperparameter optimization: we tune it to minimize prediction error.

  • Small \(k\) → low bias, high variance → sensitive to noise.

  • Large \(k\) → high bias, low variance → smoother prediction.

  • Cross-validation is used to find the \(k\) that minimizes validation error, as in the sketch below.
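
A sketch of that selection loop, using scikit-learn’s `cross_val_score`; the candidate values of k and the dataset are arbitrary choices for the example.

```python
# Choosing k via cross-validation: score several values of k
# and keep the one with the lowest mean validation error.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_error = None, float("inf")
for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    error = 1 - scores.mean()  # mean validation error rate across folds
    if error < best_error:
        best_k, best_error = k, error

print(f"Best k: {best_k} (validation error {best_error:.3f})")
```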


4. Optional Weighted Cost Function

Some KNN variants use distance-weighted predictions:

  • Classification: closer neighbors get more weight in majority vote.

  • Regression: prediction is weighted average:

\[ \hat{y} = \frac{\sum_{i=1}^k w_i y_i}{\sum_{i=1}^k w_i}, \quad w_i = \frac{1}{d(x, x_i)} \]

  • Here, \(d(x, x_i)\) is the distance to neighbor \(i\).

  • This is effectively minimizing a distance-weighted prediction error (see the sketch below).
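
Here is a NumPy sketch of that weighted average with \(w_i = 1/d(x, x_i)\); the helper name `weighted_knn_predict`, the small `eps` guard against zero distances, and the toy data are illustrative. scikit-learn exposes the same idea via `weights="distance"` on its KNN estimators.

```python
# Distance-weighted KNN regression: closer neighbors weigh more.
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k=3, eps=1e-12):
    """Predict one query point as a distance-weighted average of k neighbors."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    w = 1.0 / (distances[nearest] + eps)  # w_i = 1 / d(x, x_i)
    return np.sum(w * y_train[nearest]) / np.sum(w)

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(weighted_knn_predict(X_train, y_train, np.array([1.5]), k=2))  # -> 2.5
```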


Summary

| Aspect | KNN Behavior |
| --- | --- |
| Traditional cost function | None (lazy learner) |
| Implicit “cost” | Misclassification rate (classification); MSE/MAE (regression) |
| Optimization | Choice of k and distance weighting |
| Goal | Minimize prediction error |