Cost Function#
K-Nearest Neighbors (KNN) does not have an explicit cost function the way linear regression or SVM do, because it is a non-parametric, instance-based learning algorithm. But we can still think about a related concept that measures how well KNN is performing.
1. KNN is Lazy Learning#
KNN does not train a model.
There are no parameters like weights to optimize via a cost function.
All computation happens at prediction time: distances are measured, neighbors are selected, and votes/averages are computed.
So unlike linear regression (minimizing squared error) or logistic regression (maximizing likelihood), KNN does not have a formal cost function during training.
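To make the "lazy learning" point concrete, here is a minimal sketch of a KNN classifier (the class name `SimpleKNN` and the tiny dataset are illustrative, not from any library): `fit` does nothing but store the data, and every distance computation, neighbor lookup, and vote happens inside `predict`.

```python
import numpy as np

class SimpleKNN:
    """Minimal KNN classifier: 'training' only memorizes the data."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No parameters are optimized here -- we just store the training set.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X):
        # All the work happens now: distances, neighbor selection, voting.
        preds = []
        for x in np.asarray(X, dtype=float):
            dists = np.linalg.norm(self.X - x, axis=1)   # distance to every stored point
            nearest = np.argsort(dists)[:self.k]          # indices of k closest points
            labels, counts = np.unique(self.y[nearest], return_counts=True)
            preds.append(labels[np.argmax(counts)])       # majority vote
        return np.array(preds)

# Two well-separated clusters as a toy example.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
preds = SimpleKNN(k=3).fit(X_train, y_train).predict([[0.2, 0.2], [5.5, 5.5]])
```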
2. Implicit “Cost” at Prediction#
We can think about KNN performance in terms of prediction error:
A. Classification#
The implicit "cost" is the misclassification rate:

\[ \text{Error} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(\hat{y}_i \neq y_i) \]

where \(\hat{y}_i\) is the majority-vote label of the \(k\) nearest neighbors. Alternatively, a weighted misclassification rate can be used if neighbors contribute differently (e.g., closer neighbors count more).
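As a sketch (the function names here are illustrative), the plain and weighted misclassification costs can be computed as:

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of points where the KNN vote disagrees with the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true != y_pred)

def weighted_misclassification(y_true, y_pred, weights):
    """Same idea, but each error counts in proportion to its weight
    (e.g. the inverse distance of the query point to its neighborhood)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * (y_true != y_pred)) / np.sum(weights)

# One error out of four predictions:
err = misclassification_rate([0, 1, 1, 0], [0, 1, 0, 0])   # → 0.25
```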
B. Regression#
The implicit "cost" is the difference between predicted and true values, e.g. the mean squared error:

\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

Here, \(\hat{y}_i\) is the KNN prediction (the mean of the \(k\) nearest neighbors' targets).
So for regression, you can think of MSE or MAE as the “implicit cost” that KNN is minimizing by choosing neighbors wisely.
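A small sketch of KNN regression together with the two implicit costs (function names `knn_regress`, `mse`, and `mae` are illustrative):

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3):
    """KNN regression: predict the mean target of the k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return np.mean(y_train[nearest])

def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# With k=2, the query 0.4 averages the targets of its two nearest points (0 and 1).
pred = knn_regress([[0], [1], [2], [3]], [0, 1, 2, 3], [0.4], k=2)   # → 0.5
```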
3. Choosing k as Implicit Optimization#
Selecting \(k\) can be seen as optimizing the model to minimize prediction error.
Small \(k\) → low bias, high variance → sensitive to noise.
Large \(k\) → high bias, low variance → smoother prediction.
Cross-validation is used to find the \(k\) that minimizes validation error.
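The selection of \(k\) by cross-validation can be sketched as follows, here using leave-one-out validation for simplicity (the helper names `knn_predict`, `loo_error`, and `best_k` are illustrative):

```python
import numpy as np

def knn_predict(X, y, x, k):
    """Majority-vote KNN prediction for a single query point."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def loo_error(X, y, k):
    """Leave-one-out validation error for a given k."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i                     # hold out point i
        pred = knn_predict(X[mask], y[mask], X[i], k)
        errors += pred != y[i]
    return errors / n

def best_k(X, y, ks=(1, 3, 5, 7)):
    """Pick the k with the lowest leave-one-out error."""
    return min(ks, key=lambda k: loo_error(X, y, k))

# Two clean clusters: small k already achieves zero validation error.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
k_star = best_k(X, y)
```

In practice you would use k-fold rather than leave-one-out cross-validation on larger datasets, since leave-one-out requires one full pass over the data per held-out point.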
4. Optional Weighted Cost Function#
Some KNN variants use distance-weighted predictions:
Classification: closer neighbors get more weight in majority vote.
Regression: the prediction is a distance-weighted average:

\[ \hat{y} = \frac{\sum_{i=1}^{k} w_i\, y_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{d(x, x_i)} \]

Here, \(d(x, x_i)\) is the distance to neighbor \(i\).
This is effectively minimizing weighted prediction error.
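A sketch of distance-weighted KNN regression under these assumptions (the function name and the `eps` guard against division by zero are illustrative choices):

```python
import numpy as np

def weighted_knn_regress(X_train, y_train, x, k=3, eps=1e-9):
    """Distance-weighted KNN regression: nearer neighbors count more.
    Weights are inverse distances; eps avoids division by zero when a
    query coincides with a training point."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] + eps)        # w_i = 1 / d(x, x_i)
    return np.sum(w * y_train[nearest]) / np.sum(w)

# The query at 0.5 is 3x closer to the point with target 0 than to the
# point with target 10, so the prediction is pulled toward 0.
pred = weighted_knn_regress([[0], [2]], [0, 10], [0.5], k=2)   # ≈ 2.5
```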
Summary#

| Aspect | KNN Behavior |
|---|---|
| Traditional cost function | None (lazy learner) |
| Implicit "cost" | Misclassification (classification), MSE/MAE (regression) |
| Optimization | Choice of \(k\) and distance weighting |
| Goal | Minimize prediction error |