# Workflows

## 1. Data Preparation
Before applying KNN:
- **Collect data:** you need labeled training data for classification or regression.
- **Feature scaling:** KNN relies on distance metrics, so features should be on the same scale. Use:
  - Min-Max Scaling
  - Standardization (Z-score)

**Why scaling matters:** without scaling, a feature with a larger range will dominate the distance calculation.
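A minimal NumPy sketch of both scalings (the toy feature matrix is hypothetical, with one feature in a small range and one in a large range):

```python
import numpy as np

# Toy feature matrix: column 0 spans [0, 1], column 1 spans thousands
X = np.array([[0.2, 3000.0],
              [0.5, 7000.0],
              [0.9, 1000.0]])

# Min-max scaling: map each feature to the range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score): zero mean, unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

After either transform, both columns contribute comparably to any distance computed between rows.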
## 2. Choose the Distance Metric
Decide how to measure “closeness” between points. Common choices:
| Metric | Formula (2D example) | When to Use |
|---|---|---|
| Euclidean | \(\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}\) | Most common; continuous features |
| Manhattan | \(\lvert x_1 - y_1 \rvert + \lvert x_2 - y_2 \rvert\) | Grid-like distances, discrete features |
| Minkowski | \(\left(\lvert x_1 - y_1 \rvert^p + \lvert x_2 - y_2 \rvert^p\right)^{1/p}\) | Generalizes Euclidean (\(p = 2\)) and Manhattan (\(p = 1\)); flexible via the parameter \(p\) |
| Hamming | Counts the number of differing features | Categorical variables |
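The metrics in the table can be sketched in NumPy for a single pair of points (the example points and categorical vectors are hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))   # sqrt(9 + 16) = 5.0
manhattan = np.sum(np.abs(x - y))           # 3 + 4 = 7.0
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)  # p=2 gives Euclidean, p=1 Manhattan

# Hamming: count the positions where two categorical vectors differ
a = np.array(["red", "small", "round"])
b = np.array(["red", "large", "round"])
hamming = np.sum(a != b)                    # 1 differing feature
```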
## 3. Select k
Decide how many neighbors to consider:
- Small \(k\) → sensitive to noise, high variance (overfitting)
- Large \(k\) → smooths decision boundaries, may underfit
- Common practice: test multiple odd \(k\) values using cross-validation.
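Assuming scikit-learn is available, cross-validated selection over odd k values might look like this (the Iris dataset is used here purely as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd k values and keep the one with the best cross-validated accuracy
scores = {}
for k in range(1, 16, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
```

Odd k values avoid ties in binary majority votes; for multi-class problems ties can still occur and are usually broken arbitrarily.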
## 4. Compute Distances
For each new data point \(x_{\text{new}}\):
- Calculate the distance to every point in the training set.
- Store these distances in a sorted list.
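This step can be sketched with NumPy broadcasting (the training points and query point are hypothetical):

```python
import numpy as np

X_train = np.array([[1.0, 1.0], [2.0, 2.0], [5.0, 5.0], [0.0, 0.0]])
x_new = np.array([1.5, 1.5])

# Euclidean distance from x_new to every training point at once
distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))

# Indices of training points ordered from nearest to farthest
order = np.argsort(distances)
```

This brute-force pass is O(n) per query; for large training sets, tree-based structures (k-d trees, ball trees) are the usual way to avoid scanning every point.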
## 5. Identify Nearest Neighbors
- Pick the top \(k\) closest points from the sorted distance list.
- These points “vote” for the label (classification) or contribute to the average (regression).
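Picking the top k does not actually require a full sort; NumPy's `argpartition` finds the k smallest distances in linear time (the distance array below is hypothetical):

```python
import numpy as np

distances = np.array([3.2, 0.5, 1.7, 0.9, 4.1])
k = 3

# argpartition places the k smallest entries first, in arbitrary order: O(n)
nearest = np.argpartition(distances, k)[:k]

# Order just those k neighbors by their actual distance
nearest = nearest[np.argsort(distances[nearest])]
```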
## 6. Aggregate Neighbor Information
- **Classification:** majority vote determines the predicted class. Optional: weighted vote (closer neighbors count more).
- **Regression:** take the mean (or weighted mean) of the neighbors’ values.
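Both aggregation rules can be sketched as follows (the neighbor labels, values, and distances are hypothetical):

```python
import numpy as np
from collections import Counter

# Labels, target values, and distances of the k nearest neighbors
neighbor_labels = ["A", "B", "A"]
neighbor_values = np.array([2.0, 4.0, 3.0])
neighbor_dists = np.array([0.5, 1.0, 2.0])

# Classification: plain majority vote
predicted_class = Counter(neighbor_labels).most_common(1)[0][0]  # "A"

# Regression: unweighted mean vs. inverse-distance-weighted mean
mean_pred = neighbor_values.mean()                               # 3.0
weights = 1.0 / neighbor_dists                                   # closer => larger weight
weighted_pred = np.sum(weights * neighbor_values) / np.sum(weights)
```

Inverse-distance weighting is one common choice; note it needs a guard (or a small epsilon) if a neighbor's distance can be exactly zero.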
## 7. Assign the Label or Value
Output the predicted class or numerical value for \(x_{\text{new}}\).
## 8. Evaluate the Model
Use performance metrics:
- **Classification:** Accuracy, Precision, Recall, F1-score, Confusion Matrix
- **Regression:** MSE, RMSE, MAE, R²
Optionally, tune \(k\) and/or the distance metric to improve results.
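Assuming scikit-learn is available, the metrics listed above can be computed directly (the predictions below are hypothetical):

```python
from sklearn.metrics import (accuracy_score, f1_score, confusion_matrix,
                             mean_squared_error, mean_absolute_error, r2_score)

# Hypothetical classification results
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
acc = accuracy_score(y_true, y_pred)   # 4 of 5 correct -> 0.8
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # 2x2 matrix of counts

# Hypothetical regression results
v_true = [2.0, 3.0, 5.0]
v_pred = [2.5, 2.5, 5.0]
mse = mean_squared_error(v_true, v_pred)
mae = mean_absolute_error(v_true, v_pred)
r2 = r2_score(v_true, v_pred)
```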
## Summary

1. Prepare data → 2. Scale features → 3. Select k & distance metric → 4. Compute distances → 5. Find nearest neighbors → 6. Aggregate results → 7. Predict output → 8. Evaluate & tune
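The whole workflow condenses to a few lines with scikit-learn, assuming it is available (k = 5 and the Iris dataset are illustrative choices, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Scale features, then classify by majority vote among the 5 nearest neighbors
model = make_pipeline(StandardScaler(),
                      KNeighborsClassifier(n_neighbors=5, metric="euclidean"))
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
```

Putting the scaler inside the pipeline matters: it is fit on the training split only, so no information from the test set leaks into the distance computation.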