# K-Means Clustering
K-Means is one of the most popular unsupervised learning algorithms for clustering. It partitions a dataset into K clusters (groups).

- Each cluster has a centroid (the “center” of the cluster).
- Each data point is assigned to the cluster whose centroid is closest, usually measured by Euclidean distance.

It’s like grouping students into K study groups based on how close their marks are in math and science.
## The K-Means Algorithm (Step-by-Step)
1. Choose K → the number of clusters you want.
2. Initialize centroids → randomly select K points as the initial centroids.
3. Assign points to the nearest centroid → each data point joins the cluster with the closest centroid (using Euclidean distance).
4. Update centroids → recalculate each centroid as the mean of the points in its cluster.
5. Repeat steps 3–4 until:
   - the centroids stop changing significantly (convergence), or
   - the maximum number of iterations is reached.
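The steps above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation (the function name and defaults are our own choices, and scikit-learn's `KMeans` should be preferred in practice):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal K-Means sketch: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids by picking k distinct random points from X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Stop when the centroids no longer change (convergence).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Running it on two well-separated groups of points should recover one label per group.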
## Mathematical Intuition
The objective function minimized by K-Means is the Within-Cluster Sum of Squares (WCSS):

$$
\text{WCSS} = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2
$$

Where:

- \(K\) = number of clusters
- \(C_i\) = set of points in cluster \(i\)
- \(\mu_i\) = centroid of cluster \(i\)

👉 Goal: minimize the total squared distance of all points from their cluster centroids.
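The WCSS objective translates almost line-for-line into code. A small sketch (the function name `wcss` is our own):

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-Cluster Sum of Squares: total squared Euclidean distance
    from each point to the centroid of its assigned cluster."""
    return sum(
        np.sum((X[labels == i] - mu) ** 2)   # inner sum over points in C_i
        for i, mu in enumerate(centroids)    # outer sum over clusters
    )
```

For example, two points at \((0, 0)\) and \((2, 0)\) in one cluster with centroid \((1, 0)\) each contribute a squared distance of 1, giving a WCSS of 2.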
## Choosing the Right K (Number of Clusters)
- Elbow Method → plot WCSS vs. K and look for the “elbow” point where adding more clusters stops helping much.
- Silhouette Score → measures how similar a point is to its own cluster vs. other clusters.
- Gap Statistic → compares clustering performance against a random reference distribution.
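A quick way to try the first two methods is with scikit-learn (assuming it is installed; the synthetic three-blob data here is purely illustrative). scikit-learn exposes WCSS as the fitted model's `inertia_` attribute:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # km.inertia_ is scikit-learn's name for WCSS.
    print(f"K={k}  WCSS={km.inertia_:.1f}  silhouette={silhouette_score(X, km.labels_):.3f}")
```

On data like this, WCSS drops sharply up to K=3 and flattens afterwards (the “elbow”), and the silhouette score peaks around the true number of clusters.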
## Advantages
- ✅ Simple and easy to implement.
- ✅ Works well with large datasets.
- ✅ Converges quickly in practice.
## Limitations
- ❌ K must be chosen in advance.
- ❌ Sensitive to initialization (bad initial centroids → poor results).
- ❌ Struggles with non-spherical clusters or clusters of different densities.
- ❌ Sensitive to outliers, since the mean is easily pulled by extreme values.
## Example Use Cases
- Customer segmentation in marketing.
- Document or image clustering.
- Anomaly detection.
- Grouping genes with similar expression patterns in biology.
## Quick Intuition Example

Imagine you have data on people’s height and weight. If you run K-Means with \(K=3\):

- Cluster 1 might group short & light people.
- Cluster 2 might group medium-height & medium-weight people.
- Cluster 3 might group tall & heavy people.

The algorithm finds these groups automatically, without labels.
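This intuition can be checked with scikit-learn on made-up height/weight data (the three group means and spreads below are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (height cm, weight kg) samples for three loose groups of people.
rng = np.random.default_rng(42)
short_light = rng.normal([155, 50], 3, size=(20, 2))
medium      = rng.normal([170, 70], 3, size=(20, 2))
tall_heavy  = rng.normal([185, 90], 3, size=(20, 2))
X = np.vstack([short_light, medium, tall_heavy])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(np.round(km.cluster_centers_))  # centroids land near the three group means
```

No labels were given: K-Means recovers the short/medium/tall grouping purely from the geometry of the points.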