Workflows

Workflows #

1. Initialization#

Choose the number of clusters (K) beforehand.
Randomly initialize K centroids (either by randomly selecting K data points or using smarter methods like K-Means++).

2. Assignment Step (Cluster Assignment)#

For each data point:
- Compute the distance (usually Euclidean) to each centroid.
- Assign the point to the cluster with the nearest centroid.

👉 After this step, every point belongs to exactly one cluster.

3. Update Step (Recompute Centroids)#

For each cluster:
- Recalculate the centroid as the mean of all points assigned to that cluster.

👉 Centroids move toward the “center of mass” of their assigned points.

4. Iteration (Repeat Assignment + Update)#

Repeat Step 2 and Step 3 until one of the following conditions is met:
- Centroids no longer move significantly (convergence).
- Cluster assignments do not change.
- Maximum number of iterations is reached.

5. Termination (Final Clusters)#

Once convergence is reached, the final set of clusters is obtained.
The algorithm outputs:
- The cluster assignments for each data point.
- The final centroids.

Workflow Visualization #

Initialization → Assignment → Update → Repeat → Convergence

Example (with K=3):

Randomly pick 3 centroids.
Assign points to nearest centroid.
Move centroids to mean of assigned points.
Repeat steps 2–3 until stable clusters emerge.

In short

K-Means is an iterative refinement process: Guess centroids → Assign points → Adjust centroids → Repeat until stable.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate synthetic dataset
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# Store intermediate results manually for visualization
kmeans = KMeans(n_clusters=4, init='random', n_init=1, max_iter=1, random_state=42)
kmeans.fit(X)
centroids_0 = kmeans.cluster_centers_
labels_0 = kmeans.labels_

kmeans = KMeans(n_clusters=4, init=centroids_0, n_init=1, max_iter=2, random_state=42)
kmeans.fit(X)
centroids_1 = kmeans.cluster_centers_
labels_1 = kmeans.labels_

kmeans = KMeans(n_clusters=4, init=centroids_0, n_init=1, max_iter=10, random_state=42)
kmeans.fit(X)
centroids_final = kmeans.cluster_centers_
labels_final = kmeans.labels_

# Plot subplots for workflow visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Iteration 0 (random initialization)
axes[0].scatter(X[:, 0], X[:, 1], c=labels_0, cmap='viridis', s=30)
axes[0].scatter(centroids_0[:, 0], centroids_0[:, 1], c='red', marker='X', s=200, label="Centroids")
axes[0].set_title("Iteration 0: Random Initialization")
axes[0].legend()

# Iteration 2 (after assignment + update)
axes[1].scatter(X[:, 0], X[:, 1], c=labels_1, cmap='viridis', s=30)
axes[1].scatter(centroids_1[:, 0], centroids_1[:, 1], c='red', marker='X', s=200, label="Centroids")
axes[1].set_title("Iteration 2: Updating Centroids")
axes[1].legend()

# Final iteration (convergence)
axes[2].scatter(X[:, 0], X[:, 1], c=labels_final, cmap='viridis', s=30)
axes[2].scatter(centroids_final[:, 0], centroids_final[:, 1], c='red', marker='X', s=200, label="Centroids")
axes[2].set_title("Final Iteration: Convergence")
axes[2].legend()

plt.tight_layout()
plt.show()

../../../_images/9928012ccabee3d64ff0aa707f48c21b2e7bd731f30b5735e712ef4e6eaf7b54.png

Here’s a step-by-step workflow demonstration of K-Means Clustering:

Random Initialization → Centroids are placed randomly (Iteration 0).
Assignment Step → Each point is assigned to the nearest centroid.
Update Step → New centroids are computed as the mean of the assigned points.
Repeat → Assignment + Update steps continue until centroids stop moving significantly (convergence).

The plots above show how centroids move and clusters form across iterations — from random initialization → refined clusters → stable convergence.