Hierarchical Clustering

Hierarchical clustering is an unsupervised machine learning algorithm used to group similar data points into clusters.

  • Unlike K-Means, you don’t need to specify the number of clusters upfront.

  • It builds a hierarchy of clusters, often visualized as a dendrogram.

There are two main types:

  1. Agglomerative (Bottom-Up) – most common

    • Start with each data point as its own cluster.

    • Iteratively merge the two closest clusters until all points are in one cluster.

  2. Divisive (Top-Down)

    • Start with all points in one cluster.

    • Recursively split clusters until each cluster contains a single point.

Agglomerative clustering is far more commonly used in practice; divisive clustering is rare because evaluating every possible split of a cluster is computationally expensive.
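
As a quick illustration, here is a minimal sketch using scikit-learn's AgglomerativeClustering (the toy data and the distance_threshold value are made-up illustrative choices):

```python
# Minimal agglomerative clustering sketch with scikit-learn.
# The toy data and the distance threshold are illustrative choices.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# No n_clusters needed: setting a distance_threshold lets the
# hierarchy decide how many clusters fall out.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0)
labels = model.fit_predict(X)
print(labels)  # one cluster label per point
```

Setting n_clusters=None with a distance_threshold mirrors the point above: the number of clusters is a consequence of where the hierarchy is cut, not an upfront input.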


Workflow of Agglomerative Clustering

  1. Compute Distance Matrix

    • Calculate pairwise distances between all points using a metric (Euclidean, Manhattan, Cosine, etc.).

  2. Linkage Criteria

    • Decide how to measure distance between clusters:

      • Single linkage: distance between closest points in clusters

      • Complete linkage: distance between farthest points in clusters

      • Average linkage: average distance between points in clusters

      • Ward’s method: merge the pair of clusters whose union gives the smallest increase in within-cluster variance

  3. Merge Clusters Iteratively

    • Merge the two clusters with the smallest distance according to the chosen linkage.

    • Update the distance matrix after each merge.

  4. Build the Dendrogram

    • A dendrogram is a tree-like diagram showing which clusters were merged at each distance.

    • The height of each merge represents the distance between the two clusters being joined.

  5. Cut the Dendrogram to Form Clusters

    • Decide the number of clusters by choosing a height to cut the dendrogram.

    • Clusters below that cut are treated as separate groups (see the sketch after this list).
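
The five steps map closely onto SciPy's scipy.cluster.hierarchy module. The following sketch walks through them on made-up toy data (the cut height t=5.0 is an arbitrary illustrative choice):

```python
# Sketch of the full agglomerative workflow with SciPy (toy data).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import pdist

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# Step 1: pairwise distance matrix (condensed form).
dists = pdist(X, metric="euclidean")

# Steps 2-3: iteratively merge the closest clusters under the
# chosen linkage criterion.
Z = linkage(dists, method="ward")

# Step 4: visualize the merge history as a dendrogram.
dendrogram(Z)
plt.show()

# Step 5: cut the dendrogram at a height to form flat clusters.
labels = fcluster(Z, t=5.0, criterion="distance")
print(labels)
```

Alternatively, fcluster with criterion="maxclust" cuts the tree to a requested number of clusters instead of a height.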


Distance Metrics and Linkage Methods

| Metric      | Description                 |
| ----------- | --------------------------- |
| Euclidean   | Straight-line distance      |
| Manhattan   | Sum of absolute differences |
| Cosine      | Angle between vectors       |
| Correlation | 1 - correlation coefficient |

| Linkage  | Description                                         |
| -------- | --------------------------------------------------- |
| Single   | Minimum distance between points of the two clusters |
| Complete | Maximum distance between points of the two clusters |
| Average  | Average distance between all pairs of points        |
| Ward     | Merge clusters to minimize within-cluster variance  |

Ward linkage with Euclidean distance is a common default because it tends to produce compact clusters of similar size; note that Ward’s method is only defined for Euclidean distance.
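
Since the linkage choice can change the result, here is a small sketch (same made-up data as above) comparing the four linkage methods from the table:

```python
# Sketch: the same data clustered under each linkage method.
# Toy data is illustrative; the metric is Euclidean throughout.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)  # accepts raw observations too
    labels = fcluster(Z, t=2, criterion="maxclust")  # force 2 clusters
    print(f"{method:>8}: {labels}")
```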


Advantages of Hierarchical Clustering

  • No need to pre-specify the number of clusters.

  • Produces a dendrogram, which is easy to interpret.

  • Can capture nested clusters (hierarchical relationships).


Limitations

  • Computationally expensive for large datasets: the distance matrix alone requires O(n²) memory, and naive agglomerative algorithms take O(n³) time.

  • Sensitive to noise and outliers.

  • Choice of distance metric and linkage affects results.


Intuition

  • Think of data points as leaves of a tree:

    • Agglomerative HC gradually merges leaves into branches → branches into bigger branches → one tree.

  • Cutting the tree at a certain height gives you clusters.
