Assumptions#

1.1 Meaningful Distance Metric#

  • Assumption: The distance metric you choose (Euclidean, Manhattan, Cosine, etc.) accurately reflects similarity between points.

  • Implication:

    • If distances do not represent similarity well, the algorithm will merge or split clusters incorrectly.

    • Example: Euclidean distance assumes continuous numerical features and is sensitive to scale.

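The effect of metric choice can be seen directly. Below is a minimal sketch using SciPy (assumed available): two points that lie in the same direction but at different magnitudes are far apart under Euclidean distance yet nearly identical under cosine distance.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cosine

# Two points with the same direction but different magnitudes.
a = np.array([1.0, 1.0])
b = np.array([10.0, 10.0])

# Euclidean distance is large because the magnitudes differ.
print(euclidean(a, b))   # sqrt(162) ≈ 12.73

# Cosine distance is ~0 because the directions are identical.
print(cosine(a, b))
```

Which behavior is "correct" depends entirely on what similarity means for your data, which is why this assumption comes first.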

1.2 Clusters Have a Hierarchical/Nested Structure#

  • Assumption: The data can be meaningfully represented in a nested, tree-like hierarchy.

  • Implication:

    • Hierarchical clustering (HC) works best when clusters are naturally nested.

    • If data has flat clusters with no hierarchy, HC may produce arbitrary splits or merges.

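The nested structure is what lets one tree encode many partitions at once. A small sketch with SciPy (toy data assumed): cutting the same dendrogram at different heights yields nested flat clusterings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Build the full merge tree (here with average linkage).
Z = linkage(X, method="average")

# Cutting the tree at different levels yields nested partitions.
labels_2 = fcluster(Z, t=2, criterion="maxclust")
labels_3 = fcluster(Z, t=3, criterion="maxclust")
print(labels_2)  # the two groups are recovered
print(labels_3)  # one group is split further; the other stays intact
```

If the data has no real hierarchy, these cuts still succeed mechanically, but the partitions they produce carry little meaning.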

1.3 Homogeneity Within Clusters#

  • Assumption: Points within a cluster are more similar to each other than to points in other clusters.

  • Implication:

    • If clusters have very different densities or shapes, some linkage methods (like single or complete linkage) may fail.


1.4 Choice of Linkage Matters#

  • Assumption: The chosen linkage method (single, complete, average, Ward) appropriately reflects inter-cluster distances.

  • Implication:

    • Single linkage → can cause “chaining” effect (long, snake-like clusters).

    • Complete linkage → favors compact clusters, may break elongated clusters.

    • Ward → assumes minimizing within-cluster variance is meaningful (and implies Euclidean distance).

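The chaining effect of single linkage is easy to provoke. In this sketch (toy data assumed), two compact blobs are joined by a sparse "bridge" of points; single linkage chains through the bridge and absorbs it into one blob's cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two compact blobs joined by a sparse "bridge" of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],     # blob A
              [10.0, 0.0], [10.0, 1.0], [11.0, 0.0],  # blob B
              [3.0, 0.0], [5.0, 0.0], [7.0, 0.0]])    # bridge

# Single linkage merges clusters by their *closest* pair of points,
# so it chains along the bridge, step by step, toward blob A.
single = fcluster(linkage(X, method="single"), t=2, criterion="maxclust")

# Complete linkage merges by the *farthest* pair, favoring compactness.
complete = fcluster(linkage(X, method="complete"), t=2, criterion="maxclust")
print("single:  ", single)
print("complete:", complete)
```

With single linkage, the bridge points end up in the same cluster as blob A, illustrating the long, snake-like clusters described above.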

1.5 Scale of Features#

  • Assumption: Features are comparable in scale or have been standardized.

  • Implication:

    • If one feature dominates due to scale, distance metrics (especially Euclidean) will be biased, leading to poor clustering.

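A quick numeric sketch of the dominance problem (toy values assumed): one feature spans [0, 1], the other spans thousands, so Euclidean distance is driven almost entirely by the large-scale feature until each feature is z-scored.

```python
import numpy as np
from scipy.spatial.distance import euclidean

# Feature 1 lives in [0, 1]; feature 2 lives in the thousands.
a = np.array([0.1, 1000.0])
b = np.array([0.9, 1010.0])
print(euclidean(a, b))  # ≈ 10.03, driven almost entirely by feature 2

# Standardize each feature (z-score) before computing distances.
X = np.array([[0.1, 1000.0], [0.9, 1010.0], [0.5, 1005.0]])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print(euclidean(Xs[0], Xs[1]))  # both features now contribute equally
```

In practice the same standardization step is applied to the whole dataset before building the hierarchy.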

1.6 No Strong Noise or Outliers#

  • Assumption: Data is relatively clean; noise and outliers are minimal.

  • Implication:

    • Outliers can create their own singleton clusters or distort dendrogram structure.

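The singleton-cluster behavior is easy to demonstrate (toy data assumed): one far-away point survives as its own cluster when the dendrogram is cut.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# One tight group of four points plus a single far-away outlier.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
              [50.0, 50.0]])  # outlier

Z = linkage(X, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # the outlier ends up alone in its own singleton cluster
```

The outlier also stretches the dendrogram vertically, since its merge height dwarfs all the others, which can hide the structure among the remaining points.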

Summary#

| Assumption | Explanation |
| --- | --- |
| Distance metric is meaningful | Similar points are close; dissimilar points are far |
| Hierarchical structure exists | Data can be represented as nested clusters |
| Homogeneity within clusters | Points in a cluster are more similar to each other than to points outside it |
| Linkage method reflects inter-cluster distance | Choice of single, complete, average, or Ward affects cluster shape |
| Features are scaled | No single feature dominates the distance metric |
| Minimal noise/outliers | Outliers don’t distort the hierarchy |


Key Intuition#

Hierarchical clustering is like building a tree of data points:

  • If the distances, linkage method, and feature scaling are appropriate → the tree accurately reflects the nested structure of the data.

  • If these assumptions are violated → the dendrogram may be misleading, merges may be arbitrary, and the resulting clusters may not be meaningful.
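
Putting the assumptions together, a minimal end-to-end sketch (synthetic data and parameter choices assumed): standardize the features, build the hierarchy with Ward linkage, then cut the dendrogram into a flat partition.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic data: two groups whose second feature has a much larger scale.
X = np.vstack([
    rng.normal([0.0, 0.0], [1.0, 100.0], size=(20, 2)),
    rng.normal([8.0, 800.0], [1.0, 100.0], size=(20, 2)),
])

# 1) Standardize so neither feature dominates the distance metric.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# 2) Build the hierarchy with Ward linkage (Euclidean, variance-minimizing).
Z = linkage(Xs, method="ward")

# 3) Cut the dendrogram into a flat two-cluster partition.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # the two synthetic groups are recovered
```

Because the assumptions hold here (meaningful metric, scaled features, compact well-separated groups), the tree and its cut recover the true structure; relax any of them and the same pipeline can return a misleading partition.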