Assumptions#
1.1 Meaningful Distance Metric#
Assumption: The distance metric you choose (Euclidean, Manhattan, Cosine, etc.) accurately reflects similarity between points.
Implication:
If distances do not represent similarity well, the algorithm will merge or split clusters incorrectly.
Example: Euclidean distance assumes continuous numerical features and is sensitive to scale.
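A minimal sketch of this point, using made-up 2-D points: the same data yields quite different pairwise distances under Euclidean, Manhattan, and cosine metrics, and the unscaled second feature dominates the Euclidean result.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Illustrative points: feature 2 is on a much larger scale than feature 1
X = np.array([
    [1.0, 200.0],
    [2.0, 210.0],
    [1.5, 800.0],
])

# The same points under three different notions of "distance"
for metric in ("euclidean", "cityblock", "cosine"):
    print(metric, pdist(X, metric=metric).round(3))
```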
1.2 Clusters Have a Hierarchical/Nested Structure#
Assumption: The data can be meaningfully represented in a nested, tree-like hierarchy.
Implication:
Hierarchical clustering (HC) works best when clusters are naturally nested.
If data has flat clusters with no hierarchy, HC may produce arbitrary splits or merges.
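A small sketch of what a nested structure looks like in practice, using synthetic blobs (assumed data) and SciPy's agglomerative routines; the dendrogram makes the hierarchy visible.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# Two well-separated blobs: a case where a nested, tree-like grouping makes sense
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(10, 2)),
])

Z = linkage(X, method="average")  # (n-1) x 4 merge history
dendrogram(Z)
plt.title("Average-linkage dendrogram of two synthetic blobs")
plt.show()
```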
1.3 Homogeneity Within Clusters#
Assumption: Points within a cluster are more similar to each other than to points in other clusters.
Implication:
If clusters have very different densities or shapes, some linkage methods (like single or complete linkage) may fail.
1.4 Choice of Linkage Matters#
Assumption: The chosen linkage method (single, complete, average, Ward) appropriately reflects inter-cluster distances.
Implication:
Single linkage → can cause “chaining” effect (long, snake-like clusters).
Complete linkage → favors compact clusters, may break elongated clusters.
Ward → assumes minimizing within-cluster variance is meaningful (and is defined for Euclidean distance).
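A sketch of how the linkage choice changes the result on the same (synthetic, illustrative) data: an elongated shape next to a compact blob tends to expose single linkage's chaining and complete linkage's preference for compact groups.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# One elongated, snake-like cluster and one compact blob
elongated = np.column_stack([np.linspace(0, 10, 30), rng.normal(0, 0.2, 30)])
compact = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(30, 2))
X = np.vstack([elongated, compact])

# Cut each tree into two clusters and compare the resulting group sizes
for method in ("single", "complete", "ward"):
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(method, np.bincount(labels)[1:])   # cluster sizes differ by linkage
```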
1.5 Scale of Features#
Assumption: Features are comparable in scale or have been standardized.
Implication:
If one feature dominates due to scale, distance metrics (especially Euclidean) will be biased, leading to poor clustering.
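A minimal sketch of standardization before clustering (feature names and values are made up for illustration): without scaling, the large-magnitude feature effectively decides all the Euclidean distances.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Hypothetical features on very different scales
income = rng.normal(50_000, 15_000, size=100)   # dominates raw Euclidean distance
age = rng.normal(40, 10, size=100)
X = np.column_stack([income, age])

X_scaled = StandardScaler().fit_transform(X)    # zero mean, unit variance per feature

labels_raw = fcluster(linkage(X, "ward"), t=3, criterion="maxclust")
labels_scaled = fcluster(linkage(X_scaled, "ward"), t=3, criterion="maxclust")
# The two partitions typically differ: the raw one is driven almost entirely by income
```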
1.6 No Strong Noise or Outliers#
Assumption: Data is relatively clean; noise and outliers are minimal.
Implication:
Outliers can create their own singleton clusters or distort dendrogram structure.
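A quick sketch (synthetic data assumed): one extreme point is typically merged last, sits on its own branch, and stretches the dendrogram's height scale.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
X = rng.normal(0, 1, size=(50, 2))
X = np.vstack([X, [[100.0, 100.0]]])   # one extreme outlier

Z = linkage(X, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(np.bincount(labels)[1:])   # typically [50, 1]: the outlier becomes a singleton cluster
```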
Summary#
| Assumption | Explanation |
|---|---|
| Distance metric is meaningful | Similar points are close, dissimilar points are far |
| Hierarchical structure exists | Data can be represented in nested clusters |
| Homogeneity within clusters | Points in a cluster are more similar to each other than to others |
| Linkage method reflects inter-cluster distance | Choice of single, complete, average, or Ward affects cluster shape |
| Features are scaled | Avoid one feature dominating the distance metric |
| Minimal noise/outliers | Outliers don’t distort the hierarchy |
Key Intuition#
Hierarchical clustering is like building a tree of data points:
If the distances, linkage method, and feature scaling are appropriate → the tree accurately reflects the nested structure of the data.
Violating these assumptions → the dendrogram may be misleading, merges may be arbitrary, and the resulting clusters may not be meaningful.
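Putting the assumptions together, here is a sketch of a typical end-to-end workflow on assumed synthetic data: scale the features, choose the metric and linkage deliberately, inspect the dendrogram, then cut the tree.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(5)
# Three synthetic blobs standing in for real data
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(30, 2)),
    rng.normal(loc=[4, 4], scale=0.5, size=(30, 2)),
    rng.normal(loc=[0, 4], scale=0.5, size=(30, 2)),
])

# 1. Scale features so no single one dominates the distance metric
X = StandardScaler().fit_transform(X)

# 2. Choose linkage deliberately (Ward: Euclidean distance, variance minimization)
Z = linkage(X, method="ward")

# 3. Inspect the dendrogram before committing to a number of clusters
dendrogram(Z)
plt.title("Ward-linkage dendrogram")
plt.show()

# 4. Cut the tree into a flat clustering
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(labels)[1:])   # roughly equal cluster sizes expected here
```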