Assumptions#
1.1 Meaningful Distance Metric#
Assumption: The distance metric you choose (Euclidean, Manhattan, Cosine, etc.) accurately reflects similarity between points.
Implication:
If distances do not represent similarity well, the algorithm will merge or split clusters incorrectly.
Example: Euclidean distance assumes continuous numerical features and is sensitive to scale.
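A minimal sketch of this point, using made-up 2-D points: the same data yields quite different pairwise distances under Euclidean, Manhattan, and cosine metrics, and the unscaled second feature dominates the Euclidean result.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Illustrative points: feature 2 is on a much larger scale than feature 1
X = np.array([
    [1.0, 200.0],
    [2.0, 210.0],
    [1.5, 800.0],
])

# The same points under three different notions of "distance"
for metric in ("euclidean", "cityblock", "cosine"):
    print(metric, pdist(X, metric=metric).round(3))
```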
1.2 Clusters Have a Hierarchical/Nested Structure#
Assumption: The data can be meaningfully represented in a nested, tree-like hierarchy.
Implication:
Hierarchical clustering (HC) works best when clusters are naturally nested.
If data has flat clusters with no hierarchy, HC may produce arbitrary splits or merges.
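A small sketch of what a nested structure looks like in practice, using synthetic blobs (assumed data) and SciPy's agglomerative routines; the dendrogram makes the hierarchy visible.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# Two well-separated blobs: a case where a nested, tree-like grouping makes sense
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(10, 2)),
])

Z = linkage(X, method="average")  # (n-1) x 4 merge history
dendrogram(Z)
plt.title("Average-linkage dendrogram of two synthetic blobs")
plt.show()
```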
1.3 Homogeneity Within Clusters#
Assumption: Points within a cluster are more similar to each other than to points in other clusters.
Implication:
If clusters have very different densities or shapes, some linkage methods (like single or complete linkage) may fail.
1.4 Choice of Linkage Matters#
Assumption: The chosen linkage method (single, complete, average, Ward) appropriately reflects inter-cluster distances.
Implication:
Single linkage → can cause “chaining” effect (long, snake-like clusters).
Complete linkage → favors compact clusters, may break elongated clusters.
Ward → assumes minimizing within-cluster variance is meaningful (and is defined for Euclidean distance).
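A sketch of how the linkage choice changes the result on the same (synthetic, illustrative) data: an elongated shape next to a compact blob tends to expose single linkage's chaining and complete linkage's preference for compact groups.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# One elongated, snake-like cluster and one compact blob
elongated = np.column_stack([np.linspace(0, 10, 30), rng.normal(0, 0.2, 30)])
compact = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(30, 2))
X = np.vstack([elongated, compact])

# Cut each tree into two clusters and compare the resulting group sizes
for method in ("single", "complete", "ward"):
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(method, np.bincount(labels)[1:])   # cluster sizes differ by linkage
```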
1.5 Scale of Features#
Assumption: Features are comparable in scale or have been standardized.
Implication:
If one feature dominates due to scale, distance metrics (especially Euclidean) will be biased, leading to poor clustering.
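A minimal sketch of standardization before clustering (feature names and values are made up for illustration): without scaling, the large-magnitude feature effectively decides all the Euclidean distances.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Hypothetical features on very different scales
income = rng.normal(50_000, 15_000, size=100)   # dominates raw Euclidean distance
age = rng.normal(40, 10, size=100)
X = np.column_stack([income, age])

X_scaled = StandardScaler().fit_transform(X)    # zero mean, unit variance per feature

labels_raw = fcluster(linkage(X, "ward"), t=3, criterion="maxclust")
labels_scaled = fcluster(linkage(X_scaled, "ward"), t=3, criterion="maxclust")
# The two partitions typically differ: the raw one is driven almost entirely by income
```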
1.6 No Strong Noise or Outliers#
Assumption: Data is relatively clean; noise and outliers are minimal.
Implication:
Outliers can create their own singleton clusters or distort dendrogram structure.
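A quick sketch (synthetic data assumed): one extreme point is typically merged last, sits on its own branch, and stretches the dendrogram's height scale.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
X = rng.normal(0, 1, size=(50, 2))
X = np.vstack([X, [[100.0, 100.0]]])   # one extreme outlier

Z = linkage(X, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(np.bincount(labels)[1:])   # typically [50, 1]: the outlier becomes a singleton cluster
```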
Summary#
| Assumption | Explanation |
|---|---|
| Distance metric is meaningful | Similar points are close, dissimilar points are far |
| Hierarchical structure exists | Data can be represented in nested clusters |
| Homogeneity within clusters | Points in a cluster are more similar to each other than to others |
| Linkage method reflects inter-cluster distance | Choice of single, complete, average, or Ward affects cluster shape |
| Features are scaled | Avoid one feature dominating the distance metric |
| Minimal noise/outliers | Outliers don’t distort the hierarchy |
Key Intuition#
Hierarchical clustering is like building a tree of data points:
If the distances, linkage method, and feature scaling are appropriate → the tree accurately reflects the nested structure of the data.
Violating these assumptions → the dendrogram may be misleading, merges may be arbitrary, and the resulting clusters may not be meaningful.
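Putting the assumptions together, here is a sketch of a typical end-to-end workflow on assumed synthetic data: scale the features, choose the metric and linkage deliberately, inspect the dendrogram, then cut the tree.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(5)
# Three synthetic blobs standing in for real data
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(30, 2)),
    rng.normal(loc=[4, 4], scale=0.5, size=(30, 2)),
    rng.normal(loc=[0, 4], scale=0.5, size=(30, 2)),
])

# 1. Scale features so no single one dominates the distance metric
X = StandardScaler().fit_transform(X)

# 2. Choose linkage deliberately (Ward: Euclidean distance, variance minimization)
Z = linkage(X, method="ward")

# 3. Inspect the dendrogram before committing to a number of clusters
dendrogram(Z)
plt.title("Ward-linkage dendrogram")
plt.show()

# 4. Cut the tree into a flat clustering
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(labels)[1:])   # roughly equal cluster sizes expected here
```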