Performance Metrics#
DBSCAN does not optimize an explicit objective function, so its performance is instead evaluated with cluster quality metrics.
1. Internal evaluation metrics (no ground truth required)#
These assess clustering quality based only on the dataset.
Silhouette Coefficient
\[ s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \]
\(a(i)\): average distance of point \(i\) to the other points in its own cluster.
\(b(i)\): smallest average distance of point \(i\) to the points of any other cluster (i.e., the nearest neighboring cluster).
Range: \([-1, 1]\). Higher = better separation.
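A minimal sketch of computing the silhouette coefficient for DBSCAN output with scikit-learn (the toy coordinates and the `eps`/`min_samples` values are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Two well-separated toy blobs (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# DBSCAN finds the two blobs as clusters 0 and 1
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

# Mean silhouette over all points; near 1 here because the
# blobs are tight and far apart
score = silhouette_score(X, labels)
print(labels, round(score, 3))
```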
Davies–Bouldin Index (DBI)
\[ DBI = \frac{1}{k} \sum_{i=1}^k \max_{j \neq i} \frac{s_i + s_j}{d_{ij}} \]
\(s_i\): average scatter within cluster \(i\).
\(d_{ij}\): distance between the centers of clusters \(i\) and \(j\).
Lower = better (compact, well-separated clusters).
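A minimal sketch with scikit-learn's implementation (toy data values are illustrative; a low score is expected for tight, distant clusters):

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Two compact, well-separated toy clusters (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

# Within-cluster scatter is tiny relative to the centroid
# distance, so the index is close to 0
dbi = davies_bouldin_score(X, labels)
print(round(dbi, 4))
```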
Calinski–Harabasz Index
\[ CH = \frac{\text{Between-cluster variance} / (k-1)}{\text{Within-cluster variance} / (n-k)} \]
Higher = better separation.
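A minimal sketch with scikit-learn (toy data values are illustrative; well-separated clusters give a large ratio):

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

# Same two-blob toy layout as above (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

# Between-cluster variance dominates within-cluster variance,
# so the score is very large
ch = calinski_harabasz_score(X, labels)
print(round(ch, 1))
```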
2. External evaluation metrics (ground truth available)#
Adjusted Rand Index (ARI)
Measures agreement between predicted clusters and true labels, corrected for chance.
\[ ARI = \frac{\text{Index} - \text{Expected Index}}{\text{Max Index} - \text{Expected Index}} \]
Range: \([-1, 1]\). 1 = perfect match, 0 = random labeling, negative = worse than random.
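A minimal sketch with scikit-learn, also showing that ARI is invariant to how the cluster labels are named (toy labelings are illustrative):

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 1, 1]
# Same partition as the ground truth, just with the
# cluster IDs swapped
pred_labels = [1, 1, 0, 0]

# ARI only compares which points are grouped together,
# so a relabeled identical partition still scores 1.0
ari = adjusted_rand_score(true_labels, pred_labels)
print(ari)  # 1.0
```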
Normalized Mutual Information (NMI)
Based on information theory.
\[ NMI = \frac{2 \cdot I(Y;C)}{H(Y) + H(C)} \]
where \(I\) is mutual information and \(H\) is entropy. Range: \([0, 1]\). Higher = better.
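A minimal sketch with scikit-learn, contrasting a perfect clustering with a statistically independent one (toy labelings are illustrative):

```python
from sklearn.metrics import normalized_mutual_info_score

true_labels = [0, 0, 1, 1]

# Identical partition (relabeled): clusters carry full
# information about the classes
nmi_perfect = normalized_mutual_info_score(true_labels, [1, 1, 0, 0])

# Partition independent of the classes: clusters carry
# no information about the classes
nmi_random = normalized_mutual_info_score([0, 1, 0, 1], [0, 0, 1, 1])

print(round(nmi_perfect, 3), round(nmi_random, 3))
```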
Homogeneity, Completeness, V-measure
Homogeneity: each cluster contains only members of one class.
Completeness: all members of a class are in the same cluster.
V-measure: harmonic mean of both.
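A minimal sketch of all three scores with scikit-learn; the toy prediction splits one class into two clusters, which keeps homogeneity perfect but hurts completeness:

```python
from sklearn.metrics import homogeneity_completeness_v_measure

true_labels = [0, 0, 1, 1]
# Class 1 is split across clusters 1 and 2: every cluster is
# still pure (homogeneous), but class 1 is not kept together
pred_labels = [0, 0, 1, 2]

h, c, v = homogeneity_completeness_v_measure(true_labels, pred_labels)
print(round(h, 3), round(c, 3), round(v, 3))
```

Note that the V-measure equals the harmonic mean \(2hc/(h+c)\) of the other two scores.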
3. DBSCAN-specific considerations#
Noise points: Most metrics assume every point belongs to a cluster, so points labeled as noise (\(-1\)) must be handled explicitly, e.g., excluded before scoring; otherwise they may be treated as one extra cluster and distort the result.
Cluster count flexibility: DBSCAN may return a different number of clusters depending on \(\epsilon\) and minPts; evaluation metrics are a practical way to tune these parameters.
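The noise-handling point above can be sketched as follows: filter out the \(-1\) labels before computing an internal metric (the toy data, including the deliberate outlier, is illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Two toy blobs plus one far-away outlier (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])  # outlier -> labeled -1 by DBSCAN

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

# Exclude noise points (-1) so the silhouette is computed
# only over actual clusters
mask = labels != -1
score = silhouette_score(X[mask], labels[mask])
print(labels, round(score, 3))
```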