Performance Metrics#
DBSCAN does not optimize an explicit objective function, so its performance is instead evaluated with cluster quality metrics.
1. Internal evaluation metrics (no ground truth required)#
These assess clustering quality based only on the dataset.
Silhouette Coefficient
\[ s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \]
\(a(i)\): average distance of point \(i\) to the other points in its own cluster.
\(b(i)\): smallest average distance of point \(i\) to the points of any other cluster (i.e., the nearest neighboring cluster).
Range: \([-1, 1]\). Higher = better separation.
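A minimal sketch of computing the silhouette coefficient for DBSCAN output with scikit-learn (the toy coordinates and the `eps`/`min_samples` values are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Two well-separated toy blobs (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# DBSCAN finds the two blobs as clusters 0 and 1
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

# Mean silhouette over all points; near 1 here because the
# blobs are tight and far apart
score = silhouette_score(X, labels)
print(labels, round(score, 3))
```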
Davies–Bouldin Index (DBI)
\[ DBI = \frac{1}{k} \sum_{i=1}^k \max_{j \neq i} \frac{s_i + s_j}{d_{ij}} \]
\(s_i\): average scatter within cluster \(i\).
\(d_{ij}\): distance between the centers of clusters \(i\) and \(j\).
Lower = better (compact, well-separated clusters).
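A minimal sketch with scikit-learn's implementation (toy data values are illustrative; a low score is expected for tight, distant clusters):

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Two compact, well-separated toy clusters (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

# Within-cluster scatter is tiny relative to the centroid
# distance, so the index is close to 0
dbi = davies_bouldin_score(X, labels)
print(round(dbi, 4))
```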
Calinski–Harabasz Index
\[ CH = \frac{\text{Between-cluster variance} / (k-1)}{\text{Within-cluster variance} / (n-k)} \]
Higher = better separation.
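A minimal sketch with scikit-learn (toy data values are illustrative; well-separated clusters give a large ratio):

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

# Same two-blob toy layout as above (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

# Between-cluster variance dominates within-cluster variance,
# so the score is very large
ch = calinski_harabasz_score(X, labels)
print(round(ch, 1))
```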
2. External evaluation metrics (ground truth available)#
Adjusted Rand Index (ARI)
Measures agreement between predicted clusters and true labels, corrected for chance.
\[ ARI = \frac{\text{Index} - \text{Expected Index}}{\text{Max Index} - \text{Expected Index}} \]
Range: \([-1, 1]\). 1 = perfect match, 0 = random labeling, negative = worse than random.
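A minimal sketch with scikit-learn, also showing that ARI is invariant to how the cluster labels are named (toy labelings are illustrative):

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 1, 1]
# Same partition as the ground truth, just with the
# cluster IDs swapped
pred_labels = [1, 1, 0, 0]

# ARI only compares which points are grouped together,
# so a relabeled identical partition still scores 1.0
ari = adjusted_rand_score(true_labels, pred_labels)
print(ari)  # 1.0
```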
Normalized Mutual Information (NMI)
Based on information theory.
\[ NMI = \frac{2 \cdot I(Y;C)}{H(Y) + H(C)} \]
where \(I\) is mutual information and \(H\) is entropy. Range: \([0, 1]\). Higher = better.
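A minimal sketch with scikit-learn, contrasting a perfect clustering with a statistically independent one (toy labelings are illustrative):

```python
from sklearn.metrics import normalized_mutual_info_score

true_labels = [0, 0, 1, 1]

# Identical partition (relabeled): clusters carry full
# information about the classes
nmi_perfect = normalized_mutual_info_score(true_labels, [1, 1, 0, 0])

# Partition independent of the classes: clusters carry
# no information about the classes
nmi_random = normalized_mutual_info_score([0, 1, 0, 1], [0, 0, 1, 1])

print(round(nmi_perfect, 3), round(nmi_random, 3))
```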
Homogeneity, Completeness, V-measure
Homogeneity: each cluster contains only members of one class.
Completeness: all members of a class are in the same cluster.
V-measure: harmonic mean of both.
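A minimal sketch of all three scores with scikit-learn; the toy prediction splits one class into two clusters, which keeps homogeneity perfect but hurts completeness:

```python
from sklearn.metrics import homogeneity_completeness_v_measure

true_labels = [0, 0, 1, 1]
# Class 1 is split across clusters 1 and 2: every cluster is
# still pure (homogeneous), but class 1 is not kept together
pred_labels = [0, 0, 1, 2]

h, c, v = homogeneity_completeness_v_measure(true_labels, pred_labels)
print(round(h, 3), round(c, 3), round(v, 3))
```

Note that the V-measure equals the harmonic mean \(2hc/(h+c)\) of the other two scores.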
3. DBSCAN-specific considerations#
Noise points: Most metrics assume every point belongs to a cluster, so points labeled as noise (\(-1\)) must be handled explicitly, e.g., excluded before scoring; otherwise they may be treated as one extra cluster and distort the result.
Cluster count flexibility: DBSCAN may return a different number of clusters depending on \(\epsilon\) and minPts; evaluation metrics are a practical way to tune these parameters.
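The noise-handling point above can be sketched as follows: filter out the \(-1\) labels before computing an internal metric (the toy data, including the deliberate outlier, is illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Two toy blobs plus one far-away outlier (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])  # outlier -> labeled -1 by DBSCAN

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

# Exclude noise points (-1) so the silhouette is computed
# only over actual clusters
mask = labels != -1
score = silhouette_score(X[mask], labels[mask])
print(labels, round(score, 3))
```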