Performance Metrics

DBSCAN does not optimize an explicit cost function, so its results are evaluated with cluster quality metrics.


1. Internal evaluation metrics (no ground truth required)

These assess clustering quality using only the data and the predicted cluster labels.

  • Silhouette Coefficient

    \[ s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \]
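    • \(a(i)\): average distance from point \(i\) to the other points in its own cluster.

    • \(b(i)\): average distance from point \(i\) to the points of the nearest other cluster, i.e. the smallest such average over all clusters that do not contain \(i\).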

    • Range: \([-1, 1]\). Higher = better separation.

  • Davies–Bouldin Index (DBI)

    \[ DBI = \frac{1}{k} \sum_{i=1}^k \max_{j \neq i} \frac{s_i + s_j}{d_{ij}} \]
    • \(s_i\): average distance from the points of cluster \(i\) to its centroid.

    • \(d_{ij}\): distance between the centroids of clusters \(i\) and \(j\); \(k\) is the number of clusters.

    • Lower = better (compact and well-separated clusters).

  • Calinski–Harabasz Index

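    \[ CH = \frac{\text{Between-cluster dispersion} / (k-1)}{\text{Within-cluster dispersion} / (n-k)} \]

    where \(k\) is the number of clusters and \(n\) the number of points. Dispersion is a sum of squared distances: from each cluster centroid to the overall centroid (weighted by cluster size) for the numerator, and from each point to its own cluster centroid for the denominator. Higher = better separation.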
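
The three formulas above can be computed directly from points and labels. The following is a minimal sketch using only the Python standard library and Euclidean distance; the function names and the toy data are illustrative assumptions, not part of any library. It assumes at least two clusters and that noise points have already been removed. In practice, scikit-learn offers `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` in `sklearn.metrics` for the same purpose.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Map each cluster label to the list of its points."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return groups


def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(axis) / len(pts) for axis in zip(*pts))


def silhouette(points, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    groups = group_by_label(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # common convention: s(i) = 0 for a singleton cluster
        # a(i): mean distance to the *other* members of the same cluster
        # (the distance from p to itself is 0, so only the divisor changes)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in pts) / len(pts)
            for other, pts in groups.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d_ij ratio."""
    groups = group_by_label(points, labels)
    cents = {lab: centroid(pts) for lab, pts in groups.items()}
    # s_i: mean distance from the members of cluster i to its centroid
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in groups.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in groups
            if j != i
        )
        for i in groups
    ]
    return sum(worst) / len(worst)


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    groups = group_by_label(points, labels)
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = sum(
        len(pts) * math.dist(centroid(pts), overall) ** 2
        for pts in groups.values()
    )
    within = sum(
        math.dist(p, centroid(pts)) ** 2
        for pts in groups.values()
        for p in pts
    )
    return (between / (k - 1)) / (within / (n - k))


# Toy data (hypothetical): two tight, well-separated groups in 2D.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]

sil = silhouette(points, labels)
dbi = davies_bouldin(points, labels)
ch = calinski_harabasz(points, labels)
print(f"silhouette={sil:.3f}  DBI={dbi:.3f}  CH={ch:.1f}")
```

On well-separated data like this, the silhouette should be close to 1, DBI close to 0, and CH large. Mixing the labels across the two groups pushes all three in the opposite direction.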


2. External evaluation metrics (ground truth available)

  • Adjusted Rand Index (ARI): measures agreement between predicted clusters and true labels, corrected for chance.

    \[ ARI = \frac{\text{Index} - \text{Expected Index}}{\text{Max Index} - \text{Expected Index}} \]

    Upper bound: 1 (perfect match). 0 = expected value for a random labeling; negative = worse than random.

  • Normalized Mutual Information (NMI): an information-theoretic measure of how much the cluster assignments reveal about the true labels.

    \[ NMI = \frac{2 \cdot I(Y;C)}{H(Y) + H(C)} \]

    where \(I(Y;C)\) is the mutual information between the true labels \(Y\) and the cluster assignments \(C\), and \(H\) is entropy. Range: \([0,1]\). Higher = better.

  • Homogeneity, Completeness, V-measure

    • Homogeneity: each cluster contains only members of one class.

    • Completeness: all members of a class are in the same cluster.

    • V-measure: harmonic mean of homogeneity and completeness.
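
The external metrics above can be computed from label counts alone. The sketch below uses only the Python standard library; the function names and the example labelings are illustrative assumptions. It uses the pair-counting form of ARI and the common information-theoretic definitions of homogeneity (\(I(Y;C)/H(Y)\)) and completeness (\(I(Y;C)/H(C)\)). The return values chosen for degenerate inputs (zero entropy, or a zero denominator in ARI) are a convention assumed here.

```python
import math
from collections import Counter


def adjusted_rand_index(y_true, y_pred):
    """(Index - Expected Index) / (Max Index - Expected Index), via pair counts."""
    n = len(y_true)
    # Pairs of points placed together in both partitions (contingency cells).
    index = sum(math.comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    pairs_true = sum(math.comb(c, 2) for c in Counter(y_true).values())
    pairs_pred = sum(math.comb(c, 2) for c in Counter(y_pred).values())
    expected = pairs_true * pairs_pred / math.comb(n, 2)
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # assumed convention for degenerate partitions
    return (index - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy H of a labeling (natural log)."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(y_true, y_pred):
    """Mutual information I(Y;C) between two labelings."""
    n = len(y_true)
    count_true, count_pred = Counter(y_true), Counter(y_pred)
    return sum(
        c / n * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in Counter(zip(y_true, y_pred)).items()
    )


def nmi(y_true, y_pred):
    """NMI = 2 * I(Y;C) / (H(Y) + H(C))."""
    denom = entropy(y_true) + entropy(y_pred)
    return 2 * mutual_information(y_true, y_pred) / denom if denom > 0 else 1.0


def homogeneity_completeness_v(y_true, y_pred):
    """Homogeneity, completeness, and their harmonic mean (V-measure)."""
    mi = mutual_information(y_true, y_pred)
    h_true, h_pred = entropy(y_true), entropy(y_pred)
    hom = mi / h_true if h_true > 0 else 1.0
    com = mi / h_pred if h_pred > 0 else 1.0
    v = 2 * hom * com / (hom + com) if hom + com > 0 else 0.0
    return hom, com, v


# Hypothetical labelings of six points.
y_true = [0, 0, 0, 1, 1, 1]
y_renamed = [1, 1, 1, 0, 0, 0]   # same partition, cluster ids swapped
y_split = [0, 0, 1, 2, 2, 2]     # first class split across two clusters

ari_renamed = adjusted_rand_index(y_true, y_renamed)
nmi_renamed = nmi(y_true, y_renamed)
hom, com, v = homogeneity_completeness_v(y_true, y_split)
print(ari_renamed, nmi_renamed, hom, com, v)
```

Two points are worth noting. All of these metrics are invariant to renaming cluster ids, which matters because DBSCAN's cluster numbers are arbitrary. Splitting a class keeps homogeneity at 1 but lowers completeness. With the normalization used here, the V-measure coincides with NMI.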


3. DBSCAN-specific considerations

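  • Noise points: DBSCAN assigns noise the label \(-1\). Most metric implementations do not treat this label specially (scikit-learn's silhouette score, for example, scores it as one more cluster), so noise points should be excluded, or reported separately, before computing a metric.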

  • Cluster count flexibility: DBSCAN may return a different number of clusters depending on \(\epsilon\) and minPts. Evaluation metrics help tune these parameters.
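
One way to handle the noise label before scoring is sketched below, using only the Python standard library. The helper names and the example labels are hypothetical; the labels stand in for the output of a DBSCAN run, with \(-1\) marking noise. Reporting the noise fraction alongside any metric is a suggested practice here, since a parameter setting that discards most points as noise can still score well on the points that remain.

```python
NOISE = -1  # label DBSCAN uses for noise points


def summarize_labels(labels):
    """Return (number of clusters excluding noise, fraction of noise points)."""
    n_clusters = len(set(labels) - {NOISE})
    noise_fraction = sum(1 for lab in labels if lab == NOISE) / len(labels)
    return n_clusters, noise_fraction


def drop_noise(points, labels):
    """Keep only the points (and their labels) that belong to a cluster."""
    kept = [(p, lab) for p, lab in zip(points, labels) if lab != NOISE]
    return [p for p, _ in kept], [lab for _, lab in kept]


def can_score_internally(labels):
    """Internal metrics need at least two clusters and more points than clusters."""
    n_clusters = len(set(labels) - {NOISE})
    n_clustered = sum(1 for lab in labels if lab != NOISE)
    return 2 <= n_clusters < n_clustered


# Hypothetical DBSCAN output for eight 2D points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (10, 10), (10, 11), (11, 10), (20, 0)]
labels = [0, 0, 0, -1, 1, 1, 1, -1]

n_clusters, noise_fraction = summarize_labels(labels)
clean_points, clean_labels = drop_noise(points, labels)
print(n_clusters, noise_fraction, can_score_internally(labels))
```

When comparing several \(\epsilon\) and minPts settings, the same three steps can be applied to each run: skip settings where `can_score_internally` is false, score the rest on the noise-free points, and keep the noise fraction next to each score.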