DBSCAN


DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups points lying in densely packed regions and labels points in sparse regions as noise (outliers).

Core Intuition

Instead of assuming that clusters are roughly spherical, as K-Means implicitly does, DBSCAN defines clusters by density:

- Dense regions form clusters.
- Sparse regions are treated as noise.
- Because only local density matters, clusters can take arbitrary shapes.

Key Concepts

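Parameters:
- \(\epsilon\) (eps): radius of the neighborhood around a point.
- \(\text{minPts}\): minimum number of points an \(\epsilon\)-neighborhood must contain to count as dense.
Point types:
- Core point: has at least \(\text{minPts}\) points within distance \(\epsilon\) (in the original formulation this count includes the point itself).
- Border point: not a core point, but lies within \(\epsilon\) of a core point.
- Noise point: neither a core point nor a border point.
Density reachability:
- Directly density-reachable: a point \(p\) is directly density-reachable from \(q\) if \(p\) is within \(\epsilon\) of \(q\) and \(q\) is a core point.
- Density-reachable: \(p\) is density-reachable from \(q\) if there is a chain of points from \(q\) to \(p\) in which each point is directly density-reachable from the previous one.
- Density-connected: two points are density-connected if both are density-reachable from a common point. A cluster is a maximal set of density-connected points.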
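
The three point types can be illustrated with a small sketch. It assumes Euclidean distance, counts a point as part of its own \(\epsilon\)-neighborhood, and uses a brute-force neighbor search; the helper names `region_query` and `classify_points` and the sample data are made up for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    neighbors = [region_query(points, i, eps) for i in range(len(points))]
    # Core: the eps-neighborhood (self included) holds at least min_pts points.
    is_core = [len(n) >= min_pts for n in neighbors]
    kinds = []
    for i in range(len(points)):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not core, but within eps of at least one core point.
            kinds.append("border")
        else:
            kinds.append("noise")
    return kinds


# Hypothetical sample data: a small square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]
kinds = classify_points(points, eps=1.1, min_pts=3)
print(kinds)
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here `(2.0, 1.0)` has only two points in its neighborhood, so it is not a core point, but it lies within \(\epsilon\) of the core point `(1.0, 1.0)` and is therefore a border point. `(5.0, 5.0)` has no core point nearby and is noise.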

Workflow

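Pick an unvisited point and retrieve its \(\epsilon\)-neighborhood.
If it is a core point → start a new cluster.
- Collect all points density-reachable from it into this cluster, expanding through every core point reached along the way.
If it is not a core point → mark it as noise for now; it is relabeled as a border point if a later cluster expansion reaches it from a core point.
Repeat until all points are visited.
Points still marked as noise at the end are the outliers.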
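
The workflow above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: it assumes Euclidean distance, a brute-force \(O(n^2)\) neighbor search, and the convention (chosen here, and also used by scikit-learn) that noise gets the label `-1`. The function names and sample data are hypothetical.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # tentative; may become a border point later
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise reached from a core point -> border
            if labels[j] is not UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is core too: keep expanding
    return labels


# Hypothetical data: a border point listed first, a square, an outlier, a small group.
points = [
    (2.0, 1.0),
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (5.0, 5.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
]
labels = dbscan(points, eps=1.1, min_pts=3)
print(labels)
# [0, 0, 0, 0, 0, -1, 1, 1, 1]
```

The first point, `(2.0, 1.0)`, is visited before any core point and is initially marked as noise; it is relabeled as part of cluster 0 once the expansion from the square reaches it through the core point `(1.0, 1.0)`. `(5.0, 5.0)` is never reached and stays noise.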