Intiution#
Core idea:
A cluster is a region where points are packed close together.
If enough points fall within a small radius (\(\epsilon\)), that region is dense → becomes a cluster center.
Core points vs noise:
A core point has at least
minPtsneighbors within \(\epsilon\).Border points lie near cores but don’t have enough neighbors themselves.
Isolated points with no dense neighbors are labeled noise.
Growth of clusters:
Start from a core point.
Pull in all points density-reachable from it.
Expand outward until no more density-connected points exist.
This process naturally captures arbitrary shapes.
Why it works:
Dense regions = true structure.
Sparse regions = separators between clusters.
Unlike K-Means, DBSCAN does not force spherical shapes or require cluster count in advance.
Mental image: Imagine pouring ink drops on a scatterplot. Each drop spreads until it thins out. Thick patches form clusters, thin splatters are noise.
1. Neighborhood definition#
For any point \(p \in D\):
This is the set of neighbors within radius \(\epsilon\).
2. Core, Border, Noise#
Core point:
Meaning the neighborhood of \(p\) has enough points to be considered dense.
Border point:
Noise point: Neither core nor border.
3. Density Reachability#
A point \(q\) is directly density-reachable from \(p\) if:
\(q \in N_\epsilon(p)\)
\(p\) is a core point.
Density-reachable: If there exists a chain of points \(p_1, p_2, \dots, p_k\) such that each \(p_{i+1}\) is directly density-reachable from \(p_i\).
Density-connected: Two points \(p\) and \(q\) are density-connected if there exists a point \(o\) such that both \(p\) and \(q\) are density-reachable from \(o\).
4. Cluster definition in DBSCAN#
A cluster \(C\) satisfies:
Maximality: If \(p \in C\) and \(q\) is density-reachable from \(p\), then \(q \in C\).
Connectivity: Any two points \(p, q \in C\) are density-connected.
So mathematically, DBSCAN partitions dataset \(D\) into subsets of mutually density-connected points.
5. Intuition vs Math#
Intuitively: grow clusters from dense seeds.
Mathematically: clusters = equivalence classes of density-connected points.
Noise = points not belonging to any cluster.