Intiution#

  1. Core idea:

    • A cluster is a region where points are packed close together.

    • If enough points fall within a small radius (\(\epsilon\)), that region is dense → becomes a cluster center.

  2. Core points vs noise:

    • A core point has at least minPts neighbors within \(\epsilon\).

    • Border points lie near cores but don’t have enough neighbors themselves.

    • Isolated points with no dense neighbors are labeled noise.

  3. Growth of clusters:

    • Start from a core point.

    • Pull in all points density-reachable from it.

    • Expand outward until no more density-connected points exist.

    • This process naturally captures arbitrary shapes.

  4. Why it works:

    • Dense regions = true structure.

    • Sparse regions = separators between clusters.

    • Unlike K-Means, DBSCAN does not force spherical shapes or require cluster count in advance.

  5. Mental image: Imagine pouring ink drops on a scatterplot. Each drop spreads until it thins out. Thick patches form clusters, thin splatters are noise.

1. Neighborhood definition#

For any point \(p \in D\):

\[ N_\epsilon(p) = \{ q \in D \mid \text{dist}(p, q) \leq \epsilon \} \]

This is the set of neighbors within radius \(\epsilon\).


2. Core, Border, Noise#

  • Core point:

\[ |N_\epsilon(p)| \geq \text{minPts} \]

Meaning the neighborhood of \(p\) has enough points to be considered dense.

  • Border point:

\[ |N_\epsilon(p)| < \text{minPts}, \quad \exists \ q \in D \ \text{such that } p \in N_\epsilon(q) \ \text{and q is core.} \]
  • Noise point: Neither core nor border.


3. Density Reachability#

  • A point \(q\) is directly density-reachable from \(p\) if:

    1. \(q \in N_\epsilon(p)\)

    2. \(p\) is a core point.

  • Density-reachable: If there exists a chain of points \(p_1, p_2, \dots, p_k\) such that each \(p_{i+1}\) is directly density-reachable from \(p_i\).

  • Density-connected: Two points \(p\) and \(q\) are density-connected if there exists a point \(o\) such that both \(p\) and \(q\) are density-reachable from \(o\).


4. Cluster definition in DBSCAN#

A cluster \(C\) satisfies:

  1. Maximality: If \(p \in C\) and \(q\) is density-reachable from \(p\), then \(q \in C\).

  2. Connectivity: Any two points \(p, q \in C\) are density-connected.

So mathematically, DBSCAN partitions dataset \(D\) into subsets of mutually density-connected points.


5. Intuition vs Math#

  • Intuitively: grow clusters from dense seeds.

  • Mathematically: clusters = equivalence classes of density-connected points.

  • Noise = points not belonging to any cluster.