Intuition#

1. Core Idea#

t-SNE is all about preserving local neighborhoods.

  • Imagine you have a high-dimensional dataset.

  • You want a 2D or 3D plot that reflects how points relate to each other locally.

  • t-SNE does this by modeling the probability that two points are neighbors and trying to preserve that in the low-dimensional embedding.


2. Step-by-Step Intuition#

Step 1: Compute Similarities in High-Dimensional Space#

  • For each point, t-SNE computes how similar it is to every other point using conditional probabilities:

\[ p_{j|i} = \frac{\exp(-||x_i - x_j||^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-||x_i - x_k||^2 / 2\sigma_i^2)} \]
  • Interpretation:

    • If two points are close in high-D space → the conditional probability \(p_{j|i}\) is high.

    • If far apart → \(p_{j|i}\) is low.

  • These conditional probabilities are then symmetrized into joint probabilities \(p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}\), which is what the KL objective below uses.

  • This captures local neighborhood structure.
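The high-dimensional similarities above can be sketched in a few lines of NumPy. This is a hypothetical helper (in practice each \(\sigma_i\) is found by binary search to match a target perplexity; here it is passed in directly):

```python
import numpy as np

def conditional_probs(X, i, sigma_i):
    """Conditional probabilities p_{j|i} for point i with bandwidth sigma_i."""
    # Squared Euclidean distances from x_i to every point
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    # Unnormalized Gaussian affinities; a point is never its own neighbor
    num = np.exp(-d2 / (2 * sigma_i ** 2))
    num[i] = 0.0
    # Normalize so the probabilities over j != i sum to 1
    return num / num.sum()
```

Note how the per-point bandwidth \(\sigma_i\) makes the similarity scale adapt to local density: dense regions get small bandwidths, sparse regions large ones.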


Step 2: Map to Low-Dimensional Space#

  • t-SNE places points in 2D/3D randomly at first.

  • Then, it defines low-dimensional similarities using a heavy-tailed Student-t distribution:

\[ q_{ij} = \frac{(1 + ||y_i - y_j||^2)^{-1}}{\sum_{k \neq l} (1 + ||y_k - y_l||^2)^{-1}} \]
  • Why t-distribution?

    • Its heavy tails relieve the “crowding problem”: moderate high-D distances can be represented by much larger 2D distances.

    • Points that were far apart in high-D space are therefore pushed well apart in 2D instead of being squeezed together.
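The low-dimensional similarities can likewise be sketched directly from the \(q_{ij}\) formula, using the 1-degree-of-freedom Student-t kernel over all pairs (a minimal sketch, not the library implementation):

```python
import numpy as np

def student_t_probs(Y):
    """Joint probabilities q_{ij} over an embedding Y using a Student-t kernel."""
    # Pairwise squared distances between embedding points
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    # Heavy-tailed (1 + d^2)^{-1} affinities; exclude self-pairs
    inv = 1.0 / (1.0 + d2)
    np.fill_diagonal(inv, 0.0)
    # Normalize over all ordered pairs k != l
    return inv / inv.sum()
```

Because \((1 + d^2)^{-1}\) decays polynomially rather than exponentially, a pair can keep a non-negligible \(q_{ij}\) even at large 2D distance, which is exactly what prevents crowding.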


Step 3: Minimize Difference Between High-D and Low-D#

  • t-SNE minimizes the Kullback-Leibler (KL) divergence between high-D and low-D similarities:

\[ \text{KL}(P || Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \]
  • Interpretation:

    • KL divergence is asymmetric: placing true neighbors (high \(p_{ij}\)) far apart (low \(q_{ij}\)) incurs a large penalty, so local structure is strongly preserved.

    • Placing already-distant points too close is penalized only weakly, so points that were far apart in high-D space may drift even farther apart in 2D.
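The objective itself is a one-liner once \(P\) and \(Q\) are in hand. A minimal sketch (the `eps` floor guards against `log(0)` and is an implementation detail, not part of the formula):

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all pairs with p_ij > 0."""
    mask = P > 0                      # terms with p_ij = 0 contribute nothing
    return np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps)))
```

t-SNE minimizes this quantity with gradient descent, moving the embedding points `y_i` until the mismatch between the two neighbor distributions is as small as possible.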


3. Visualization Analogy#

  • Imagine a sheet of paper representing low-D space.

  • High-dimensional points are connected by springs to neighbors with strengths proportional to similarity.

  • t-SNE moves the points around so that the spring system is relaxed, preserving local neighborhoods while allowing distant points to spread out.


4. Key Takeaways#

| Concept | Intuition |
| --- | --- |
| High-D similarity | “Which points are neighbors?” |
| Low-D similarity | “Place neighbors close, others far” |
| KL divergence | “Minimize mismatch between high-D and low-D neighbors” |
| Heavy-tailed distribution | “Prevent crowding, let distant points stretch” |


Summary#

  • t-SNE does not preserve global distances.

  • It highlights clusters and local relationships.

  • Best used for visualizing patterns rather than downstream modeling.