Intuition#

1. Core Idea#

t-SNE is all about preserving local neighborhoods.

  • Imagine you have a high-dimensional dataset.

  • You want a 2D or 3D plot that reflects how points relate to each other locally.

  • t-SNE does this by modeling the probability that two points are neighbors and trying to preserve that in the low-dimensional embedding.


2. Step-by-Step Intuition#

Step 1: Compute Similarities in High-Dimensional Space#

  • For each point, t-SNE computes how similar it is to every other point using conditional probabilities:

\[ p_{j|i} = \frac{\exp(-||x_i - x_j||^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-||x_i - x_k||^2 / 2\sigma_i^2)} \]
  • Interpretation:

    • If two points are close in high-D space → the conditional probability \(p_{j|i}\) is high.

    • If far apart → \(p_{j|i}\) is low.

  • These conditional probabilities are then symmetrized into joint probabilities \(p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}\), which is what the KL objective below uses.

  • This captures local neighborhood structure.
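The high-dimensional similarities above can be sketched in a few lines of NumPy. This is a hypothetical helper (in practice each \(\sigma_i\) is found by binary search to match a target perplexity; here it is passed in directly):

```python
import numpy as np

def conditional_probs(X, i, sigma_i):
    """Conditional probabilities p_{j|i} for point i with bandwidth sigma_i."""
    # Squared Euclidean distances from x_i to every point
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    # Unnormalized Gaussian affinities; a point is never its own neighbor
    num = np.exp(-d2 / (2 * sigma_i ** 2))
    num[i] = 0.0
    # Normalize so the probabilities over j != i sum to 1
    return num / num.sum()
```

Note how the per-point bandwidth \(\sigma_i\) makes the similarity scale adapt to local density: dense regions get small bandwidths, sparse regions large ones.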


Step 2: Map to Low-Dimensional Space#

  • t-SNE places points in 2D/3D randomly at first.

  • Then, it defines low-dimensional similarities using a heavy-tailed Student-t distribution:

\[ q_{ij} = \frac{(1 + ||y_i - y_j||^2)^{-1}}{\sum_{k \neq l} (1 + ||y_k - y_l||^2)^{-1}} \]
  • Why t-distribution?

    • Its heavy tails relieve the “crowding problem”: moderate high-D distances can be represented by much larger 2D distances.

    • Points that were far apart in high-D space are therefore pushed well apart in 2D instead of being squeezed together.
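The low-dimensional similarities can likewise be sketched directly from the \(q_{ij}\) formula, using the 1-degree-of-freedom Student-t kernel over all pairs (a minimal sketch, not the library implementation):

```python
import numpy as np

def student_t_probs(Y):
    """Joint probabilities q_{ij} over an embedding Y using a Student-t kernel."""
    # Pairwise squared distances between embedding points
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    # Heavy-tailed (1 + d^2)^{-1} affinities; exclude self-pairs
    inv = 1.0 / (1.0 + d2)
    np.fill_diagonal(inv, 0.0)
    # Normalize over all ordered pairs k != l
    return inv / inv.sum()
```

Because \((1 + d^2)^{-1}\) decays polynomially rather than exponentially, a pair can keep a non-negligible \(q_{ij}\) even at large 2D distance, which is exactly what prevents crowding.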


Step 3: Minimize Difference Between High-D and Low-D#

  • t-SNE minimizes the Kullback-Leibler (KL) divergence between high-D and low-D similarities:

\[ \text{KL}(P || Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \]
  • Interpretation:

    • KL divergence is asymmetric: placing true neighbors (high \(p_{ij}\)) far apart (low \(q_{ij}\)) incurs a large penalty, so local structure is strongly preserved.

    • Placing already-distant points too close is penalized only weakly, so points that were far apart in high-D space may drift even farther apart in 2D.
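The objective itself is a one-liner once \(P\) and \(Q\) are in hand. A minimal sketch (the `eps` floor guards against `log(0)` and is an implementation detail, not part of the formula):

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all pairs with p_ij > 0."""
    mask = P > 0                      # terms with p_ij = 0 contribute nothing
    return np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps)))
```

t-SNE minimizes this quantity with gradient descent, moving the embedding points `y_i` until the mismatch between the two neighbor distributions is as small as possible.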


3. Visualization Analogy#

  • Imagine a sheet of paper representing low-D space.

  • High-dimensional points are connected by springs to neighbors with strengths proportional to similarity.

  • t-SNE moves the points around so that the spring system is relaxed, preserving local neighborhoods while allowing distant points to spread out.


4. Key Takeaways#

| Concept | Intuition |
| --- | --- |
| High-D similarity | “Which points are neighbors?” |
| Low-D similarity | “Place neighbors close, others far” |
| KL divergence | “Minimize mismatch between high-D and low-D neighbors” |
| Heavy-tailed distribution | “Prevent crowding, let distant points stretch” |


Summary#

  • t-SNE does not preserve global distances.

  • It highlights clusters and local relationships.

  • Best used for visualizing patterns rather than downstream modeling.