Assumptions
1. Linearity
Assumption: PCA assumes that the relationships between variables are linear.
Implication: PCA finds a linear combination of features that maximizes variance. Non-linear relationships may not be captured effectively.
Example: If your data lies on a curved manifold (like a circle), PCA will try to fit straight axes and fail to capture the true structure.
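The circle example can be checked numerically. The sketch below is a minimal illustration using NumPy only; the variable names (`theta`, `explained_ratio`) are chosen here for illustration and PCA is done by hand from the covariance matrix. For points spread evenly on a circle, no straight axis is better than any other, so the variance splits evenly between the two components.

```python
import numpy as np

# Points evenly spaced on a unit circle: a curved 1-D manifold embedded in 2-D.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# PCA by hand: center the data, form the covariance matrix, take its eigenvalues.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order; reverse it

# Fraction of total variance carried by each principal component.
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)  # both entries are approximately 0.5
```

Neither component dominates, even though the data is fully described by a single parameter (the angle). A linear projection onto one component would discard half the variance without revealing the circular structure.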
2. Large Variance = Important Structure
Assumption: Directions with larger variance are more “informative”.
Implication: PCA identifies directions (principal components) that maximize variance. It assumes that the most important underlying structure corresponds to directions of high variance.
Caveat: Noisy features with high variance might be misleadingly considered important.
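To make "directions that maximize variance" concrete, the following sketch computes the principal directions as eigenvectors of the covariance matrix and ranks them by eigenvalue. The synthetic data and all names (`z`, `explained_ratio`, and so on) are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: two features driven by a shared factor z, plus one
# unrelated low-variance feature.
z = rng.normal(size=n)
X = np.column_stack([
    3.0 * z + rng.normal(scale=0.5, size=n),
    2.0 * z + rng.normal(scale=0.5, size=n),
    rng.normal(scale=0.5, size=n),
])

Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order, so re-sort to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each eigenvalue is the variance of the data projected onto its eigenvector.
explained_ratio = eigvals / eigvals.sum()
proj_var = np.var(Xc @ eigvecs[:, 0], ddof=1)

print(explained_ratio)
print(proj_var, eigvals[0])  # these two values agree
```

Here the first component carries most of the variance because the shared factor has a large spread. PCA ranks it first for that reason alone; whether it is actually the most useful direction depends on the data.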
3. Mean-Centered Data
Assumption: PCA assumes the data is centered around zero.
Implication: Before computing the covariance matrix, you subtract the mean of each feature. If you skip this step (for example, by running an SVD directly on the raw data matrix), the first principal component may simply point from the origin toward the data mean instead of along the direction of maximal variance.
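The effect of skipping the centering step can be seen with a small experiment. This is a sketch under assumed conditions: the data is synthetic, the helper `first_direction` is a hypothetical name, and the "uncentered PCA" is taken to mean an SVD applied directly to the raw data matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical data: the mean sits far from the origin at (10, 10), while
# most of the variance lies along the direction (1, -1).
var_dir = np.array([1.0, -1.0]) / np.sqrt(2.0)
t = rng.normal(scale=2.0, size=n)
noise = rng.normal(scale=0.2, size=(n, 2))
X = np.array([10.0, 10.0]) + np.outer(t, var_dir) + noise

def first_direction(A):
    """Return the first right singular vector of A (a unit vector)."""
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[0]

mean = X.mean(axis=0)
mean_dir = mean / np.linalg.norm(mean)

v_raw = first_direction(X)              # centering skipped
v_centered = first_direction(X - mean)  # centering applied

# Absolute cosine similarity (the sign of a singular vector is arbitrary).
cos_raw_vs_mean = abs(v_raw @ mean_dir)
cos_centered_vs_var = abs(v_centered @ var_dir)
print(cos_raw_vs_mean, cos_centered_vs_var)
```

Without centering, the leading direction lines up with the mean; with centering, it lines up with the direction along which the data actually varies.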
4. Orthogonality of Principal Components
Assumption: The principal directions are mutually orthogonal, and the resulting component scores are uncorrelated.
Implication: Each component captures a new direction of variance not explained by previous components. Because the directions are constrained to be orthogonal, PCA cannot separate underlying patterns that lie along non-orthogonal directions, and uncorrelated scores can still be dependent in non-linear ways.
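Both properties, orthogonal directions and uncorrelated scores, can be verified directly. The sketch below uses synthetic data and an arbitrary mixing matrix `A`, both assumed here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlated data: independent Gaussians mixed by an arbitrary matrix.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.5]])
X = rng.normal(size=(800, 3)) @ A

Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
_, W = np.linalg.eigh(cov)  # columns of W are the principal directions

# Directions are orthonormal: W^T W is the identity matrix.
gram = W.T @ W

# Scores (the data projected onto the directions) are uncorrelated:
# their covariance matrix is diagonal.
scores = Xc @ W
score_cov = np.cov(scores, rowvar=False)
off_diag = score_cov - np.diag(np.diag(score_cov))

print(np.round(gram, 6))
print(np.abs(off_diag).max())  # zero up to floating-point error
```

The orthogonality comes from the covariance matrix being symmetric, so it holds for any dataset; it is a property of the method rather than something the data must satisfy.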
5. Scale Matters
Assumption: The scale of variables affects PCA.
Implication: PCA is sensitive to the relative scaling of features. Standardizing features (e.g., z-score normalization) is usually recommended, especially if features have different units or ranges.
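A short experiment shows how strongly units can drive the result. The features below (`height_m`, `income`) are hypothetical, and the helper `first_pc` is a name introduced for this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Hypothetical, moderately correlated features in very different units:
# height in meters and income in dollars.
base = rng.normal(size=n)
height_m = 1.7 + 0.1 * base
income = 50_000.0 + 15_000.0 * (0.6 * base + 0.8 * rng.normal(size=n))
X = np.column_stack([height_m, income])

def first_pc(A):
    """First principal direction of A from the covariance eigendecomposition."""
    Ac = A - A.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(Ac, rowvar=False))
    return eigvecs[:, -1]  # eigh sorts ascending, so the last column is the top one

pc_raw = first_pc(X)

# z-score standardization: zero mean and unit variance for each feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
pc_std = first_pc(X_std)

print(np.abs(pc_raw))  # dominated by the income axis
print(np.abs(pc_std))  # both features contribute equally in magnitude
```

On the raw data the first component is essentially the income axis, simply because dollars produce larger numbers than meters. After standardization, PCA operates on the correlation matrix and both features are weighted equally.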
6. Noise is Homoscedastic (Optional)
Assumption (ideal case): The noise in the data is isotropic (same in all directions).
Implication: PCA works best when the variance due to noise is roughly the same across all dimensions. If one feature has very high noise variance, PCA might incorrectly treat it as “important”.
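The sketch below compares the two noise regimes on synthetic data. The signal direction, noise levels, and helper name `first_pc` are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000

# Hypothetical true signal: unit-variance variation along the first axis.
signal_dir = np.array([1.0, 0.0])
signal = np.outer(rng.normal(scale=1.0, size=n), signal_dir)

def first_pc(A):
    """First principal direction of A from the covariance eigendecomposition."""
    Ac = A - A.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(Ac, rowvar=False))
    return eigvecs[:, -1]

# Isotropic noise: the same standard deviation in every dimension.
X_iso = signal + rng.normal(scale=0.5, size=(n, 2))

# Anisotropic noise: the second feature is far noisier than the signal itself.
X_aniso = signal + rng.normal(scale=[0.5, 3.0], size=(n, 2))

# Absolute cosine between the first component and the true signal direction.
align_iso = abs(first_pc(X_iso) @ signal_dir)
align_aniso = abs(first_pc(X_aniso) @ signal_dir)
print(align_iso, align_aniso)
```

With isotropic noise the first component stays aligned with the signal. When one feature carries much larger noise variance, the first component swings toward that noisy feature and the signal is pushed into a later component.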
Summary Table
| Assumption | Why It Matters |
|---|---|
| Linearity | PCA captures linear patterns only |
| Large variance = important | High-variance directions are assumed informative |
| Mean-centered data | Ensures components point along variance, not mean |
| Orthogonality of components | Each PC captures unique, uncorrelated variance |
| Scale sensitivity | Features need normalization if scales differ |
| Homoscedastic noise (optional) | Unequal noise can distort principal components |
Intuition: PCA is essentially like finding the “best-fit axes” through your data cloud. The assumptions above are what make that “best-fit” meaningful. If your data violates these, PCA may give misleading directions.