# Correlation Analysis

## Continuous X vs Y

| X (Feature) | Y (Target) | Best Methods |
|---|---|---|
| Continuous | Continuous | Pearson, Spearman, Kendall, Distance Correlation, Mutual Information, Partial Correlation, MIC |
| Continuous | Binary | Point-Biserial, Logistic Coefficient, AUC, Mutual Information |
| Continuous | Ordinal | Spearman, Kendall, Mutual Information |
| Continuous | Categorical (Nominal) | ANOVA F-test, Kruskal–Wallis, Mutual Information |
| Continuous | Discrete Numeric | Pearson, Spearman, Kendall, MI |
## Discrete Numeric X vs Y

| X | Y | Best Methods |
|---|---|---|
| Discrete Numeric | Continuous | t-test (if 2 groups), ANOVA (k groups), Kruskal–Wallis, Point-Biserial (if X is effectively binary), Spearman (if numeric rank-like), MI |
| Discrete Numeric | Binary | Point-Biserial (if X binary), Spearman (if ordinal meaning), t-test, Mann–Whitney, Mutual Information |
| Discrete Numeric | Ordinal | Spearman, Kendall, Mutual Information |
| Discrete Numeric | Categorical (Nominal) | ANOVA, Kruskal–Wallis, Mutual Information |
| Discrete Numeric | Discrete Numeric | Spearman, Kendall, Chi-square (if frequencies), Mutual Information |
## Binary X vs Y

| X | Y | Best Methods |
|---|---|---|
| Binary | Continuous | Point-Biserial, t-test, Mann–Whitney, Mutual Information |
| Binary | Binary | Phi coefficient, Chi-square, Mutual Information |
| Binary | Ordinal | Spearman, Kendall, Mutual Information |
| Binary | Categorical (Nominal) | Chi-square, Cramér’s V, Mutual Information |
| Binary | Discrete Numeric | Spearman, Point-Biserial, t-test, MI |
## Ordinal X vs Y

| X | Y | Best Methods |
|---|---|---|
| Ordinal | Continuous | Spearman, Kendall, Mutual Information |
| Ordinal | Binary | Point-Biserial (if ordinal reduces to 2 levels), Spearman, Kendall, MI |
| Ordinal | Ordinal | Spearman, Kendall, Mutual Information |
| Ordinal | Categorical (Nominal) | Chi-square, Cramér’s V, Mutual Information |
| Ordinal | Discrete Numeric | Spearman, Kendall, MI |
## Categorical (Nominal) X vs Y

| X | Y | Best Methods |
|---|---|---|
| Categorical | Continuous | ANOVA, Kruskal–Wallis, Eta-squared, Mutual Information |
| Categorical | Binary | Chi-square, Cramér’s V, Mutual Information |
| Categorical | Ordinal | Chi-square, Gamma, Cramér’s V, Kendall, MI |
| Categorical | Categorical | Chi-square, Cramér’s V, Phi (2×2), Mutual Information |
| Categorical | Discrete Numeric | ANOVA, Kruskal–Wallis, MI |
## Summary

| X Type → Y Type | Recommended Techniques |
|---|---|
| Continuous → Continuous | Pearson, Spearman, Kendall, Distance Corr, Partial Corr, MI |
| Continuous → Discrete Ordinal | Spearman, Kendall, MI |
| Continuous → Discrete Categorical | ANOVA, Kruskal–Wallis, MI |
| Continuous → Binary | Point-Biserial, AUC, Logistic β, MI |
| Discrete → Continuous | t-test, ANOVA, Kruskal–Wallis, Spearman, MI |
| Discrete → Discrete | Chi-square, Cramér’s V, Phi, MI |
| Ordinal ↔ Ordinal | Spearman, Kendall, MI |
| Binary ↔ Binary | Phi, Chi-square, MI |
| Nominal ↔ Nominal | Chi-square, Cramér’s V, MI |
## LINEAR CORRELATION METHODS (Continuous ↔ Continuous)

These methods evaluate linear dependence: how closely the data points fit a straight line.

### Pearson Correlation (Linear Correlation)

**Definition**

Measures the strength and direction of the linear relationship between two continuous variables X and Y. The value ranges from:

- +1 → perfect positive linear relation
- −1 → perfect negative linear relation
- 0 → no linear relation

**Assumptions**

- X and Y are continuous
- The relationship is linear
- No extreme outliers
- Both distributions are approximately normal (needed for the significance test, not for computing r itself)

**Use Cases**

- Linear regression
- Feature selection
- Detecting linear trends

**Limitations**

- Cannot detect nonlinear relationships
- Very sensitive to outliers
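A minimal sketch of computing Pearson's r with `scipy.stats.pearsonr`, on synthetic data chosen here purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy data with a strong linear relationship plus noise
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

r, p_value = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f}, p = {p_value:.3g}")  # r close to +1
```

The p-value tests the null hypothesis of zero correlation; with a clear linear signal it is effectively zero.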
### Partial Correlation

Measures the correlation between X and Y after removing the effect of one or more other variables Z.

**Assumptions**

- Same as Pearson (linear, continuous)
- The confounding variables Z are known and measured

**Use Cases**

- Multivariate regression
- When features are correlated with each other (multicollinearity)
- Finding the independent contribution of feature X

**Limitations**

- Only captures linear controlled associations
- Requires a large sample size
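One way to sketch partial correlation (assuming a single confounder Z) is the residual method: regress X on Z and Y on Z, then correlate the residuals. The data below is synthetic, built so Z drives both variables:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 500

# Z drives both X and Y, creating a spurious X-Y correlation
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)
y = z + rng.normal(scale=0.5, size=n)

def residuals(a, z):
    """Residuals of a simple linear regression of a on z."""
    slope, intercept = np.polyfit(z, a, 1)
    return a - (slope * z + intercept)

r_raw, _ = stats.pearsonr(x, y)
r_partial, _ = stats.pearsonr(residuals(x, z), residuals(y, z))
print(f"raw r = {r_raw:.2f}, partial r (controlling for Z) = {r_partial:.2f}")
```

The raw correlation is large while the partial correlation collapses toward zero, exposing Z as the common cause.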
## RANK-BASED / MONOTONIC METHODS (Ordered or Nonlinear Monotonic)

These methods work on ranks, not the actual values, so they are robust to:

- outliers
- skewed distributions
- nonlinear but monotonic relations

### Spearman Rank Correlation (ρ)

Computes the Pearson correlation on rank-transformed values. For n observations without ties:

$$\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the rank difference for observation i.

**Detects**

Monotonic relationships (increasing or decreasing), even if nonlinear.

**Use Cases**

- Ordinal variables
- Nonlinear increasing trends
- Continuous variables with outliers

**Advantages**

- Resistant to outliers
- Works with ordinal data

**Limitations**

- Cannot detect non-monotonic patterns (e.g., a U-shape)
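A quick sketch showing why Spearman is preferred for monotonic but nonlinear data; the exponential relationship here is illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Monotonic but nonlinear: y grows exponentially in x
x = rng.uniform(0, 5, size=300)
y = np.exp(x) + rng.normal(scale=0.1, size=300)

rho, _ = stats.spearmanr(x, y)
r, _ = stats.pearsonr(x, y)
print(f"Spearman rho = {rho:.3f}  vs  Pearson r = {r:.3f}")
```

Spearman is near 1 because the ranks line up perfectly, while Pearson is noticeably lower because the relationship is not a straight line.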
### Kendall Tau (τ)

Measures the strength of a monotonic relationship by counting concordant and discordant pairs.

**Advantages**

- More robust for small samples
- Handles ties better

**Limitations**

- Computationally slower
- Slightly less interpretable than Spearman
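A small sketch with `scipy.stats.kendalltau` on a made-up ordinal sample that contains ties, the case tau handles well:

```python
from scipy import stats

# Small ordinal sample with ties in the ratings
satisfaction = [1, 2, 2, 3, 3, 3, 4, 5]   # e.g., survey ratings
spend        = [10, 15, 14, 30, 28, 35, 40, 60]

tau, p = stats.kendalltau(satisfaction, spend)
print(f"Kendall tau = {tau:.3f}, p = {p:.3g}")
```

scipy's default is the tau-b variant, which adjusts the denominator for tied ranks.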
## CATEGORICAL ASSOCIATION METHODS (Discrete ↔ Discrete)

Used when both variables are categorical (binary or multi-class).

### Chi-Square Test of Independence

Tests whether two categorical variables are statistically associated:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O$ = observed counts and $E$ = expected counts under independence.

**Use Cases**

- Contingency tables
- Feature selection for classification
- Detecting association

**Limitations**

- Only tells you whether a relationship exists, not how strong it is
- Requires large samples (expected cell counts should not be too small)
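A minimal sketch with `scipy.stats.chi2_contingency`; the 2×3 table below is invented for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x3 contingency table: rows = two groups, cols = three preferred products
observed = np.array([[30, 10, 20],
                     [10, 30, 20]])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```

The function also returns the expected counts under independence, which is useful for checking the large-sample requirement.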
### Cramér’s V

Measures the strength of association between two categorical variables.

Range:

- 0 → no association
- 1 → perfect association

**Advantages**

- Works for any table size (r × c)
- Normalized metric
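Cramér's V is derived from the chi-square statistic: V = sqrt(chi2 / (n * (min(r, c) − 1))). A sketch, reusing an invented contingency table:

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V from an r x c contingency table."""
    chi2 = chi2_contingency(table)[0]
    n = table.sum()
    r, c = table.shape
    return np.sqrt(chi2 / (n * (min(r, c) - 1)))

table = np.array([[30, 10, 20],
                  [10, 30, 20]])
print(f"Cramér's V = {cramers_v(table):.3f}")
```

Unlike the raw chi-square statistic, V is comparable across tables of different sizes and sample counts.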
### Phi Coefficient

Equivalent to the Pearson correlation for 2×2 tables (binary vs binary).

**Use Cases**

- Binary gender vs binary buying decision
- Fraud yes/no vs approval yes/no
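Because phi equals Pearson's r on 0/1 codes, it needs no special function; the binary vectors below are illustrative:

```python
import numpy as np
from scipy import stats

# Two binary variables coded 0/1
fraud    = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
declined = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0])

# Phi is just Pearson's r computed on the 0/1 codes
phi, _ = stats.pearsonr(fraud, declined)
print(f"phi = {phi:.3f}")
```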
### Contingency Coefficient

A legacy measure that is harder to interpret; largely superseded by Cramér’s V.
## NONLINEAR / INFORMATION-BASED ASSOCIATION METHODS

These detect any type of relationship between variables, even irregular, complex shapes.

### Mutual Information (MI)

Measures how much information about Y is gained by knowing X.

**Advantages**

- Detects all forms of dependency
- Works with any combination of data types
- Used heavily in ML feature selection (e.g., as a filter step before training Random Forest or XGBoost models)

**Limitations**

- Has no fixed scale (not bounded between −1 and 1)
- The magnitude is hard to interpret
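A sketch with scikit-learn's `mutual_info_classif` (a k-nearest-neighbor MI estimator); the informative/noise split is synthetic:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(3)
n = 1000

informative = rng.normal(size=n)
noise = rng.normal(size=n)
y = (informative > 0).astype(int)   # target depends only on `informative`

X = np.column_stack([informative, noise])
mi = mutual_info_classif(X, y, random_state=0)
print(f"MI(informative, y) = {mi[0]:.3f}, MI(noise, y) = {mi[1]:.3f}")
```

The informative feature scores near the theoretical maximum for a binary target (ln 2 ≈ 0.69 nats), while the noise feature scores near zero; note the values are in nats, not on a [−1, 1] scale.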
### Distance Correlation

Distance correlation equals 0 if and only if the variables are independent, a stronger property than Pearson's r (which can be 0 even for dependent variables).

**Advantages**

- Captures all nonlinear dependencies
- Works for complex patterns

**Limitations**

- More computationally expensive (builds O(n²) pairwise distance matrices)
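A self-contained numpy sketch of sample distance correlation (double-centered pairwise distance matrices), demonstrated on a non-monotonic parabola where Pearson fails:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)                  # pairwise distance matrices
    b = np.abs(y - y.T)
    # Double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=400)
y_parabola = x ** 2 + rng.normal(scale=0.05, size=400)   # non-monotonic

print(f"dCor = {distance_correlation(x, y_parabola):.3f}")
print(f"Pearson r = {np.corrcoef(x, y_parabola)[0, 1]:.3f}")
```

On the U-shaped data, Pearson's r is near zero while distance correlation remains clearly positive; the `dcor` PyPI package offers an optimized implementation.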
### MIC (Maximal Information Coefficient)

Detects any “grid-like” pattern (linear, curved, exponential, periodic).

**Advantages**

- Very powerful
- Captures almost all kinds of relationships

**Limitations**

- Slow on large datasets
- Harder to interpret
## GROUP-COMPARISON METHODS (Mixed Data Types)

Used when one variable is continuous and the other is categorical.

### Point-Biserial Correlation

A special case of Pearson correlation where:

- X = continuous
- Y = binary

**Advantages**

- Simple and interpretable
- Works for logistic-type separation
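A sketch with `scipy.stats.pointbiserialr`; the group/income data is synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Binary group labels and a continuous feature that differs by group
group = rng.integers(0, 2, size=200)
income = np.where(group == 1, 60, 40) + rng.normal(scale=10, size=200)

r_pb, p = stats.pointbiserialr(group, income)
print(f"point-biserial r = {r_pb:.3f}, p = {p:.3g}")
```

The result is numerically identical to running `pearsonr` on the same pair; `pointbiserialr` just names the special case.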
### ANOVA (F-test)

Compares the mean of a continuous variable across k ≥ 2 categories.

**Use Cases**

- Continuous X vs categorical Y (many classes)
- Categorical X vs continuous Y

**Limitations**

- Assumes normality within groups
- Assumes equal variances (homoscedasticity)
### Kruskal–Wallis Test

The nonparametric counterpart of one-way ANOVA, based on ranks.

**Advantages**

- No normality assumption
- Works with skewed data
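The two k-group tests above can be sketched side by side with scipy on illustrative data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Continuous outcome measured in three categories
group_a = rng.normal(loc=50, scale=5, size=60)
group_b = rng.normal(loc=55, scale=5, size=60)
group_c = rng.normal(loc=60, scale=5, size=60)

f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)
print(f"ANOVA F = {f_stat:.2f} (p = {p_anova:.3g})")
print(f"Kruskal-Wallis H = {h_stat:.2f} (p = {p_kw:.3g})")
```

On normal data the two agree; on heavily skewed or outlier-ridden data, Kruskal–Wallis is the safer choice.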
### t-Test (two-sample t-test)

Used when:

- X is continuous
- Y is binary

**Tests**

Whether the mean difference between the two groups is significant.

**Limitations**

- Only works for binary Y (exactly two groups)
### Mann–Whitney U Test

**Definition**

A rank-based alternative to the t-test.

**Advantages**

- Works for non-normal distributions
- Robust to outliers
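The two-group tests above, sketched together with scipy on a synthetic churned/retained split:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Continuous X split by a binary Y (e.g., churned vs retained)
churned = rng.normal(loc=30, scale=8, size=80)
retained = rng.normal(loc=45, scale=8, size=120)

t_stat, p_t = stats.ttest_ind(churned, retained)
u_stat, p_u = stats.mannwhitneyu(churned, retained, alternative="two-sided")
print(f"t = {t_stat:.2f} (p = {p_t:.3g}),  U = {u_stat:.1f} (p = {p_u:.3g})")
```

With normal groups both reject the null; when distributions are skewed or contain outliers, Mann–Whitney retains power where the t-test loses it.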
## ADVANCED / MODEL-BASED ASSOCIATION METHODS

These methods examine predictive power rather than pure statistical correlation.

### Logistic Regression Coefficient (β)

Shows how much a change in X affects the log-odds of Y.

**Interpretation**

- β > 0 → increasing X increases the probability of Y = 1
- β < 0 → increasing X reduces the probability of Y = 1
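A sketch with scikit-learn; the data is generated so the true log-odds slope is 1.5, and a very large `C` effectively disables regularization so the estimate is comparable to the statistical β:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 1000

x = rng.normal(size=(n, 1))
# True log-odds: 1.5 * x, so the fitted beta should come out near 1.5
p = 1 / (1 + np.exp(-1.5 * x[:, 0]))
y = rng.binomial(1, p)

model = LogisticRegression(C=1e6).fit(x, y)
beta = model.coef_[0][0]
print(f"estimated beta = {beta:.2f}")
```

Exponentiating β gives the odds ratio per unit increase in X, which is usually the easier quantity to report.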
### ROC-AUC as Feature Association

Treat the continuous X as a score for classifying the binary Y.

**Interprets**

How well X separates the two classes (AUC = 0.5 means no separation, 1.0 means perfect separation).
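A sketch: pass the raw feature directly to `roc_auc_score` as if it were a classifier's score (synthetic data, with the positive class shifted upward):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)

y = rng.integers(0, 2, size=500)
# A feature shifted upward for the positive class
x = rng.normal(size=500) + 1.2 * y

auc = roc_auc_score(y, x)
print(f"AUC of raw feature as a ranker: {auc:.3f}")
```

Because AUC is rank-based, it is invariant to any monotonic transform of X, which makes it a convenient screening metric for continuous-vs-binary association.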
### Tree-Based Feature Importance

- Gini (impurity) importance
- Permutation importance

**Advantages**

- Handles nonlinear and interaction effects
- Works for any X/Y types
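A sketch of Gini importance with a random forest on synthetic data where only one feature carries signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(10)
n = 500

signal = rng.normal(size=n)
noise = rng.normal(size=n)
y = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)

X = np.column_stack([signal, noise])
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("Gini importances:", forest.feature_importances_)
```

Impurity-based importances can be biased toward high-cardinality features; `sklearn.inspection.permutation_importance` is the more robust (but slower) alternative listed above.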
### SHAP Values

Explain how each value of X pushes an individual prediction up or down.

**Advantages**

- Model-agnostic
- Highly interpretable
## Quick Reference

| X | Y | Best Methods |
|---|---|---|
| Continuous | Continuous | Pearson, Spearman, Kendall, DistanceCorr, PartialCorr, MI, MIC |
| Continuous | Binary | Point-Biserial, t-test, AUC, Logistic β, MI |
| Continuous | Ordinal | Spearman, Kendall, MI |
| Continuous | Nominal | ANOVA, Kruskal–Wallis, MI |
| Continuous | Discrete Numeric | Pearson, Spearman, Kendall, MI |
| Discrete | Continuous | t-test, ANOVA, Kruskal–Wallis, Spearman, MI |
| Discrete | Discrete | Chi-square, Cramér’s V, Phi, MI |
| Binary | Binary | Phi, Chi-square |
| Ordinal | Ordinal | Spearman, Kendall |
| Nominal | Nominal | Chi-square, Cramér’s V |