Correlation Analysis#


Continuous X vs Y#

| X (Feature) | Y (Target) | Best Methods |
| --- | --- | --- |
| Continuous | Continuous | Pearson, Spearman, Kendall, Distance Correlation, Mutual Information, Partial Correlation, MIC |
| Continuous | Binary | Point-Biserial, Logistic Coefficient, AUC, Mutual Information |
| Continuous | Ordinal | Spearman, Kendall, Mutual Information |
| Continuous | Categorical (Nominal) | ANOVA F-test, Kruskal–Wallis, Mutual Information |
| Continuous | Discrete Numeric | Pearson, Spearman, Kendall, MI |


Discrete Numeric X vs Y#

| X | Y | Best Methods |
| --- | --- | --- |
| Discrete Numeric | Continuous | t-test (if 2 groups), ANOVA (k groups), Kruskal–Wallis, Point-Biserial (if X is effectively binary), Spearman (if numeric rank-like), MI |
| Discrete Numeric | Binary | Point-Biserial (if X binary), Spearman (if ordinal meaning), t-test, Mann–Whitney, Mutual Information |
| Discrete Numeric | Ordinal | Spearman, Kendall, Mutual Information |
| Discrete Numeric | Categorical (Nominal) | ANOVA, Kruskal–Wallis, Mutual Information |
| Discrete Numeric | Discrete Numeric | Spearman, Kendall, Chi-square (if frequencies), Mutual Information |


Binary X vs Y#

| X | Y | Best Methods |
| --- | --- | --- |
| Binary | Continuous | Point-Biserial, t-test, Mann–Whitney, Mutual Information |
| Binary | Binary | Phi coefficient, Chi-square, Mutual Information |
| Binary | Ordinal | Spearman, Kendall, Mutual Information |
| Binary | Categorical (Nominal) | Chi-square, Cramér’s V, Mutual Information |
| Binary | Discrete Numeric | Spearman, Point-Biserial, t-test, MI |


Ordinal X vs Y#

| X | Y | Best Methods |
| --- | --- | --- |
| Ordinal | Continuous | Spearman, Kendall, Mutual Information |
| Ordinal | Binary | Point-Biserial (if ordinal reduces to 2 levels), Spearman, Kendall, MI |
| Ordinal | Ordinal | Spearman, Kendall, Mutual Information |
| Ordinal | Categorical (Nominal) | Chi-square, Cramér’s V, Mutual Information |
| Ordinal | Discrete Numeric | Spearman, Kendall, MI |


Categorical (Nominal) X vs Y#

| X | Y | Best Methods |
| --- | --- | --- |
| Categorical | Continuous | ANOVA, Kruskal–Wallis, Eta-squared, Mutual Information |
| Categorical | Binary | Chi-square, Cramér’s V, Mutual Information |
| Categorical | Ordinal | Chi-square, Gamma, Cramér’s V, Kendall, MI |
| Categorical | Categorical | Chi-square, Cramér’s V, Phi (2×2), Mutual Information |
| Categorical | Discrete Numeric | ANOVA, Kruskal–Wallis, MI |


Summary#

| X Type | Y Type | Recommended Techniques |
| --- | --- | --- |
| Continuous | Continuous | Pearson, Spearman, Kendall, Distance Corr, Partial Corr, MI |
| Continuous | Discrete Ordinal | Spearman, Kendall, MI |
| Continuous | Discrete Categorical | ANOVA, Kruskal–Wallis, MI |
| Continuous | Binary | Point-Biserial, AUC, Logistic β, MI |
| Discrete | Continuous | t-test, ANOVA, Kruskal–Wallis, Spearman, MI |
| Discrete | Discrete | Chi-square, Cramér’s V, Phi, MI |
| Ordinal | Ordinal | Spearman, Kendall, MI |
| Binary | Binary | Phi, Chi-square, MI |
| Nominal | Nominal | Chi-square, Cramér’s V, MI |

LINEAR CORRELATION METHODS (Continuous ↔ Continuous)#

These methods evaluate linear dependence, meaning how closely data points fit a straight line.


Pearson Correlation (Linear Correlation)#

Definition

Measures the strength and direction of linear relationship between two continuous variables X and Y.

\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})} {\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \]

Values range from −1 to +1:

  • +1 → perfect positive linear relation

  • −1 → perfect negative linear relation

  • 0 → no linear relation

Assumptions

  • X and Y are continuous

  • Relationship is linear

  • No extreme outliers

  • Both distributions are approximately normal

Use Cases

  • Linear regression

  • Feature selection

  • Detecting linear trends

Limitations

  • Cannot detect nonlinear relationships

  • Very sensitive to outliers
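A minimal sketch using SciPy's `pearsonr` on synthetic data (variable names are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)  # linear relation plus noise

r, p_value = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.3g}")  # r close to +1 for a strong linear relation
```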


Partial Correlation#

Measures the correlation between X and Y after removing effect of one or more other variables (Z).

\[ r_{xy\cdot z}= \frac{r_{xy}-r_{xz}r_{yz}}{\sqrt{(1-r_{xz}^2)(1-r_{yz}^2)}} \]

Assumptions

  • Same as Pearson (linear, continuous)

  • Confounding variable Z known

Use Cases

  • In multivariate regression

  • When features are correlated with each other (multicollinearity)

  • To find independent contribution of feature X

Limitations

  • Only captures linear controlled associations

  • Requires large sample size
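There is no single canonical SciPy routine for partial correlation; a common sketch is the residual method, regressing both X and Y on Z and correlating the residuals (synthetic data, illustrative names):

```python
import numpy as np
from scipy.stats import pearsonr

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out z (residual method)."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    Z = np.column_stack([np.ones_like(z), z])            # intercept + confounder
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]    # residuals of x given z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]    # residuals of y given z
    return pearsonr(rx, ry)[0]

rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = z + rng.normal(scale=0.3, size=500)   # x driven by z
y = z + rng.normal(scale=0.3, size=500)   # y driven by z, no direct x-y link

r_raw = pearsonr(x, y)[0]
r_part = partial_corr(x, y, z)
print(f"raw r = {r_raw:.3f}, partial r = {r_part:.3f}")  # partial r near 0
```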


RANK-BASED / MONOTONIC METHODS (Ordered or Nonlinear Monotonic)#

These methods work on ranks, not actual values.

They are robust to:

  • outliers

  • skewed distributions

  • non-linear but monotonic relations


Spearman Rank Correlation (ρ)#

Computes Pearson correlation on rank-transformed values.

\[ \rho = 1- \frac{6\sum d_i^2}{n(n^2-1)} \]

where \(d_i\) is the difference between the ranks of the \(i\)-th pair and \(n\) is the sample size.

Detects

  • Monotonic relationships (increasing/decreasing) even if non-linear.

Use Cases

  • Ordinal variables

  • Nonlinear increasing trends

  • Continuous variables with outliers

Advantages

  • Resistant to outliers

  • Works with ordinal data

Limitations

  • Cannot detect non-monotonic patterns (U-shape)
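A quick sketch comparing Spearman and Pearson on a monotonic but nonlinear relation (synthetic data):

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, size=300)
y = np.exp(x) + rng.normal(scale=0.1, size=300)  # monotonic but strongly nonlinear

rho, p = spearmanr(x, y)
r_lin = pearsonr(x, y)[0]
print(f"Spearman rho = {rho:.3f}, Pearson r = {r_lin:.3f}")  # rho is higher
```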


Kendall Tau (τ)#

Measures strength of monotonic relationship using concordant and discordant pairs.

\[ \tau = \frac{C - D}{\frac{1}{2}n(n-1)} \]
where C and D are the numbers of concordant and discordant pairs.

Advantages

  • More robust for small samples

  • Handles ties better

Limitations

  • Computationally slower

  • Slightly less interpretable than Spearman
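A tiny worked example with SciPy's `kendalltau`:

```python
from scipy.stats import kendalltau

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]  # mostly concordant, with three swapped neighbours

tau, p = kendalltau(x, y)
print(tau)  # (C - D) / (n(n-1)/2) = (12 - 3) / 15 = 0.6
```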


CATEGORICAL ASSOCIATION METHODS (Discrete ↔ Discrete)#

Used when both variables are categorical (binary or multi-class).


Chi-Square Test of Independence#

Tests whether two categorical variables are statistically associated.

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

Where O = observed, E = expected.

Use Cases

  • Contingency tables

  • Feature selection for classification

  • Detecting association

Limitations

  • Only tells if there is a relationship, not how strong

  • Requires large samples
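A sketch with SciPy's `chi2_contingency` on a toy contingency table (counts are made up for illustration):

```python
import numpy as np
from scipy.stats import chi2_contingency

# toy contingency table: rows = exposure yes/no, columns = outcome yes/no
table = np.array([[30, 10],
                  [20, 40]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4g}")  # small p -> association
```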


Cramér’s V#

Measures strength of association between two categorical variables.

\[ V = \sqrt{\frac{\chi^2}{n(k-1)}} \]

where n is the sample size and k = min(number of rows, number of columns).

Range:

  • 0 → no association

  • 1 → perfect association

Advantages

  • Works for all table sizes (r × c)

  • Normalized metric
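A hand-rolled sketch built on the chi-square statistic (newer SciPy versions also ship `scipy.stats.contingency.association`):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    table = np.asarray(table)
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape)               # k = min(rows, columns)
    return float(np.sqrt(chi2 / (n * (k - 1))))

print(cramers_v([[50, 0], [0, 50]]))   # perfectly associated table -> 1.0
```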


Phi Coefficient#

Equivalent to Pearson correlation for 2×2 tables (binary vs binary).

\[ \phi = \sqrt{\frac{\chi^2}{n}} \]

Use Cases

  • Binary gender vs binary buying decision

  • Fraud yes/no vs approval yes/no
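For a 2×2 table of counts [[a, b], [c, d]], phi also has a signed closed form; a small sketch (counts are illustrative):

```python
import numpy as np

def phi_coefficient(a, b, c, d):
    """Signed phi for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# e.g. fraud yes/no vs approval yes/no counts
print(phi_coefficient(30, 10, 10, 30))  # 0.5
```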


Contingency Coefficient#

Legacy measure; less interpretable. Replaced by Cramér’s V.


NONLINEAR / INFORMATION-BASED ASSOCIATION METHODS#

Detect any type of relationship between variables—even irregular, complex shapes.


Mutual Information (MI)#

Measures how much information about Y is gained by knowing X.

\[ I(X;Y)=\sum p(x,y)\log \frac{p(x,y)}{p(x)p(y)} \]

Advantages

  • Detects all forms of dependency

  • Works with any combination of data types

  • Widely used for filter-based feature selection in ML pipelines

Limitations

  • Has no fixed scale (not between -1 and 1)

  • Hard to interpret magnitude
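A sketch using scikit-learn's `mutual_info_classif` (a kNN-based estimator) on a U-shaped relation that Pearson misses entirely (synthetic data):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = (np.abs(x) > 1).astype(int)  # U-shaped dependency: invisible to Pearson

mi = mutual_info_classif(x.reshape(-1, 1), y, random_state=0)[0]
r_lin = pearsonr(x, y)[0]
print(f"MI = {mi:.3f} nats, Pearson r = {r_lin:.3f}")  # MI > 0 while r is near 0
```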


Distance Correlation#

Distance correlation is 0 if and only if the variables are independent (a stronger property than Pearson, where r = 0 does not imply independence).

Advantages

  • Captures all nonlinear dependencies

  • Works for complex patterns

Limitations

  • More computationally expensive
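The `dcor` package implements this efficiently; as an illustration, a naive O(n²) NumPy sketch of the (biased) sample estimator:

```python
import numpy as np

def distance_correlation(x, y):
    """Naive O(n^2) distance correlation for 1-D samples (biased V-statistic)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    def dcentered(a):
        d = np.abs(a[:, None] - a[None, :])                     # pairwise distances
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()    # double centering
    A, B = dcentered(x), dcentered(y)
    dcov2 = (A * B).mean()
    return float(np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean())))

rng = np.random.default_rng(4)
x = rng.normal(size=300)
d_dep = distance_correlation(x, x ** 2)               # non-monotonic dependence
d_ind = distance_correlation(x, rng.normal(size=300)) # independent samples
print(f"dependent: {d_dep:.3f}, independent: {d_ind:.3f}")
```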


MIC (Maximal Information Coefficient)#

Detects any “grid-like” pattern (linear, curved, exponential, periodic).

Advantages

  • Very powerful

  • Captures almost all kinds of relationships

Limitations

  • Slow on large data

  • Harder to interpret


GROUP-COMPARISON METHODS (Mixed Data Types)#

Used when one variable is continuous and the other is categorical.


Point-Biserial Correlation#

Special case of Pearson where:

  • X = continuous

  • Y = binary

\[ r_{pb}= \frac{\bar{x}_1 - \bar{x}_0}{s_x}\sqrt{\frac{n_1 n_0}{n^2}} \]

Advantages

  • Simple and interpretable

  • Works for logistic-type separation
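A sketch with SciPy's `pointbiserialr` on synthetic data (a binary group with a mean shift in the continuous score):

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(5)
group = rng.integers(0, 2, size=400)      # binary variable
score = rng.normal(loc=1.5 * group)       # mean shifted by 1.5 when group = 1

r_pb, p = pointbiserialr(group, score)
print(f"r_pb = {r_pb:.3f}, p = {p:.3g}")
```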


ANOVA (F-test)#

Compares mean of continuous variable across k ≥ 2 categories.

\[ F = \frac{\text{Between-group variance}}{\text{Within-group variance}} \]

Use Cases

  • Continuous X vs Categorical Y (many classes)

  • Categorical X vs Continuous Y

Limitations

  • Assumes normality

  • Assumes equal variance
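A sketch using SciPy's `f_oneway` with three synthetic groups of different means:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(6)
g1 = rng.normal(0.0, 1.0, 100)   # one sample per category
g2 = rng.normal(0.5, 1.0, 100)
g3 = rng.normal(1.5, 1.0, 100)

F, p = f_oneway(g1, g2, g3)
print(f"F = {F:.1f}, p = {p:.3g}")  # small p -> group means differ
```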


Kruskal–Wallis Test#

Nonparametric version of ANOVA.

Advantages

  • Fewer distributional assumptions than ANOVA (no normality or equal-variance requirement)

  • Works with skewed data
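A sketch with SciPy's `kruskal` on heavily skewed (exponential) groups where ANOVA's normality assumption would fail:

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(7)
g1 = rng.exponential(1.0, 100)   # skewed groups with different scales
g2 = rng.exponential(3.0, 100)

H, p = kruskal(g1, g2)
print(f"H = {H:.1f}, p = {p:.3g}")
```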


t-Test (two-sample t-test)#

Used when:

  • X continuous

  • Y binary

Tests

Whether mean difference between two groups is significant.

Limitations

  • Only works for binary Y (exactly two groups)

  • Assumes approximately normal group distributions
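A sketch with SciPy's `ttest_ind` on two synthetic groups:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(8)
treated = rng.normal(1.0, 1.0, 80)   # group mean shifted by 1
control = rng.normal(0.0, 1.0, 80)

t, p = ttest_ind(treated, control)
print(f"t = {t:.2f}, p = {p:.3g}")
```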


Mann–Whitney U Test#

Definition

Rank-based alternative to t-test.

Advantages

  • Works for non-normal distributions

  • Robust to outliers
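A sketch with SciPy's `mannwhitneyu` on skewed, outlier-prone (lognormal) samples:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(9)
a = rng.lognormal(0.0, 1.0, 100)   # skewed samples with occasional large outliers
b = rng.lognormal(1.0, 1.0, 100)

U, p = mannwhitneyu(a, b)
print(f"U = {U:.0f}, p = {p:.3g}")
```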


ADVANCED / MODEL-BASED ASSOCIATION METHODS#

These methods examine predictive power, not pure statistical correlation.


Logistic Regression Coefficient (β)#

Shows how much change in X affects log-odds of Y.

Interpretation

  • β > 0 → X increases probability of Y=1

  • β < 0 → X reduces probability
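A sketch using scikit-learn's `LogisticRegression` on synthetic data with a known positive log-odds slope (sklearn's default L2 penalty shrinks the estimate slightly):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
x = rng.normal(size=(500, 1))
p_true = 1 / (1 + np.exp(-2 * x[:, 0]))  # true log-odds slope = 2
y = rng.binomial(1, p_true)

beta = LogisticRegression().fit(x, y).coef_[0, 0]
print(f"beta = {beta:.2f}")  # positive: x raises the probability of y = 1
```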


ROC-AUC as Feature Association#

Treat continuous X as a classifier for binary Y.

\[ \mathrm{AUC} = P(X_{\text{positive}} > X_{\text{negative}}) \]

Interpretation

How well X separates the two classes: 0.5 means no discrimination, 1.0 means perfect separation.
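A sketch computing AUC two ways — via scikit-learn's `roc_auc_score` and directly from the probabilistic definition (synthetic data, no ties):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(11)
y = rng.integers(0, 2, size=500)
x = rng.normal(loc=1.0 * y)  # feature shifted upward for the positive class

auc = roc_auc_score(y, x)
# the same quantity via P(X_pos > X_neg)
pos, neg = x[y == 1], x[y == 0]
manual = (pos[:, None] > neg[None, :]).mean()
print(f"AUC = {auc:.3f}, P(X_pos > X_neg) = {manual:.3f}")
```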


Tree-Based Feature Importance#

  • Gini importance

  • Permutation importance

Advantages

  • Handles nonlinear, interaction effects

  • Works for any X/Y types
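A sketch with scikit-learn showing both importances on a pure interaction effect (feature 2 is pure noise); names and data are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(12)
X = rng.normal(size=(500, 3))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # interaction of features 0 and 1 only

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(model.feature_importances_)             # impurity (Gini) importance
perm = permutation_importance(model, X, y, random_state=0)
print(perm.importances_mean)                  # features 0 and 1 dominate
```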


SHAP Values#

Explains how each value of X pushes prediction ↑ or ↓.

Advantages

  • Model-agnostic

  • Highly interpretable


| X | Y | Best Methods |
| --- | --- | --- |
| Continuous | Continuous | Pearson, Spearman, Kendall, Distance Corr, Partial Corr, MI, MIC |
| Continuous | Binary | Point-Biserial, t-test, AUC, Logistic β, MI |
| Continuous | Ordinal | Spearman, Kendall, MI |
| Continuous | Nominal | ANOVA, Kruskal–Wallis, MI |
| Continuous | Discrete Numeric | Pearson, Spearman, Kendall, MI |
| Discrete | Continuous | t-test, ANOVA, Kruskal–Wallis, Spearman, MI |
| Discrete | Discrete | Chi-square, Cramér’s V, Phi, MI |
| Binary | Binary | Phi, Chi-square |
| Ordinal | Ordinal | Spearman, Kendall |
| Nominal | Nominal | Chi-square, Cramér’s V |
