Ordinal X

Ordinal X → Continuous Y

  • Ordinal X: HouseAge (quartile bins coded 1 < 2 < 3 < 4)

  • Continuous Y: MedInc

from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Ordinal X via ordered bins
df["X_ord"] = pd.qcut(df["HouseAge"], q=4, labels=[1,2,3,4]).astype(int)
y = df["MedInc"]

# Spearman
spearman, _ = spearmanr(df["X_ord"], y)

# Kendall
kendall, _ = kendalltau(df["X_ord"], y)

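# Mutual information (in nats). With the default discrete_features="auto",
# a dense X is treated as continuous and the k-NN estimate varies slightly
# between runs; discrete_features=True and random_state are the arguments
# to set for integer-coded ordinal X and for reproducibility.
mi = mutual_info_regression(df["X_ord"].values.reshape(-1,1), y)[0]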

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)

Output:

Spearman: -0.15137493395360588
Kendall: -0.11330092656943518
Mutual Information: 0.010566575439892034

| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | -0.1514 | Monotonic (rank-based) association | Very weak | A faint negative monotonic trend; too small to be practically relevant. |
| Kendall | -0.1133 | Pairwise rank concordance | Very weak | Discordant pairs only slightly outnumber concordant ones; ordering by X says little about ordering by Y. |
| Mutual Information | 0.0106 | Overall dependency (linear + nonlinear), in nats | Extremely weak | X carries almost no information about Y; the variables are close to independent. |
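Binning produces many tied X values, so it can help to see what the rank-based coefficient does with ties. The following is a minimal standard-library sketch, not SciPy's implementation: it computes Spearman's rho as the Pearson correlation of average ranks. The helper names and the toy data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: an ordinal code (1-4) and a continuous response.
x_ord = [1, 1, 2, 2, 3, 3, 4, 4]
y_cont = [5.1, 4.8, 4.9, 4.0, 4.2, 3.5, 3.9, 3.0]
rho = spearman_rho(x_ord, y_cont)
print("Spearman (toy data):", round(rho, 4))
```

Because each bin contributes a block of identical ranks, the coefficient reflects how Y shifts between bins rather than within them.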

Ordinal X → Binary Y

  • Ordinal X: mean radius (tercile bins coded 1 < 2 < 3)

  • Binary Y: target (0 = malignant, 1 = benign)

from sklearn.datasets import load_breast_cancer
from scipy.stats import spearmanr, kendalltau, pointbiserialr
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target  # binary

# Ordinal X
df["X_ord"] = pd.qcut(df["mean radius"], q=3, labels=[1,2,3]).astype(int)
X = df["X_ord"]
y = df["target"]

# Spearman
spearman, _ = spearmanr(X, y)

# Kendall
kendall, _ = kendalltau(X, y)

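# Point-biserial is Pearson's r between a binary and a numeric variable.
# Here X is collapsed to a top-bin indicator (X == 3), so both inputs are
# binary and the result equals the phi coefficient.
pb, _ = pointbiserialr((X==3).astype(int), y)

# Mutual information (in nats). With the default discrete_features="auto",
# a dense X is treated as continuous and the estimate varies slightly
# between runs; discrete_features=True and random_state are the arguments
# to set for integer-coded ordinal X and for reproducibility.
mi = mutual_info_classif(X.values.reshape(-1,1), y)[0]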

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Point-Biserial (ordinal→binary):", pb)
print("Mutual Information:", mi)

Output:

Spearman: -0.7166570067314795
Kendall: -0.6756725363563483
Point-Biserial (ordinal→binary): -0.7415324310869174
Mutual Information: 0.31576208528774496

| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | -0.7167 | Monotonic (rank-based) association | Strong negative | As the radius bin increases, the target code tends to fall from 1 (benign) toward 0 (malignant). |
| Kendall | -0.6757 | Pairwise rank concordance | Strong negative | Among pairs not tied on either variable, discordant pairs far outnumber concordant ones. |
| Point-Biserial (ordinal→binary) | -0.7415 | Pearson correlation between the top-bin indicator (X == 3) and binary Y; with both variables binary this is the phi coefficient | Strong negative | Samples in the top radius bin fall predominantly in class 0 (malignant). |
| Mutual Information | 0.3158 | Overall dependency (linear + nonlinear), in nats | Moderate to strong | X carries substantial information about Y; for a binary Y the true value is bounded by ln 2 ≈ 0.693 nats. |
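The point-biserial row above is computed on an indicator of the top bin against a binary target, so both inputs are 0/1. In that case Pearson's r coincides with the phi coefficient of the 2×2 table. A small sketch of that equivalence follows; it uses only the standard library, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def phi_from_counts(n11, n10, n01, n00):
    """Phi coefficient from the four cell counts of a 2x2 table."""
    num = n11 * n00 - n10 * n01
    den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den


# Hypothetical toy data: "X is in the top bin" indicator and a binary label.
x_top = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_bin = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

# Count the four (x, y) combinations.
pairs = list(zip(x_top, y_bin))
n11 = pairs.count((1, 1))
n10 = pairs.count((1, 0))
n01 = pairs.count((0, 1))
n00 = pairs.count((0, 0))

r = pearson(x_top, y_bin)
phi = phi_from_counts(n11, n10, n01, n00)
print("Pearson r on 0/1 data:", round(r, 4))
print("Phi from 2x2 counts:  ", round(phi, 4))
```

Collapsing three ordered levels to one indicator discards the distinction between the lower two bins, which is one reason to read this value alongside Spearman and Kendall rather than in place of them.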

Ordinal X → Ordinal Y

  • Ordinal X: MedInc (quartile bins coded 1 < 2 < 3 < 4)

  • Ordinal Y: target, the median house value (quartile bins coded 1 < 2 < 3 < 4)

from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Ordinal X
df["X_ord"] = pd.qcut(df["MedInc"], q=4, labels=[1,2,3,4]).astype(int)

# Ordinal Y
df["Y_ord"] = pd.qcut(df["target"], q=4, labels=[1,2,3,4]).astype(int)

X = df["X_ord"]
y = df["Y_ord"]

# Spearman
spearman, _ = spearmanr(X, y)

# Kendall
kendall, _ = kendalltau(X, y)

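# Mutual information (in nats). Y is treated as discrete class labels; with
# the default discrete_features="auto", a dense X is treated as continuous
# and the estimate varies slightly between runs (see discrete_features and
# random_state).
mi = mutual_info_classif(X.values.reshape(-1,1), y)[0]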

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)

Output:

Spearman: 0.6262544100103224
Kendall: 0.5439809988878718
Mutual Information: 0.24032098217597886

| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | 0.6263 | Monotonic (rank-based) association | Strong positive | Higher income bins tend to go with higher house-value bins; a clear upward monotonic trend. |
| Kendall | 0.5440 | Pairwise rank concordance | Strong positive | Among pairs not tied on either variable, concordant pairs clearly outnumber discordant ones. |
| Mutual Information | 0.2403 | Overall dependency (linear + nonlinear), in nats | Moderate | X carries meaningful information about Y; for a four-level Y the true value is bounded by ln 4 ≈ 1.386 nats. |
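With both variables binned into four levels, most pairs of observations are tied on at least one variable, which is why the tie-corrected tau-b variant matters here. The sketch below counts pairs directly, assuming the usual tau-b definition; it is an O(n²) illustration with hypothetical names and toy data, not the algorithm SciPy uses internally, and it assumes each variable takes at least two distinct values.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct pair counting (O(n^2), fine for small n)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: enters neither count
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1  # the pair is ordered the same way on X and Y
            else:
                discordant += 1  # the pair is ordered in opposite ways
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom


# Hypothetical toy data: two ordinal codes with many ties.
x_ord = [1, 1, 2, 2, 3, 3, 4, 4]
y_ord = [1, 2, 1, 2, 3, 3, 4, 3]
tau = kendall_tau_b(x_ord, y_ord)
print("Kendall tau-b (toy data):", round(tau, 4))
```

The tie terms in the denominator keep the coefficient from being deflated simply because binning created ties.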

Ordinal X → Categorical Nominal Y

  • Ordinal X: sepal length (tercile bins coded 1 < 2 < 3)

  • Nominal Y: species

from sklearn.datasets import load_iris
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target  # nominal categorical

# Ordinal X
df["X_ord"] = pd.qcut(df["sepal length (cm)"], q=3, labels=[1,2,3]).astype(int)

X = df["X_ord"]
y = df["species"]

# Chi-square test
table = pd.crosstab(X, y)
chi2, _, _, _ = chi2_contingency(table)

# Cramér’s V
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n * (k - 1)))

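# Mutual information (in nats). With the default discrete_features="auto",
# a dense X is treated as continuous and the estimate varies slightly
# between runs (see discrete_features and random_state).
mi = mutual_info_classif(X.values.reshape(-1,1), y)[0]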

print("Chi-square:", chi2)
print("Cramér’s V:", cramers_v)
print("Mutual Information:", mi)

Output:

Chi-square: 123.28296703296704
Cramér’s V: 0.6410485343897321
Mutual Information: 0.46394039211315063

| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
|---|---|---|---|---|
| Chi-square | 123.283 | Test statistic for independence of two categorical variables | Very large for a 3×3 table (4 degrees of freedom); highly significant | Observed cell counts differ strongly from those expected under independence; clear evidence of dependency. |
| Cramér’s V | 0.6410 | Effect size for association between categorical variables, on a 0 to 1 scale | Strong association | The sepal-length bin is strongly related to species. Note that V ignores the ordering of the X bins. |
| Mutual Information | 0.4639 | Information shared between the two variables, in nats (any form of dependence) | Strong | Knowing the sepal-length bin removes a substantial share of the uncertainty about species; for three classes the true value is bounded by ln 3 ≈ 1.099 nats. |
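To make the chi-square and Cramér’s V rows concrete, the statistic can be computed by hand from a contingency table. This is a minimal standard-library sketch with no continuity correction; the function names and the toy counts are hypothetical and are not the iris crosstab.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramér's V: chi-square rescaled to [0, 1] by n and the table size."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0]))  # smaller of (rows, columns)
    return sqrt(stat / (n * (k - 1)))


# Hypothetical counts: rows = ordinal bins (low, mid, high), columns = classes.
toy = [
    [30, 8, 2],
    [10, 25, 10],
    [0, 12, 33],
]
v = cramers_v(toy)
print("Cramér's V (toy table):", round(v, 4))
```

Dividing by n and by (k - 1) is what makes V comparable across sample sizes and table shapes, whereas the raw chi-square value grows with n.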

Ordinal X → Discrete Numeric Y

  • Ordinal X: AveRooms (quartile bins coded 1 < 2 < 3 < 4)

  • Discrete numeric Y: Population

from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Ordinal X
df["X_ord"] = pd.qcut(df["AveRooms"], q=4, labels=[1,2,3,4]).astype(int)

# Discrete numeric Y
y = df["Population"]

# Spearman
spearman, _ = spearmanr(df["X_ord"], y)

# Kendall
kendall, _ = kendalltau(df["X_ord"], y)

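# Mutual information (in nats). With the default discrete_features="auto",
# a dense X is treated as continuous and the k-NN estimate varies slightly
# between runs (see discrete_features and random_state).
mi = mutual_info_regression(df["X_ord"].values.reshape(-1,1), y)[0]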

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)

Output:

Spearman: -0.10083503138181427
Kendall: -0.07675727164739182
Mutual Information: 0.012877429553626474

| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | -0.1008 | Monotonic (rank-based) association | Very weak | A faint negative monotonic trend; too small to be practically useful. |
| Kendall | -0.0768 | Pairwise rank concordance | Very weak | Discordant pairs barely outnumber concordant ones; ordering by X says little about ordering by Y. |
| Mutual Information | 0.0129 | Overall dependency (linear + nonlinear), in nats | Extremely weak | X carries almost no information about Y; the variables are close to independent. |
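The mutual information values in the tables above are easier to read with a reference scale. For two discrete variables, mutual information can be estimated directly from joint frequencies, and it can never exceed the entropy of either variable. The sketch below is a plug-in (frequency-count) estimate in nats. It is a different estimator from the nearest-neighbor one scikit-learn applies when X is treated as continuous, and the names and toy data are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Plug-in mutual information (nats) from joint frequencies of x and y."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += c / n * log(c * n / (px[a] * py[b]))
    return mi


# Hypothetical toy data: an ordinal code and a three-class label.
x_ord = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y_cls = ["a", "a", "b", "b", "b", "c", "c", "c", "c"]
mi = mutual_information(x_ord, y_cls)
h_y = entropy(y_cls)
print("MI (nats):", round(mi, 4))
print("H(Y) upper bound (nats):", round(h_y, 4))
```

Reporting the estimate next to H(Y), or as the ratio MI / H(Y), gives a 0 to 1 reading that is easier to compare across targets with different numbers of classes.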