Ordinal X

Ordinal X → Continuous Y

Ordinal X: HouseAge (binned into four ordered quartile levels: 1 < 2 < 3 < 4)
Continuous Y: MedInc

from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Ordinal X via ordered bins
df["X_ord"] = pd.qcut(df["HouseAge"], q=4, labels=[1,2,3,4]).astype(int)
y = df["MedInc"]

# Spearman
spearman, _ = spearmanr(df["X_ord"], y)

# Kendall
kendall, _ = kendalltau(df["X_ord"], y)

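# Mutual Information (kNN estimate: the ordinal codes are treated as continuous
# by default, and the value varies between runs unless random_state is set)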
mi = mutual_info_regression(df["X_ord"].values.reshape(-1,1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)

Spearman: -0.15137493395360588
Kendall: -0.11330092656943518
Mutual Information: 0.010566575439892034

Metric	Value	What It Measures	Strength	Interpretation
Spearman	-0.1514	Monotonic (rank-based) association	Very weak	At most a slight negative monotonic trend; too weak to be practically relevant.
Kendall	-0.1133	Pairwise rank concordance	Very weak	Discordant pairs only slightly outnumber concordant ones; the ordering of X says little about the ordering of Y.
Mutual Information	0.0106	Overall dependency (linear + nonlinear)	Extremely weak	X carries almost no information about Y; close to independent.
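
The snippet above discards the p-values that spearmanr and kendalltau return. With roughly 20,000 rows, even a very weak coefficient can be statistically significant, so it may help to report both the coefficient and its p-value. A minimal sketch on synthetic data follows; the variable names, the seed, and the -0.15 slope are illustrative assumptions, not values taken from the housing data.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
n = 20_000

# Synthetic stand-ins: four ordered levels and a weakly related continuous y
x_ord = rng.integers(1, 5, size=n)      # levels 1..4
y = -0.15 * x_ord + rng.normal(size=n)  # weak negative dependence plus noise

# Keep the p-value instead of discarding it with "_"
rho, rho_p = spearmanr(x_ord, y)
tau, tau_p = kendalltau(x_ord, y)

print(f"Spearman: {rho:.4f} (p = {rho_p:.3g})")
print(f"Kendall:  {tau:.4f} (p = {tau_p:.3g})")
```

A small p-value here only says the association is unlikely to be exactly zero; the coefficient itself still describes how weak it is.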

Ordinal X → Binary Y

Ordinal X: mean radius (binned into low/med/high → ordinal)
Binary Y: target (malignant/benign)

from sklearn.datasets import load_breast_cancer
from scipy.stats import spearmanr, kendalltau, pointbiserialr
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target  # binary

# Ordinal X
df["X_ord"] = pd.qcut(df["mean radius"], q=3, labels=[1,2,3]).astype(int)
X = df["X_ord"]
y = df["target"]

# Spearman
spearman, _ = spearmanr(X, y)

# Kendall
kendall, _ = kendalltau(X, y)

# Point-biserial is Pearson's r with one binary variable. Y is already binary;
# here X is also collapsed to binary (top tercile vs. rest), so the result is a phi coefficient
pb, _ = pointbiserialr((X==3).astype(int), y)

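# Mutual Information (kNN estimate: the ordinal codes are treated as continuous
# by default, and the value varies between runs unless random_state is set)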
mi = mutual_info_classif(X.values.reshape(-1,1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Point-Biserial (ordinal→binary):", pb)
print("Mutual Information:", mi)

Spearman: -0.7166570067314795
Kendall: -0.6756725363563483
Point-Biserial (ordinal→binary): -0.7415324310869174
Mutual Information: 0.31576208528774496

Metric	Value	What It Measures	Strength	Interpretation
Spearman	-0.7167	Monotonic (rank-based) association	Strong negative	As X increases, Y tends to fall from 1 (benign) to 0 (malignant); strong downward monotonic trend.
Kendall	-0.6757	Pairwise concordance (rank agreement)	Strong negative	Among pairs that differ on both variables, most are discordant (higher X goes with lower Y).
Point-Biserial (ordinal→binary)	-0.7415	Pearson correlation between binarized X (top tercile vs. rest) and binary Y	Strong negative	Top-tercile X strongly corresponds to class 0 (malignant); strong separation.
Mutual Information	0.3158	Overall dependency (linear + nonlinear)	Moderate–Strong	X carries substantial predictive information about Y; meaningful dependency.
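
The point-biserial coefficient is Pearson's r computed with one binary variable, so collapsing X to binary is optional when Y is already binary. The sketch below, on synthetic data, checks that pointbiserialr and pearsonr agree, and also computes the coefficient on the uncollapsed ordinal codes. The data-generating probabilities and variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, pointbiserialr

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n = 500

# Synthetic stand-ins: ordinal levels 1..3 and a binary outcome whose
# probability of being 1 drops as the level rises (0.7, 0.4, 0.1)
x_ord = rng.integers(1, 4, size=n)
y = (rng.random(n) < (1.0 - 0.3 * x_ord)).astype(int)

# Collapse X to binary (top level vs. rest), as in the snippet above
x_top = (x_ord == 3).astype(int)

r_pb, _ = pointbiserialr(x_top, y)     # both binary: a phi coefficient
r_pearson, _ = pearsonr(x_top, y)      # same number via plain Pearson
r_full, _ = pointbiserialr(y, x_ord)   # binary Y against all three levels

print("Point-biserial (collapsed X):", r_pb)
print("Pearson (collapsed X):       ", r_pearson)
print("Point-biserial (ordinal X):  ", r_full)
```

Using the ordinal codes directly treats the levels as equally spaced numbers, which is itself an assumption; Spearman and Kendall avoid it.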

Ordinal X → Ordinal Y

Ordinal X: MedInc (binned into quartiles: 1 < 2 < 3 < 4)
Ordinal Y: target, the median house value (binned into quartiles: 1 < 2 < 3 < 4)

from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Ordinal X
df["X_ord"] = pd.qcut(df["MedInc"], q=4, labels=[1,2,3,4]).astype(int)

# Ordinal Y
df["Y_ord"] = pd.qcut(df["target"], q=4, labels=[1,2,3,4]).astype(int)

X = df["X_ord"]
y = df["Y_ord"]

# Spearman
spearman, _ = spearmanr(X, y)

# Kendall
kendall, _ = kendalltau(X, y)

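# Mutual Information (kNN estimate: the ordinal codes are treated as continuous
# by default, and the value varies between runs unless random_state is set)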
mi = mutual_info_classif(X.values.reshape(-1,1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)

Spearman: 0.6262544100103224
Kendall: 0.5439809988878718
Mutual Information: 0.24032098217597886

Metric	Value	What It Measures	Strength	Interpretation
Spearman	0.6263	Monotonic (rank-based) association	Strong positive	As X increases, Y consistently increases; clear and stable upward monotonic trend.
Kendall	0.5440	Pairwise rank concordance	Strong positive	Most observation pairs move together in the same direction; high directional agreement.
Mutual Information	0.2403	Overall dependency (linear + nonlinear)	Moderate	X carries meaningful predictive information about Y; noticeable shared dependency.
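
When both variables are discrete, mutual information can be computed directly from the contingency table instead of being estimated with nearest neighbours. The sketch below compares sklearn.metrics.mutual_info_score with mutual_info_classif called with discrete_features=True and with its defaults. The synthetic data and variable names are assumptions for illustration, so the numbers will not match the housing results above.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n = 5_000

# Synthetic stand-ins for two ordinal variables with levels 1..4
x_ord = rng.integers(1, 5, size=n)
shift = rng.integers(-1, 2, size=n)      # -1, 0 or +1
y_ord = np.clip(x_ord + shift, 1, 4)     # y tends to follow x

# Computed from the contingency table (in nats); no randomness involved
mi_table = mutual_info_score(x_ord, y_ord)

# Same quantity via mutual_info_classif once X is declared discrete
mi_discrete = mutual_info_classif(
    x_ord.reshape(-1, 1), y_ord, discrete_features=True, random_state=0
)[0]

# Default call: a dense X is treated as continuous and a kNN estimator is used
mi_knn = mutual_info_classif(x_ord.reshape(-1, 1), y_ord, random_state=0)[0]

print("Contingency-table MI:", mi_table)
print("discrete_features=True:", mi_discrete)
print("Default (kNN):", mi_knn)
```

The first two values should agree, while the default kNN estimate can differ from them and depends on random_state.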

Ordinal X → Categorical Nominal Y

Ordinal X: sepal length (binned into terciles: 1 < 2 < 3)
Nominal Y: species

from sklearn.datasets import load_iris
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target  # nominal categorical

# Ordinal X
df["X_ord"] = pd.qcut(df["sepal length (cm)"], q=3, labels=[1,2,3]).astype(int)

X = df["X_ord"]
y = df["species"]

# Chi-square test
table = pd.crosstab(X, y)
chi2, _, _, _ = chi2_contingency(table)

# Cramér’s V
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n * (k - 1)))

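# Mutual Information (kNN estimate: the ordinal codes are treated as continuous
# by default, and the value varies between runs unless random_state is set)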
mi = mutual_info_classif(X.values.reshape(-1,1), y)[0]

print("Chi-square:", chi2)
print("Cramér’s V:", cramers_v)
print("Mutual Information:", mi)

Chi-square: 123.28296703296704
Cramér’s V: 0.6410485343897321
Mutual Information: 0.46394039211315063

Metric	Value	What It Measures	Strength / Meaning	Interpretation
Chi-square	123.283	Statistical test of independence for categorical variables	Very large for a 3×3 table (4 degrees of freedom) → highly significant	Observed category frequencies differ strongly from those expected under independence; clear evidence of dependency.
Cramér’s V	0.6410	Effect size for association between categorical variables	Strong association	Strong relationship; categories of one variable strongly predict categories of the other.
Mutual Information	0.4639	Information shared between the two categorical variables (linear + nonlinear)	Strong	One variable carries substantial predictive information about the other; high shared dependency.
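
The Cramér's V computation above can be wrapped in a small helper that also returns the p-value and degrees of freedom, which the original snippet discards. This is a sketch: the function name cramers_v and the toy data are hypothetical, and no bias correction is applied.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Return (Cramér's V, chi-square, p-value, dof) for two categorical series."""
    table = pd.crosstab(x, y)
    chi2, p_value, dof, _ = chi2_contingency(table)
    n = table.to_numpy().sum()           # total number of observations
    k = min(table.shape)                 # smaller of (#rows, #columns)
    return np.sqrt(chi2 / (n * (k - 1))), chi2, p_value, dof

# Hypothetical data: three ordered levels that map perfectly onto three classes
x = pd.Series([1] * 50 + [2] * 50 + [3] * 50, name="X_ord")
y = pd.Series(["a"] * 50 + ["b"] * 50 + ["c"] * 50, name="label")

v, chi2, p_value, dof = cramers_v(x, y)
print("Cramér's V:", v)
print("Chi-square:", chi2, "dof:", dof, "p-value:", p_value)
```

Because the toy data are perfectly associated, V should come out at 1 up to floating-point rounding, which makes a convenient sanity check for the formula.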

Ordinal X → Discrete Numeric Y

Ordinal X: AveRooms (binned into quartiles: 1 < 2 < 3 < 4)
Discrete numeric Y: Population

from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Ordinal X
df["X_ord"] = pd.qcut(df["AveRooms"], q=4, labels=[1,2,3,4]).astype(int)

# Discrete numeric Y
y = df["Population"]

# Spearman
spearman, _ = spearmanr(df["X_ord"], y)

# Kendall
kendall, _ = kendalltau(df["X_ord"], y)

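# Mutual Information (kNN estimate: the ordinal codes are treated as continuous
# by default, and the value varies between runs unless random_state is set)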
mi = mutual_info_regression(df["X_ord"].values.reshape(-1,1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)

Spearman: -0.10083503138181427
Kendall: -0.07675727164739182
Mutual Information: 0.012877429553626474

Metric	Value	What It Measures	Strength	Interpretation
Spearman	-0.1008	Monotonic (rank-based) association	Very weak	At most a slight negative monotonic trend; too weak to be practically useful.
Kendall	-0.0768	Pairwise rank concordance	Very weak	Very little directional agreement; the ordering of X says almost nothing about the ordering of Y.
Mutual Information	0.0129	Overall dependency (linear + nonlinear)	Extremely weak	X carries almost no information about Y; close to independent.
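
The mutual_info_regression calls above rely on the defaults, which treat a dense X as continuous and leave random_state unset. A hedged sketch of an alternative is shown below: declare the ordinal codes as discrete and fix the seed so the estimate is repeatable. The synthetic data, the helper name mi, and the seed are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n = 2_000

x_ord = rng.integers(1, 5, size=n)               # ordinal codes 1..4
y_dep = x_ord + rng.normal(scale=0.1, size=n)    # y almost determined by x
y_ind = rng.normal(size=n)                       # y unrelated to x

X = x_ord.reshape(-1, 1)

def mi(y, seed=0):
    # discrete_features=True: treat the ordinal codes as categories
    # random_state: fixes the small noise the estimator adds to continuous y
    return mutual_info_regression(
        X, y, discrete_features=True, random_state=seed
    )[0]

mi_dep = mi(y_dep)
mi_ind = mi(y_ind)
mi_dep_again = mi(y_dep)

print("Dependent y:  ", mi_dep)
print("Independent y:", mi_ind)
print("Repeatable:   ", np.isclose(mi_dep, mi_dep_again))
```

Comparing against a deliberately independent y gives a rough baseline for how close to zero the estimator lands when there is no dependency.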