Ordinal X
Ordinal X → Continuous Y
Ordinal X: HouseAge (binned into four ordered quartile bins: 1 < 2 < 3 < 4)
Continuous Y: MedInc
```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Ordinal X: quartile bins of HouseAge, coded 1 (lowest) to 4 (highest)
df["X_ord"] = pd.qcut(df["HouseAge"], q=4, labels=[1, 2, 3, 4]).astype(int)
y = df["MedInc"]

# Spearman rank correlation (the p-value is discarded)
spearman, _ = spearmanr(df["X_ord"], y)

# Kendall's tau (tau-b by default, which adjusts for ties)
kendall, _ = kendalltau(df["X_ord"], y)

# Mutual information in nats; no random_state is set, so the estimate
# can vary slightly between runs
mi = mutual_info_regression(df["X_ord"].values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)
```

```text
Spearman: -0.15137493395360588
Kendall: -0.11330092656943518
Mutual Information: 0.010566575439892034
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | -0.1514 | Monotonic (rank-based) association | Very weak | Slight negative monotonic trend: higher HouseAge bins tend to have marginally lower MedInc, but the effect is too small to be practically relevant. |
| Kendall | -0.1133 | Pairwise rank concordance | Very weak | Discordant pairs only slightly outnumber concordant ones; little directional agreement. |
| Mutual Information | 0.0106 | Overall dependency (linear + nonlinear), in nats | Extremely weak | X carries almost no information about Y; close to independence. |
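The snippet above discards the p-values returned by `spearmanr` and `kendalltau`. A minimal sketch of keeping them follows, run on synthetic data (an assumption made to avoid re-downloading the housing data; `x_ord` and `y` are illustrative stand-ins, not the HouseAge and MedInc columns). It illustrates that, with a large sample, a weak rank correlation can still be statistically significant, so significance and strength are best read separately.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

# Synthetic stand-in data (assumption): a 4-level ordinal predictor and a
# continuous outcome with a deliberately weak negative dependence on it.
rng = np.random.default_rng(0)
n = 20_000
x_ord = rng.integers(1, 5, size=n)        # ordinal codes 1..4
y = -0.15 * x_ord + rng.normal(size=n)    # weak negative trend plus noise

# Keep both the statistic and the p-value instead of discarding the latter.
rho, rho_p = spearmanr(x_ord, y)
tau, tau_p = kendalltau(x_ord, y)

print(f"Spearman rho = {rho:.3f}, p = {rho_p:.2e}")
print(f"Kendall tau  = {tau:.3f}, p = {tau_p:.2e}")
```

A small p-value here only says the association is unlikely to be exactly zero; the size of `rho` and `tau` is what indicates how useful the relationship is in practice.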
Ordinal X → Binary Y
Ordinal X: mean radius (binned into tertiles: low < medium < high)
Binary Y: target (0 = malignant, 1 = benign)
```python
from sklearn.datasets import load_breast_cancer
from scipy.stats import spearmanr, kendalltau, pointbiserialr
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target  # binary: 0 = malignant, 1 = benign

# Ordinal X: tertile bins of mean radius, coded 1 (low) to 3 (high)
df["X_ord"] = pd.qcut(df["mean radius"], q=3, labels=[1, 2, 3]).astype(int)
X = df["X_ord"]
y = df["target"]

# Spearman rank correlation
spearman, _ = spearmanr(X, y)

# Kendall's tau (tau-b by default, which adjusts for ties)
kendall, _ = kendalltau(X, y)

# Point-biserial expects one dichotomous and one continuous variable, so it
# does not apply to a three-level ordinal X directly. Here X is collapsed to
# a binary indicator (top tertile vs. the rest); with both variables binary,
# the result equals the phi coefficient.
pb, _ = pointbiserialr((X == 3).astype(int), y)

# Mutual information in nats
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Point-Biserial (ordinal→binary):", pb)
print("Mutual Information:", mi)
```

```text
Spearman: -0.7166570067314795
Kendall: -0.6756725363563483
Point-Biserial (ordinal→binary): -0.7415324310869174
Mutual Information: 0.31576208528774496
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | -0.7167 | Monotonic (rank-based) association | Strong negative | Higher radius bins go with lower target values, that is, with class 0 (malignant); strong downward monotonic trend. |
| Kendall | -0.6757 | Pairwise concordance (rank agreement) | Strong negative | Among pairs that differ on both variables, most are discordant: the observation with the larger radius bin tends to have the smaller target value. |
| Point-Biserial (ordinal→binary) | -0.7415 | Linear association between the binary indicator "X in top tertile" and binary Y (equivalent to the phi coefficient) | Strong negative | Tumors in the top radius tertile strongly correspond to class 0 (malignant); strong separation. |
| Mutual Information | 0.3158 | Overall dependency (linear + nonlinear), in nats | Moderate–Strong | X carries substantial predictive information about Y; for a binary Y the value cannot exceed ln 2 ≈ 0.693 nats. |
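To judge whether a mutual information value is large, it can help to compare it with the entropy of Y, which is the most any feature can share with Y. The sketch below is one way to do this. It assumes the plug-in estimator `sklearn.metrics.mutual_info_score`, which is a different estimator from the `mutual_info_classif` call above, so its value need not match 0.3158; the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import mutual_info_score

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Same ordinal coding as above: tertile bins of mean radius, coded 1..3.
x_ord = pd.qcut(df["mean radius"], q=3, labels=[1, 2, 3]).astype(int)
y = data.target

# Plug-in mutual information between two discrete variables, in nats.
mi = mutual_info_score(x_ord, y)

# Entropy of Y in nats: an upper bound on I(X; Y).
class_counts = np.bincount(y)
h_y = entropy(class_counts)  # scipy normalizes the counts to probabilities

print(f"MI = {mi:.4f} nats, H(Y) = {h_y:.4f} nats, MI / H(Y) = {mi / h_y:.2f}")
```

The ratio `mi / h_y` lies between 0 and 1 and reads as the fraction of the uncertainty in Y that knowing the bin of X removes.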
Ordinal X → Ordinal Y
Ordinal X: MedInc (binned into quartiles)
Ordinal Y: target, the median house value (binned into quartiles)
```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Ordinal X: quartile bins of MedInc, coded 1 (lowest) to 4 (highest)
df["X_ord"] = pd.qcut(df["MedInc"], q=4, labels=[1, 2, 3, 4]).astype(int)

# Ordinal Y: quartile bins of the target, coded 1 (lowest) to 4 (highest)
df["Y_ord"] = pd.qcut(df["target"], q=4, labels=[1, 2, 3, 4]).astype(int)
X = df["X_ord"]
y = df["Y_ord"]

# Spearman rank correlation
spearman, _ = spearmanr(X, y)

# Kendall's tau (tau-b by default, which adjusts for the many ties here)
kendall, _ = kendalltau(X, y)

# Mutual information in nats; the four Y bins are treated as class labels
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)
```

```text
Spearman: 0.6262544100103224
Kendall: 0.5439809988878718
Mutual Information: 0.24032098217597886
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | 0.6263 | Monotonic (rank-based) association | Strong positive | Higher income quartiles go with higher house-value quartiles; clear upward monotonic trend. |
| Kendall | 0.5440 | Pairwise rank concordance | Strong positive | Among pairs that differ on both variables, most are concordant; high directional agreement. |
| Mutual Information | 0.2403 | Overall dependency (linear + nonlinear), in nats | Moderate | X carries meaningful predictive information about Y; noticeable shared dependency. |
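When both variables are ordinal with only a few levels, many pairs are tied on at least one variable, so the tie handling of Kendall's tau matters. `kendalltau` computes tau-b by default; SciPy 1.7 and later also accept `variant="c"` (Stuart's tau-c), which is often suggested for tables with unequal numbers of rows and columns. The following sketch compares the two on synthetic quartile-binned data (an assumption: the latent variables and their coefficients are made up for illustration and are not the housing data).

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

# Synthetic stand-in data (assumption): two positively correlated latent scores.
rng = np.random.default_rng(42)
n = 2_000
latent_x = pd.Series(rng.normal(size=n))
latent_y = 0.8 * latent_x + 0.6 * pd.Series(rng.normal(size=n))

# Bin both variables into quartiles, coded 1..4, as in the example above.
x_ord = pd.qcut(latent_x, q=4, labels=[1, 2, 3, 4]).astype(int)
y_ord = pd.qcut(latent_y, q=4, labels=[1, 2, 3, 4]).astype(int)

# tau-b adjusts for ties in both variables; tau-c adjusts for table size.
tau_b, _ = kendalltau(x_ord, y_ord, variant="b")
tau_c, _ = kendalltau(x_ord, y_ord, variant="c")

print(f"tau-b = {tau_b:.4f}")
print(f"tau-c = {tau_c:.4f}")
```

With equal-sized bins on both variables and a square table, the two variants should be nearly identical; they tend to diverge when the table is rectangular or the bin sizes are unbalanced.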
Ordinal X → Categorical Nominal Y
Ordinal X: sepal length (binned into tertiles, low → high)
Nominal Y: species
```python
from sklearn.datasets import load_iris
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target  # nominal categorical, coded 0..2

# Ordinal X: tertile bins of sepal length, coded 1 (low) to 3 (high)
df["X_ord"] = pd.qcut(df["sepal length (cm)"], q=3, labels=[1, 2, 3]).astype(int)
X = df["X_ord"]
y = df["species"]

# Chi-square test of independence on the 3x3 contingency table
# (only the statistic is kept; p-value, dof and expected counts are discarded)
table = pd.crosstab(X, y)
chi2, _, _, _ = chi2_contingency(table)

# Cramér’s V: sqrt(chi2 / (n * (k - 1))), with n the sample size and
# k the smaller of the number of rows and columns
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n * (k - 1)))

# Mutual information in nats
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Chi-square:", chi2)
print("Cramér’s V:", cramers_v)
print("Mutual Information:", mi)
```

```text
Chi-square: 123.28296703296704
Cramér’s V: 0.6410485343897321
Mutual Information: 0.46394039211315063
```
| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
|---|---|---|---|---|
| Chi-square | 123.283 | Statistical test of independence for categorical variables | Very large → highly significant | With (3 - 1) × (3 - 1) = 4 degrees of freedom, the statistic far exceeds the 5% critical value of about 9.49; observed frequencies differ strongly from those expected under independence. |
| Cramér’s V | 0.6410 | Effect size for association between categorical variables (0 to 1) | Strong association | Strong relationship; the sepal-length bin is a good predictor of species. Note that V ignores the ordering of X. |
| Mutual Information | 0.4639 | Information shared between the two categorical variables (linear + nonlinear), in nats | Strong | One variable carries substantial predictive information about the other; for three balanced classes the value cannot exceed ln 3 ≈ 1.099 nats. |
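The chi-square snippet above keeps only the test statistic. As a sketch of how the remaining outputs might be used, the code below also reads the p-value, degrees of freedom and expected counts from `chi2_contingency`, and cross-checks the hand-computed Cramér’s V against `scipy.stats.contingency.association` (available in SciPy 1.7 and later). Variable names such as `v_manual` and `v_scipy` are illustrative.

```python
from math import sqrt

import pandas as pd
from scipy.stats import chi2_contingency
from scipy.stats.contingency import association
from sklearn.datasets import load_iris

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Same ordinal coding as above: tertile bins of sepal length, coded 1..3.
x_ord = pd.qcut(df["sepal length (cm)"], q=3, labels=[1, 2, 3]).astype(int)
table = pd.crosstab(x_ord, data.target)

# Keep every output of the test, not just the statistic.
chi2, p_value, dof, expected = chi2_contingency(table)

# Hand-computed Cramér's V, as in the example above.
n = table.to_numpy().sum()
k = min(table.shape)
v_manual = sqrt(chi2 / (n * (k - 1)))

# SciPy's built-in computation for comparison.
v_scipy = association(table.to_numpy(), method="cramer")

print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.2e}")
print(f"smallest expected count = {expected.min():.1f}")
print(f"Cramér's V: manual = {v_manual:.4f}, scipy = {v_scipy:.4f}")
```

Checking the smallest expected count is a common sanity check, since the chi-square approximation is usually considered unreliable when expected counts are very small.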
Ordinal X → Discrete Numeric Y
Ordinal X: AveRooms (binned into quartiles)
Discrete numeric Y: Population
```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Ordinal X: quartile bins of AveRooms, coded 1 (lowest) to 4 (highest)
df["X_ord"] = pd.qcut(df["AveRooms"], q=4, labels=[1, 2, 3, 4]).astype(int)

# Discrete numeric Y: population count per block group
y = df["Population"]

# Spearman rank correlation
spearman, _ = spearmanr(df["X_ord"], y)

# Kendall's tau (tau-b by default, which adjusts for ties)
kendall, _ = kendalltau(df["X_ord"], y)

# Mutual information in nats; no random_state is set, so the estimate
# can vary slightly between runs
mi = mutual_info_regression(df["X_ord"].values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)
```

```text
Spearman: -0.10083503138181427
Kendall: -0.07675727164739182
Mutual Information: 0.012877429553626474
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | -0.1008 | Monotonic (rank-based) association | Very weak | Slight negative monotonic trend: higher AveRooms bins tend to have marginally lower Population, but the effect is too small to be practically useful. |
| Kendall | -0.0768 | Pairwise rank concordance | Very weak | Discordant pairs barely outnumber concordant ones; very little directional agreement. |
| Mutual Information | 0.0129 | Overall dependency (linear + nonlinear), in nats | Extremely weak | X carries almost no information about Y; close to independence. |
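`mutual_info_regression` relies on a nearest-neighbor estimator and, with its default `discrete_features="auto"`, treats a dense integer-coded X as continuous. A hedged sketch of how one might check the stability of such small estimates is shown below: it repeats the call over a few seeds, with and without `discrete_features=True`, on synthetic data (an assumption; `x_ord` and `y` are illustrative stand-ins, not AveRooms and Population).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Synthetic stand-in data (assumption): a 4-level ordinal predictor and a
# numeric outcome with a deliberately weak dependence on it.
rng = np.random.default_rng(0)
n = 2_000
x_ord = rng.integers(1, 5, size=n)
y = -0.1 * x_ord + rng.normal(size=n)
X = x_ord.reshape(-1, 1)

seeds = range(5)

# Default behavior: the integer codes are treated as a continuous feature.
mi_default = [mutual_info_regression(X, y, random_state=s)[0] for s in seeds]

# Declaring the feature discrete switches to the discrete-vs-continuous estimator.
mi_discrete = [
    mutual_info_regression(X, y, discrete_features=True, random_state=s)[0]
    for s in seeds
]

print("default  :", np.round(mi_default, 4))
print("discrete :", np.round(mi_discrete, 4))
```

If the spread across seeds is comparable to the estimate itself, a value such as 0.01 nats is better read as "indistinguishable from zero" than as a precise measurement.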