Continuous X

Continuous X → Continuous Y
```python
# X = BMI
# Y = Disease progression (continuous)
from sklearn.datasets import load_diabetes
from scipy.stats import pearsonr, spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_regression
import numpy as np

data = load_diabetes()
X = data.data[:, 2]  # BMI feature
y = data.target      # continuous target

pearson, _ = pearsonr(X, y)
spearman, _ = spearmanr(X, y)
kendall, _ = kendalltau(X, y)
mi = mutual_info_regression(X.reshape(-1, 1), y)[0]

print("Pearson:", pearson)
print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)
```

```
Pearson: 0.5864501344746886
Spearman: 0.5613820101065616
Kendall: 0.39119525733058874
Mutual Information: 0.17815067428832032
```
| Metric | Type | Strength | Interpretation |
|---|---|---|---|
| Pearson = 0.586 | Linear | Moderate–Strong | Clear upward linear trend |
| Spearman = 0.561 | Monotonic | Moderate–Strong | BMI ↑ → Disease ↑ consistently |
| Kendall = 0.391 | Rank | Moderate | Good concordance of ordering |
| MI = 0.178 | Nonlinear | Meaningful | BMI carries predictive signal |
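Why keep mutual information alongside the correlation coefficients? A small sketch on synthetic data (not the diabetes set) shows the case where they disagree: for a symmetric nonlinear relation like y = x², Pearson is near zero while MI remains clearly positive.

```python
# Synthetic illustration: Pearson misses purely nonlinear dependence, MI does not.
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 2000)
y = x**2 + rng.normal(0, 0.1, 2000)  # strong dependence, but not linear

pearson, _ = pearsonr(x, y)
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
print("Pearson:", round(pearson, 3))  # near 0: no linear trend to find
print("MI:", round(mi, 3))            # clearly positive: dependence detected
```

This is why a moderate MI value can still be "meaningful" even when it is numerically smaller than the correlation coefficients: it is measured in nats, not on a −1…1 scale.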
Continuous X → Binary Y
```python
# X = mean radius
# Y = diagnosis (0 = malignant, 1 = benign)
from sklearn.datasets import load_breast_cancer
from scipy.stats import pointbiserialr
from sklearn.metrics import roc_auc_score
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X = data.data[:, 0]  # mean radius
y = data.target      # binary

pb, _ = pointbiserialr(y, X)

model = LogisticRegression()
model.fit(X.reshape(-1, 1), y)
beta = model.coef_[0][0]

auc = roc_auc_score(y, X)
mi = mutual_info_classif(X.reshape(-1, 1), y)[0]

print("Point-Biserial:", pb)
print("Logistic β:", beta)
print("AUC:", auc)
print("Mutual Information:", mi)
```

```
Point-Biserial: -0.7300285113754563
Logistic β: -1.0251962293185464
AUC: 0.0624834839596216
Mutual Information: 0.3690285464383032
```
| Metric | What It Means |
|---|---|
| Point-Biserial: –0.73 | Very strong negative association |
| Logistic β: –1.025 | Higher X → sharply lower P(Y=1) |
| AUC: 0.062 | The raw feature ranks the classes almost perfectly, but in the reverse direction (flipping the sign gives AUC ≈ 0.938) |
| MI: 0.369 | High information content |
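An AUC far below 0.5 signals strong reverse ranking, not a useless feature. A quick sketch on the same dataset: scoring with −X flips the direction and recovers the discriminative power, since AUC(−X) = 1 − AUC(X).

```python
# Sketch: AUC below 0.5 means the feature discriminates in the reverse direction.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score

data = load_breast_cancer()
X = data.data[:, 0]   # mean radius
y = data.target       # 0 = malignant, 1 = benign

auc_raw = roc_auc_score(y, X)       # ~0.062: larger radius -> class 0 (malignant)
auc_flipped = roc_auc_score(y, -X)  # = 1 - auc_raw, ~0.938
print("AUC raw:", round(auc_raw, 3))
print("AUC flipped:", round(auc_flipped, 3))
```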
Continuous X → Ordinal Y
```python
# X = MedInc (median income, continuous)
# Y = house-value tercile (ordinal: 1 < 2 < 3)
from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_classif
import pandas as pd
import numpy as np

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

X = df["MedInc"]  # continuous
# Create ordinal bins of the target
y = pd.qcut(df["target"], q=3, labels=[1, 2, 3])  # ordinal 1 < 2 < 3
y = y.astype(int)

spearman, _ = spearmanr(X, y)
kendall, _ = kendalltau(X, y)
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)
```

```
Spearman: 0.6303437101989041
Kendall: 0.5068776790681007
Mutual Information: 0.2659892673280484
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | 0.6303 | Monotonic (rank-based) correlation | Strong positive | As X increases, Y increases consistently in rank; strong ordered trend |
| Kendall | 0.5069 | Concordance of ranked pairs | Strong positive | Most observation pairs are concordant; X and Y move together directionally |
| Mutual Information | 0.2660 | General dependency (linear + nonlinear) | Moderate–Strong | X contains meaningful predictive information about Y; noticeable shared dependency |
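One practical caveat worth knowing: `mutual_info_classif` uses a randomized k-nearest-neighbor estimator, so the MI value above will wobble slightly between runs. A small sketch on synthetic data (not the housing columns) shows that passing `random_state` makes the estimate reproducible.

```python
# Sketch: fix random_state to make the k-NN MI estimate reproducible.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = (x + rng.normal(scale=0.5, size=500) > 0).astype(int)  # noisy thresholded label

mi_a = mutual_info_classif(x.reshape(-1, 1), y, random_state=0)[0]
mi_b = mutual_info_classif(x.reshape(-1, 1), y, random_state=0)[0]
print("run 1:", mi_a)
print("run 2:", mi_b)  # identical once random_state is fixed
```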
Continuous X → Nominal (Categorical) Y
```python
# X = Petal length
# Y = Species (Setosa / Versicolor / Virginica)
from sklearn.datasets import load_iris
from scipy.stats import f_oneway, kruskal
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target

X = df["petal length (cm)"]
y = df["species"]

groups = [X[y == c] for c in y.unique()]
anova_f, _ = f_oneway(*groups)
kw, _ = kruskal(*groups)
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("ANOVA F:", anova_f)
print("Kruskal–Wallis:", kw)
print("Mutual Information:", mi)
```

```
ANOVA F: 1180.161182252981
Kruskal–Wallis: 130.41104857977163
Mutual Information: 1.0063262962592772
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| ANOVA F | 1180.16 | Ratio of between-species to within-species variance | Very strong | Petal-length means differ dramatically across the three species. |
| Kruskal–Wallis H | 130.41 | Rank-based difference in group distributions | Very strong | The rank distributions of petal length are clearly separated by species, without assuming normality. |
| Mutual Information | 1.0063 | Overall dependency (linear + nonlinear) | Very strong | Close to the maximum possible for three classes (ln 3 ≈ 1.099 nats); petal length almost fully determines species. |
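A very large F statistic says the group means differ, but it grows with sample size and is hard to read as a strength. A common companion is an effect size such as eta-squared (between-group sum of squares over total sum of squares), sketched here for the same iris feature:

```python
# Sketch: eta-squared effect size for petal length across species.
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
x = data.data[:, 2]  # petal length (cm)
y = data.target

grand_mean = x.mean()
ss_total = ((x - grand_mean) ** 2).sum()
ss_between = sum(
    x[y == c].size * (x[y == c].mean() - grand_mean) ** 2 for c in np.unique(y)
)
eta_sq = ss_between / ss_total
print("eta^2:", round(eta_sq, 3))  # ~0.94: species explains ~94% of the variance
```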
Continuous X → Discrete Numeric Y
```python
# X = AveRooms (continuous)
# Y = Population (discrete numeric, integer-valued)
from sklearn.datasets import fetch_california_housing
from scipy.stats import pearsonr, spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

X = df["AveRooms"]
y = df["Population"]

pearson, _ = pearsonr(X, y)
spearman, _ = spearmanr(X, y)
kendall, _ = kendalltau(X, y)
mi = mutual_info_regression(X.values.reshape(-1, 1), y)[0]

print("Pearson:", pearson)
print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)
```

```
Pearson: -0.0722128486589335
Spearman: -0.10538515380075536
Kendall: -0.07251597080592617
Mutual Information: 0.034846658720868895
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Pearson | –0.0722 | Linear correlation | Very weak | No linear relationship; the small negative value is negligible and not meaningful. |
| Spearman | –0.1054 | Monotonic (rank-based) association | Very weak | No consistent increasing or decreasing trend; ordering is mostly random. |
| Kendall | –0.0725 | Pairwise concordance (rank agreement) | Very weak | No directional agreement; pairs behave nearly randomly. |
| Mutual Information | 0.0348 | Overall dependency (linear + nonlinear) | Extremely weak | Variables share almost no information; effectively independent. |
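For heavy-tailed columns like Population, the rank-based metrics are the ones to trust, and "rank-based" has a concrete meaning: Spearman's ρ is exactly Pearson's r computed on the ranks. A sketch on synthetic data (not the housing columns) makes the equivalence visible:

```python
# Sketch: Spearman's rho equals Pearson's r applied to the ranks.
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = np.exp(x) + rng.normal(scale=0.1, size=1000)  # monotonic but heavy-tailed

spearman, _ = spearmanr(x, y)
pearson_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))
print("Spearman:", round(spearman, 6))
print("Pearson on ranks:", round(pearson_on_ranks, 6))  # same value
```

Because ranking discards the magnitudes, a handful of extreme Population values cannot distort Spearman or Kendall the way they can distort Pearson.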