# Binary X

## Binary X → Continuous Y
- Binary X: `HouseAge > 30`
- Continuous Y: median income (`MedInc`)
```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import pointbiserialr, ttest_ind, mannwhitneyu
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["X_bin"] = (df["HouseAge"] > 30).astype(int)

X = df["X_bin"]
y = df["MedInc"]

# Point-biserial correlation (binary vs. continuous)
pb, _ = pointbiserialr(X, y)

# t-test on the two groups defined by X
t_stat, _ = ttest_ind(y[X == 0], y[X == 1])

# Mann-Whitney U (nonparametric alternative to the t-test)
u_stat, _ = mannwhitneyu(y[X == 0], y[X == 1])

# Mutual information (captures nonlinear dependency as well)
mi = mutual_info_regression(X.values.reshape(-1, 1), y)[0]

print("Point-Biserial:", pb)
print("t-test:", t_stat)
print("Mann–Whitney:", u_stat)
print("Mutual Information:", mi)
```
```text
Point-Biserial: -0.08891425114393547
t-test: 12.824153614760043
Mann–Whitney: 59254034.5
Mutual Information: 0.009422289272053463
```
| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
|---|---|---|---|---|
| Point-biserial | −0.0889 | Linear association between a binary and a continuous variable | Very weak negative | The two group means are nearly equal; median income is only slightly lower for the older houses (X = 1). |
| t-test (t-statistic) | 12.824 | Difference in group means, assuming normality | Statistically significant (given the large sample) | The means differ statistically, but the effect is small (r ≈ −0.09); the large t comes from the large sample size, not a strong effect. |
| Mann–Whitney U | 59,254,034.5 | Difference in distributions (non-parametric) | Significance depends on group sizes | Indicates a distributional difference but does not quantify its strength; with large samples even tiny effects produce large U values. |
| Mutual information | 0.00942 | General dependency (linear + nonlinear) | Extremely weak | The variables share almost no information; the relationship is effectively negligible. |
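The "large t, small effect" pattern is easier to see with a sample-size-free effect size. A minimal sketch on synthetic data (the `cohens_d` helper is our own, not part of SciPy): with 100,000 points per group and a mean gap of only 0.05 standard deviations, the t-statistic is highly significant while Cohen's d stays negligible.

```python
import numpy as np
from scipy.stats import ttest_ind

def cohens_d(a, b):
    """Standardized mean difference between two groups (pooled SD)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
g0 = rng.normal(loc=0.00, scale=1.0, size=100_000)
g1 = rng.normal(loc=0.05, scale=1.0, size=100_000)

t_stat, p_value = ttest_ind(g0, g1)
d = cohens_d(g0, g1)
print(t_stat, p_value)  # strongly "significant" at this sample size
print(d)                # yet the standardized effect is tiny (around -0.05)
```

Reporting d (or the point-biserial r) alongside the t-statistic keeps the sample-size inflation visible.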
## Binary X → Binary Y

- Binary X: `mean radius` > median
- Binary Y: cancer class (malignant / benign)
```python
from sklearn.datasets import load_breast_cancer
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target  # binary
df["X_bin"] = (df["mean radius"] > df["mean radius"].median()).astype(int)

X = df["X_bin"]
y = df["target"]

# Chi-square test of independence (one call serves both statistics)
table = pd.crosstab(X, y)
chi2_value, p, _, _ = chi2_contingency(table)

# Phi coefficient, derived from the chi-square statistic
phi = sqrt(chi2_value / len(df))

# Mutual Information
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Phi Coefficient:", phi)
print("Chi-Square:", chi2_value)
print("Mutual Information:", mi)
```
```text
Phi Coefficient: 0.644740755602814
Chi-Square: 236.52797526117857
Mutual Information: 0.22088305825323062
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Phi coefficient | 0.6447 | Association between two binary variables | Strong positive | When one variable is 1, the other is likely 1 as well (and likewise for 0); a consistent directional association. |
| Chi-square | 236.528 | Test of independence for categorical variables | Very large → statistically significant | Observed frequencies differ sharply from expected frequencies; strong evidence of dependence. |
| Mutual information | 0.2209 | Information shared between the two variables (linear + nonlinear) | Moderate | The variables share a meaningful amount of information; knowing one reduces uncertainty about the other. |
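The chi-square-derived phi above is unsigned. For two binary variables, the signed phi coefficient coincides with the Matthews correlation coefficient, which scikit-learn exposes directly. A sketch on synthetic labels (the 15% flip rate is an arbitrary illustration): flipping a fraction p of a balanced binary variable yields phi ≈ 1 − 2p.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=2000)

# y copies x, but ~15% of entries are flipped -> expected phi ~ 1 - 2 * 0.15 = 0.7
flip = rng.random(2000) < 0.15
y = np.where(flip, 1 - x, x)

phi_signed = matthews_corrcoef(x, y)
print(phi_signed)
```

The sign tells you whether the two variables agree (positive) or systematically disagree (negative), which the chi-square route discards.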
## Binary X → Ordinal Y

- Binary X: `MedInc > 3`
- Ordinal Y: target (price) binned into a 3-level ordinal: Low, Medium, High
```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target
df["X_bin"] = (df["MedInc"] > 3).astype(int)

# Ordinal Y: tercile bins of the target
df["Y_ord"] = pd.qcut(df["target"], q=3, labels=[1, 2, 3]).astype(int)

X = df["X_bin"]
y = df["Y_ord"]

# Spearman rank correlation
spearman, _ = spearmanr(X, y)

# Kendall's tau
kendall, _ = kendalltau(X, y)

# Mutual Information
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)
```
```text
Spearman: 0.49811001629852997
Kendall: 0.46962263368309787
Mutual Information: 0.13921068369562795
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | 0.4981 | Monotonic (rank-based) correlation | Moderate–strong | As X increases, Y tends to increase in a consistent ranked pattern; a clear upward trend. |
| Kendall | 0.4696 | Pairwise concordance (rank agreement) | Moderate–strong | Most observation pairs move in the same direction; high directional agreement between X and Y. |
| Mutual information | 0.1392 | Overall dependency (linear + nonlinear) | Moderate | X carries meaningful predictive information about Y; a noticeable shared dependency. |
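A binary X produces heavy ties, which is why SciPy's `kendalltau` default (the tau-b variant) matters here: it corrects the denominator for tied pairs. A sketch on synthetic data (the class probabilities are invented for illustration), with a conditional crosstab showing the monotonic shift the coefficient summarizes:

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=2000)

# Ordinal outcome in {1, 2, 3}; x == 1 shifts probability mass toward higher levels
probs = {0: [0.5, 0.3, 0.2], 1: [0.2, 0.3, 0.5]}
y = np.array([rng.choice([1, 2, 3], p=probs[xi]) for xi in x])

tau, p_value = kendalltau(x, y)  # tau-b: tie-corrected
print(pd.crosstab(x, y, normalize="index").round(2))  # ordinal distribution per group
print(tau)
```

The crosstab rows make the direction of the association readable at a glance, which a single coefficient cannot.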
## Binary X → Categorical Nominal Y

- Binary X: `petal length (cm) > 2.5`
- Nominal Y: species (3 classes)
```python
from sklearn.datasets import load_iris
from scipy.stats import chi2_contingency
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from math import sqrt

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target
df["X_bin"] = (df["petal length (cm)"] > 2.5).astype(int)

X = df["X_bin"]
y = df["species"]

# Chi-square test of independence (one call serves both statistics)
table = pd.crosstab(X, y)
chi2_value, p, _, _ = chi2_contingency(table)

# Cramér's V, derived from the chi-square statistic
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2_value / (n * (k - 1)))

# Mutual Information
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Cramér’s V:", cramers_v)
print("Chi-Square:", chi2_value)
print("Mutual Information:", mi)
```
```text
Cramér’s V: 1.0
Chi-Square: 150.0
Mutual Information: 0.7281284489676516
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Cramér's V | 1.0 | Strength of association between two categorical variables | Perfect association | Species fully determines the binary split: setosa never exceeds petal length 2.5, while the other two species always do. |
| Chi-square | 150.0 | Test of independence (categorical–categorical) | Very large → statistically significant | Observed frequencies differ sharply from expected; confirms strong dependence. |
| Mutual information | 0.7281 | Shared information (0 = none; higher values = stronger dependency) | High | The variables share substantial information; knowing one greatly reduces uncertainty about the other. |
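Chi-square-based Cramér's V is biased upward in small samples and fine-grained tables; Bergsma's bias correction is a common fix. A hedged sketch (the helper below is our own, not part of SciPy or scikit-learn), checked on independent synthetic variables, where the corrected V should sit near zero:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma, 2013)."""
    table = pd.crosstab(x, y).to_numpy()
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    r, k = table.shape
    # Subtract the expected chi2/n under independence, floored at zero
    phi2 = max(0.0, chi2 / n - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return float(np.sqrt(phi2 / min(r_corr - 1, k_corr - 1)))

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=500)
y = rng.integers(0, 3, size=500)  # generated independently of x

print(cramers_v_corrected(x, y))  # near 0: no real association
```

The uncorrected formula applied to the same independent data would report a small but spuriously positive V; the correction removes that floor.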
## Binary X → Discrete Numeric Y

- Binary X: bedrooms (`AveBedrms`) > median
- Discrete numeric Y: `Population` (integer counts)
```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, pointbiserialr, ttest_ind
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["X_bin"] = (df["AveBedrms"] > df["AveBedrms"].median()).astype(int)

X = df["X_bin"]
y = df["Population"]

# Spearman rank correlation
spearman, _ = spearmanr(X, y)

# Point-biserial correlation
pb, _ = pointbiserialr(X, y)

# t-test on the two groups defined by binary X
t_stat, _ = ttest_ind(y[X == 0], y[X == 1])

# Mutual Information
mi = mutual_info_regression(X.values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Point-Biserial:", pb)
print("t-test:", t_stat)
print("Mutual Information:", mi)
```
```text
Spearman: 0.02537753340197663
Point-Biserial: 0.030026196631505037
t-test: -4.315488768612021
Mutual Information: 0.0012569075816197817
```
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | 0.0254 | Monotonic (rank-based) association | None / extremely weak | No meaningful monotonic trend; the ranks of X and Y are essentially unrelated. |
| Point-biserial | 0.0300 | Linear association between binary X and numeric Y | None / extremely weak | The two groups have nearly identical average Population; the bedroom split barely separates them. |
| t-test (t-statistic) | −4.3155 | Difference in means between the two groups | Statistically significant (due to large n) | The means differ slightly, but the significance comes from the large sample size, not a strong effect. |
| Mutual information | 0.00126 | Any dependency (linear + nonlinear) | Essentially zero | X and Y share almost no information; they are practically independent. |
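The table's point that significance at large n is not strength can be made quantitative by converting the t-statistic into a correlation-scale effect size, r = sqrt(t² / (t² + df)). A sketch on synthetic data (the `t_to_r` helper is our own): two large groups whose means differ by 0.06 SD give a clearly significant t but an r of only a few hundredths, matching the point-biserial reading above.

```python
import numpy as np
from scipy.stats import ttest_ind

def t_to_r(t, df):
    """Correlation-scale effect size implied by a two-sample t statistic."""
    return np.sqrt(t**2 / (t**2 + df))

rng = np.random.default_rng(1)
g0 = rng.normal(0.00, 1.0, size=20_000)
g1 = rng.normal(0.06, 1.0, size=20_000)

t_stat, p_value = ttest_ind(g0, g1)
df = len(g0) + len(g1) - 2
print(t_stat, p_value)     # significant at this sample size
print(t_to_r(t_stat, df))  # but r stays small, like the point-biserial above
```

In practice, reporting r (or Cohen's d) next to the p-value is the simplest guard against mistaking a large-sample t for a strong relationship.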