# Binary X

## Binary X → Continuous Y

  • Binary X: HouseAge > 30

  • Continuous Y: Median Income (MedInc)

```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import pointbiserialr, ttest_ind, mannwhitneyu
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

df["X_bin"] = (df["HouseAge"] > 30).astype(int)
y = df["MedInc"]
X = df["X_bin"]

# Point-Biserial correlation
pb, _ = pointbiserialr(X, y)

# t-test (pooled variances by default)
t_stat, _ = ttest_ind(y[X == 0], y[X == 1])

# Mann-Whitney U (nonparametric)
u_stat, _ = mannwhitneyu(y[X == 0], y[X == 1])

# Mutual Information (kNN-based estimate; varies slightly
# between runs unless random_state is set)
mi = mutual_info_regression(X.values.reshape(-1, 1), y)[0]

print("Point-Biserial:", pb)
print("t-test:", t_stat)
print("Mann–Whitney:", u_stat)
print("Mutual Information:", mi)
```

Output:

```
Point-Biserial: -0.08891425114393547
t-test: 12.824153614760043
Mann–Whitney: 59254034.5
Mutual Information: 0.009422289272053463
```

| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
|---|---|---|---|---|
| Point-Biserial | −0.0889 | Linear association between a continuous variable and a binary variable | Very weak negative | The two groups have almost the same mean; MedInc is only very slightly lower in the X = 1 group. |
| t-test (t-statistic) | 12.824 | Difference in group means assuming normality | Statistically significant (given the large sample) | The means differ statistically, but the effect size is small (r ≈ −0.09); the large t comes from the large sample size, not from a strong effect. |
| Mann–Whitney U | 59,254,034.5 | Difference in distributions (non-parametric) | Significance depends on group sizes | Indicates a distributional difference but does not quantify its strength; with large samples even tiny effects produce large U values. |
| Mutual Information | 0.00942 | General dependency (linear + nonlinear) | Extremely weak | The variables share almost no information; the relationship is effectively negligible. |
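
The table notes that a large t does not imply a large effect. A standardized effect size such as Cohen's d makes this explicit. The sketch below (synthetic data, not the housing set; `cohens_d` is an illustrative helper, not a library function) shows two large groups whose means barely differ, so d stays small even though a t-test would be highly "significant":

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: standardized mean difference between two groups."""
    na, nb = len(a), len(b)
    # pooled variance with unbiased per-group estimates (ddof=1)
    pooled_var = ((na - 1) * np.var(a, ddof=1) +
                  (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
# two large groups whose means differ only slightly
g0 = rng.normal(loc=3.9, scale=1.9, size=10_000)
g1 = rng.normal(loc=3.7, scale=1.9, size=10_000)

d = cohens_d(g0, g1)
print(round(d, 3))  # a "small" effect by the usual |d| < 0.2 rule of thumb
```

Reporting d (or the point-biserial r itself) alongside the t-statistic avoids over-reading significance driven by sample size.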

## Binary X → Binary Y

  • Binary X: mean radius > median

  • Binary Y: cancer class (malignant / benign)

```python
from sklearn.datasets import load_breast_cancer
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target    # binary: 0 = malignant, 1 = benign

df["X_bin"] = (df["mean radius"] > df["mean radius"].median()).astype(int)

X = df["X_bin"]
y = df["target"]

# Chi-square test of independence (Yates' continuity correction
# is applied by default for 2x2 tables)
table = pd.crosstab(X, y)
chi2_value, p, _, _ = chi2_contingency(table)

# Phi coefficient: sqrt(chi2/n) gives the magnitude only;
# the sign must be read from the contingency table
phi = sqrt(chi2_value / len(df))

# Mutual Information
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Phi Coefficient:", phi)
print("Chi-Square:", chi2_value)
print("Mutual Information:", mi)
```

Output:

```
Phi Coefficient: 0.644740755602814
Chi-Square: 236.52797526117857
Mutual Information: 0.22088305825323062
```

| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Phi Coefficient | 0.6447 | Association between two binary variables | Strong | Strong relationship; because sqrt(chi2/n) drops the sign, the direction comes from the table itself: a large mean radius pairs with the malignant class (target = 0). |
| Chi-Square | 236.528 | Test of independence for categorical variables | Very large → statistically significant | Observed frequencies differ sharply from expected frequencies; strong evidence of dependence. |
| Mutual Information | 0.2209 | Information shared between the two variables (linear + nonlinear) | Moderate | The variables share a meaningful amount of information; knowing one reduces uncertainty about the other. |
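
Since sqrt(chi2/n) loses the sign, it is worth knowing that for two 0/1 variables the signed phi coefficient is exactly their Pearson correlation. A minimal sketch on synthetic binary data (not the breast-cancer set):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.integers(0, 2, size=1000)
# y agrees with x 90% of the time -> strong positive association
flip = rng.random(1000) < 0.10
y = np.where(flip, 1 - x, x)

# for two 0/1 variables, Pearson's r IS the signed phi coefficient
phi_signed = np.corrcoef(x, y)[0, 1]
print(round(phi_signed, 3))  # close to 1 - 2 * 0.10 = 0.8
```

This gives both magnitude and direction in one number, without building a contingency table.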

## Binary X → Ordinal Y

  • Binary X: Median Income (MedInc) > 3, i.e. above $30,000 (MedInc is in units of $10,000)

  • Ordinal Y: target (price) converted to 3-level ordinal bins: Low, Medium, High

```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, kendalltau
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

df["X_bin"] = (df["MedInc"] > 3).astype(int)

# Ordinal Y: equal-frequency bins (1 = Low, 2 = Medium, 3 = High)
df["Y_ord"] = pd.qcut(df["target"], q=3, labels=[1, 2, 3]).astype(int)

X = df["X_bin"]
y = df["Y_ord"]

# Spearman rank correlation
spearman, _ = spearmanr(X, y)

# Kendall's tau
kendall, _ = kendalltau(X, y)

# Mutual Information
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Kendall:", kendall)
print("Mutual Information:", mi)
```

Output:

```
Spearman: 0.49811001629852997
Kendall: 0.46962263368309787
Mutual Information: 0.13921068369562795
```

| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | 0.4981 | Monotonic (rank-based) correlation | Moderate–Strong | As X increases, Y tends to increase in a consistent ranked pattern; a clear upward trend. |
| Kendall | 0.4696 | Pairwise concordance (rank agreement) | Moderate–Strong | Most observation pairs move in the same direction; Kendall's tau typically runs somewhat lower than Spearman's rho on the same data. |
| Mutual Information | 0.1392 | Overall dependency (linear + nonlinear) | Moderate | X carries meaningful predictive information about Y; noticeable shared dependency. |
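
One practical note: `mutual_info_classif` uses a kNN-based estimator by default, so the MI value above drifts slightly between runs. When both variables are discrete (a binary X and an ordinal Y both are), passing `discrete_features=True` selects the exact contingency-based estimator, which is deterministic. A sketch on synthetic ordinal data (not the housing set):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=2000)          # binary X
noise = rng.integers(-1, 2, size=2000)     # -1, 0, or 1
y = np.clip(1 + x + noise, 1, 3)           # ordinal Y in {1, 2, 3}, shifted upward by X

# discrete_features=True -> exact plug-in MI, no kNN randomness
mi = mutual_info_classif(x.reshape(-1, 1), y, discrete_features=True)[0]
print(round(mi, 4))
```

With the default `discrete_features='auto'`, a dense input like this is treated as continuous, which is why repeated runs of the housing example give slightly different MI values.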

## Binary X → Categorical Nominal Y

  • Binary X: petal length > 2.5

  • Nominal Y: species (3 classes)

```python
from sklearn.datasets import load_iris
from scipy.stats import chi2_contingency
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from math import sqrt

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target

df["X_bin"] = (df["petal length (cm)"] > 2.5).astype(int)

X = df["X_bin"]
y = df["species"]

# Chi-square test of independence
table = pd.crosstab(X, y)
chi2_value, p, _, _ = chi2_contingency(table)

# Cramér's V (chi-square rescaled to [0, 1])
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2_value / (n * (k - 1)))

# Mutual Information
mi = mutual_info_classif(X.values.reshape(-1, 1), y)[0]

print("Cramér’s V:", cramers_v)
print("Chi-Square:", chi2_value)
print("Mutual Information:", mi)
```

Output:

```
Cramér’s V: 1.0
Chi-Square: 150.0
Mutual Information: 0.7281284489676516
```

| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Cramér’s V | 1.0 | Strength of association between two categorical variables | Perfect association | The binary split aligns perfectly with one species boundary: X = 0 exactly for setosa. Species fully determines X, although X alone cannot distinguish versicolor from virginica. |
| Chi-Square | 150.0 | Test of independence (categorical–categorical) | Very large → statistically significant | Observed frequencies differ sharply from expected; confirms strong dependence. |
| Mutual Information | 0.7281 | Amount of shared information (0 = none; bounded above by the smaller variable's entropy, not by 1) | High | The variables share substantial information; knowing one reduces uncertainty about the other. |
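
Raw mutual information is measured in nats and bounded by the variables' entropies rather than by 1, which makes a value like 0.7281 hard to judge on its own. scikit-learn's `normalized_mutual_info_score` rescales MI into [0, 1]. The sketch below mirrors the structure of this example with synthetic labels: a binary X that is 0 exactly for one of three equally sized classes:

```python
import numpy as np
from sklearn.metrics import mutual_info_score, normalized_mutual_info_score

# binary X determined by one class of a 3-class Y,
# mirroring the petal-length split above
y = np.array([0] * 50 + [1] * 50 + [2] * 50)
x = (y > 0).astype(int)          # 0 exactly for class 0, else 1

mi = mutual_info_score(x, y)              # raw MI, in nats
nmi = normalized_mutual_info_score(x, y)  # rescaled to [0, 1]
print(round(mi, 4), round(nmi, 4))
```

Here the raw MI equals the entropy of X (X is a deterministic function of Y), while the normalized score stays below 1 because X cannot recover all three classes.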

## Binary X → Discrete Numeric Y

  • Binary X: AveBedrms (average bedrooms per household) > median

  • Discrete numeric Y: Population (integer)

```python
from sklearn.datasets import fetch_california_housing
from scipy.stats import spearmanr, pointbiserialr, ttest_ind
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

df["X_bin"] = (df["AveBedrms"] > df["AveBedrms"].median()).astype(int)
y = df["Population"]
X = df["X_bin"]

# Spearman rank correlation
spearman, _ = spearmanr(X, y)

# Point-Biserial correlation
pb, _ = pointbiserialr(X, y)

# t-test (two groups defined by binary X)
t_stat, _ = ttest_ind(y[X == 0], y[X == 1])

# Mutual Information
mi = mutual_info_regression(X.values.reshape(-1, 1), y)[0]

print("Spearman:", spearman)
print("Point-Biserial:", pb)
print("t-test:", t_stat)
print("Mutual Information:", mi)
```

Output:

```
Spearman: 0.02537753340197663
Point-Biserial: 0.030026196631505037
t-test: -4.315488768612021
Mutual Information: 0.0012569075816197817
```

| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Spearman | 0.0254 | Monotonic (rank-based) association | None / Extremely weak | No meaningful monotonic trend; the ranks of X and Y are essentially unrelated. |
| Point-Biserial | 0.0300 | Linear association between binary X and numeric Y | None / Extremely weak | The two groups defined by X have nearly identical mean Population values. |
| t-test (t-statistic) | −4.3155 | Difference in means between two groups | Statistically significant (due to large n) | The means differ slightly, but the significance comes from the large sample size, not from a strong effect. |
| Mutual Information | 0.00126 | Any dependency (linear + nonlinear) | Essentially zero | X and Y share almost no information; practically independent. |
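
The t-statistic and the point-biserial correlation seen throughout this page are linked by an exact identity: for a pooled two-sample t-test, r = t / sqrt(t² + df) with df = n − 2, which is why a |t| of 4.3 on 20,000+ samples corresponds to an r of only 0.03. A sketch verifying the identity on synthetic data:

```python
import numpy as np
from scipy.stats import pointbiserialr, ttest_ind

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=5000)
y = rng.normal(size=5000) + 0.05 * x   # tiny true shift for the x = 1 group

# pooled two-sample t-test, ordered so a higher x = 1 mean gives positive t
t_stat, _ = ttest_ind(y[x == 1], y[x == 0])

# point-biserial correlation between the binary x and y
r, _ = pointbiserialr(x, y)

# exact identity for the pooled t-test: r = t / sqrt(t^2 + df)
df = len(y) - 2
r_from_t = t_stat / np.sqrt(t_stat**2 + df)
print(round(r, 4), round(r_from_t, 4))
```

The identity makes the recurring lesson concrete: t grows with sqrt(n) at a fixed effect size, while r does not, so only r (or a similar effect size) tells you whether the relationship is practically meaningful.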