Categorical X

Categorical (Nominal) X → Continuous Y

  • Nominal X: species

  • Continuous Y: petal length

from sklearn.datasets import load_iris
from scipy.stats import f_oneway, kruskal
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target  # nominal categorical

X = df["species"]
y = df["petal length (cm)"]

# ANOVA (k groups)
groups = [y[X == g] for g in X.unique()]
anova_f, _ = f_oneway(*groups)

# Kruskal–Wallis
kw, _ = kruskal(*groups)

# Eta-squared (effect size for ANOVA)
ss_between = sum(len(g)*(g.mean()-y.mean())**2 for g in groups)
ss_total = sum((y - y.mean())**2)
eta_sq = ss_between / ss_total

# Mutual Information
mi = mutual_info_regression(X.values.reshape(-1,1), y)[0]

print("ANOVA F:", anova_f)
print("Kruskal–Wallis:", kw)
print("Eta-squared:", eta_sq)
print("Mutual Information:", mi)
ANOVA F: 1180.161182252981
Kruskal–Wallis: 130.41104857977163
Eta-squared: 0.941371719057367
Mutual Information: 0.982281157165767

| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
| --- | --- | --- | --- | --- |
| ANOVA F | 1180.161 | Difference in group means (parametric) | Extremely large → highly significant | Group means differ dramatically; categories separate the continuous variable almost perfectly. |
| Kruskal–Wallis H | 130.411 | Difference in rank distributions (non-parametric) | Very large → highly significant | Even without normality assumptions, distributions differ strongly; confirms the ANOVA finding. |
| Eta-squared (η²) | 0.9414 | Effect size for variance explained by the groups | Very large (near 1.0) | About 94% of the total variance in the continuous variable is explained by the categorical groups: an extremely strong effect. |
| Mutual Information | 0.9823 | Overall dependency (linear + nonlinear) | Very strong | The categorical groups contain almost all the information needed to predict the continuous outcome. |
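Eta-squared is known to be biased upward, especially with small samples. Omega-squared is a less biased alternative computed from the same sums of squares via the standard formula ω² = (SS_between − (k − 1)·MS_within) / (SS_total + MS_within). A minimal sketch reusing the iris setup above (variable names are illustrative):

```python
from sklearn.datasets import load_iris
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target

X = df["species"]
y = df["petal length (cm)"]
groups = [y[X == g] for g in X.unique()]

k = len(groups)                 # number of groups
n = len(y)                      # total sample size
ss_between = sum(len(g) * (g.mean() - y.mean())**2 for g in groups)
ss_total = ((y - y.mean())**2).sum()
ms_within = (ss_total - ss_between) / (n - k)   # mean square within groups
omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
print("Omega-squared:", omega_sq)
```

With 150 observations the correction is tiny here, so omega-squared lands just below the eta-squared of 0.9414.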

Categorical (Nominal) X → Binary Y

  • Nominal X: mean texture binned into Low/Medium/High categories

  • Binary Y: cancer class (malignant/benign)

from sklearn.datasets import load_breast_cancer
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target  # binary

# Nominal X: bin mean texture into 3 quantile-based categories
df["X_cat"] = pd.qcut(df["mean texture"], q=3, labels=["Low","Medium","High"])

table = pd.crosstab(df["X_cat"], df["target"])

# Chi-square
chi2, _, _, _ = chi2_contingency(table)

# Cramér’s V
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n * (k - 1)))

# Mutual Information (features = category codes, target = class labels)
mi = mutual_info_classif(df["X_cat"].cat.codes.values.reshape(-1,1), df["target"])[0]

print("Chi-square:", chi2)
print("Cramér’s V:", cramers_v)
print("Mutual Information:", mi)
Chi-square: 111.45982789673303
Cramér’s V: 0.44259148150655475
Mutual Information: 0.10478676277429888

| Metric | Value | What It Measures | Strength | Interpretation |
| --- | --- | --- | --- | --- |
| Chi-square | 111.4598 | Test of independence between two categorical variables | Statistically significant (large χ²) | The observed category frequencies differ strongly from expected; the variables are not independent. |
| Cramér's V | 0.4426 | Effect size for association (0 → none, 1 → perfect) | Moderate | There is a moderate association; categories of one variable provide useful information about the other. |
| Mutual Information | 0.1048 | Shared dependency (linear + nonlinear) | Weak–Moderate | The variables share some predictive information; the dependency is notable but not large. |
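Cramér's V computed directly from χ² is itself slightly biased upward, particularly for small samples and larger tables. Bergsma's bias correction is a common remedy; the sketch below reuses the breast-cancer setup above and is an illustrative implementation (this correction is not built into scipy):

```python
from math import sqrt
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target
df["X_cat"] = pd.qcut(df["mean texture"], q=3, labels=["Low", "Medium", "High"])

table = pd.crosstab(df["X_cat"], df["target"])
chi2, _, _, _ = chi2_contingency(table)

n = int(table.to_numpy().sum())
r, c = table.shape
phi2 = chi2 / n
# Bergsma's correction: shrink phi² and the table dimensions toward their
# expected values under independence
phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
r_corr = r - (r - 1) ** 2 / (n - 1)
c_corr = c - (c - 1) ** 2 / (n - 1)
v_corr = sqrt(phi2_corr / min(r_corr - 1, c_corr - 1))
print("Bias-corrected Cramér's V:", v_corr)
```

With n = 569 the correction is small, nudging the value just below the uncorrected 0.4426.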

Categorical (Nominal) X → Ordinal Y

  • Nominal X: average occupancy (AveOccup) binned into 4 nominal groups

  • Ordinal Y: median income (MedInc) → 3 ordered levels (low/mid/high)

from sklearn.datasets import fetch_california_housing
from scipy.stats import chi2_contingency, kendalltau
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd
import numpy as np

# Load and create synthetic nominal X
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Create categorical nominal X
df["X_cat"] = pd.qcut(df["AveOccup"], q=4, labels=["A","B","C","D"])

# Ordinal Y
df["Y_ord"] = pd.qcut(df["MedInc"], q=3, labels=[1,2,3]).astype(int)

table = pd.crosstab(df["X_cat"], df["Y_ord"])

# Chi-square
chi2, _, _, _ = chi2_contingency(table)

# Cramér's V
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n * (k - 1)))

# Gamma (ordinal association) – approximated via Kendall's tau as a proxy;
# only meaningful if X's category codes are treated as ordered
kendall, _ = kendalltau(df["X_cat"].cat.codes, df["Y_ord"])

# Mutual Information
mi = mutual_info_classif(df["X_cat"].astype('category').cat.codes.values.reshape(-1,1), df["Y_ord"])[0]

print("Chi-square:", chi2)
print("Cramér's V:", cramers_v)
print("Kendall (proxy for Gamma):", kendall)
print("Mutual Information:", mi)
Chi-square: 1044.098687453175
Cramér's V: 0.1590380091639929
Kendall (proxy for Gamma): -0.030783163605000836
Mutual Information: 0.028348024735318944

| Metric | Value | What It Measures | Strength | Interpretation |
| --- | --- | --- | --- | --- |
| Chi-square | 1044.099 | Statistical test of independence for categorical variables | Very large → statistically significant | With large sample sizes, even weak associations become significant; χ² alone does not imply a strong relationship. |
| Cramér's V | 0.1590 | Effect size of association (0 → none, 1 → perfect) | Weak | Only a weak association; the categories share limited information. |
| Kendall (proxy for Gamma) | –0.0308 | Directional ordinal association (if variables are ordinal) | Very weak | Almost no directional/ordinal relationship; the slight negative sign is negligible. |
| Mutual Information | 0.0283 | Overall dependency (nonlinear + linear) | Extremely weak | The variables share almost no information; dependency is minimal. |
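Kendall's tau is only a proxy above; Goodman–Kruskal gamma can be computed exactly from a contingency table by counting concordant and discordant cell pairs. A minimal, illustrative implementation (the function name is ours, not a library API):

```python
import numpy as np

def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) from a contingency table whose rows and
    columns are both in ascending category order."""
    t = np.asarray(table)
    concordant = discordant = 0
    for i in range(t.shape[0]):
        for j in range(t.shape[1]):
            # cells strictly below and to the right are concordant with (i, j)
            concordant += t[i, j] * t[i + 1:, j + 1:].sum()
            # cells strictly below and to the left are discordant
            discordant += t[i, j] * t[i + 1:, :j].sum()
    return (concordant - discordant) / (concordant + discordant)

# a small hand-made table with a clear positive trend:
print(goodman_kruskal_gamma([[30, 10], [10, 30]]))  # 0.8
```

Applied to `pd.crosstab(df["X_cat"], df["Y_ord"]).to_numpy()` from the section above, it should likewise come out near zero, consistent with the Kendall proxy.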

Categorical (Nominal) X → Categorical (Nominal) Y

  • Nominal X: petal width category

  • Nominal Y: species

from sklearn.datasets import load_iris
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target  # nominal

# Nominal X: bin petal width
df["X_cat"] = pd.qcut(df["petal width (cm)"], q=3, labels=["Low","Med","High"])

table = pd.crosstab(df["X_cat"], df["species"])

# Chi-square test
chi2, _, _, _ = chi2_contingency(table)

# Cramér’s V
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n*(k-1)))

# Mutual information
mi = mutual_info_classif(df["X_cat"].astype('category').cat.codes.values.reshape(-1,1),
                         df["species"])[0]

print("Chi-square:", chi2)
print("Cramér’s V:", cramers_v)
print("Mutual Information:", mi)
Chi-square: 266.3461538461538
Cramér’s V: 0.9422422792575764
Mutual Information: 0.959229763967931

| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
| --- | --- | --- | --- | --- |
| Chi-square | 266.346 | Statistical test of independence between categorical variables | Very large → highly significant | Strong evidence that the variables are not independent. |
| Cramér's V | 0.9422 | Effect size of association (0 = none, 1 = perfect) | Very strong | Almost perfect association; categories of one variable almost fully determine the other. |
| Mutual Information | 0.9592 | Shared dependency (linear + nonlinear) | Very strong | The variables share nearly all information; one variable almost encodes the other. |
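Cramér's V and mutual information are both symmetric. Theil's uncertainty coefficient U(Y | X) is a directional alternative: it measures how much knowing X reduces uncertainty about Y, normalized to [0, 1]. A sketch using plug-in entropies on the iris setup above (illustrative code, not a library function):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target
df["X_cat"] = pd.qcut(df["petal width (cm)"], q=3, labels=["Low", "Med", "High"])

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

joint = pd.crosstab(df["X_cat"], df["species"]).to_numpy() / len(df)
h_y = entropy(joint.sum(axis=0))  # H(species)
# H(species | X_cat): entropy within each X_cat level, weighted by P(X_cat)
h_y_given_x = sum(row.sum() * entropy(row / row.sum()) for row in joint)
theils_u = (h_y - h_y_given_x) / h_y  # 0 = no reduction, 1 = fully determined
print("Theil's U (species | X_cat):", theils_u)
```

A value well above 0.5 here agrees with the symmetric metrics: the binned petal width pins down the species to a large degree.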

Categorical (Nominal) X → Discrete Numeric Y

  • Nominal X: AveOccup → 3 groups

  • Discrete Numeric Y: Population (integer)

from sklearn.datasets import fetch_california_housing
from scipy.stats import f_oneway, kruskal
from sklearn.feature_selection import mutual_info_regression
import pandas as pd

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Nominal X
df["X_cat"] = pd.qcut(df["AveOccup"], q=3, labels=["Low","Mid","High"])

X = df["X_cat"]
y = df["Population"]

# ANOVA
groups = [y[X == g] for g in X.unique()]
anova_f, _ = f_oneway(*groups)

# Kruskal–Wallis
kw, _ = kruskal(*groups)

# Mutual Information
mi = mutual_info_regression(X.cat.codes.values.reshape(-1,1), y)[0]

print("ANOVA:", anova_f)
print("Kruskal–Wallis:", kw)
print("Mutual Information:", mi)
ANOVA: 336.8764325136195
Kruskal–Wallis: 1063.8261103745444
Mutual Information: 0.025269047744596485

| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
| --- | --- | --- | --- | --- |
| ANOVA F | 336.876 | Difference in group means (parametric test) | Statistically large | Indicates that at least one group mean differs, but F alone does not indicate effect size. |
| Kruskal–Wallis H | 1063.826 | Difference in rank distributions across groups | Statistically very large | Shows that group distributions differ, but a large H can occur with large samples even when the effect is small. |
| Mutual Information | 0.02527 | Amount of shared dependency (linear + nonlinear) | Extremely weak | The groups explain almost none of the variation in Population; the practical effect is minimal. |
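A large H with a huge sample says little about practical importance; an effect size such as epsilon-squared puts the statistic on a 0–1 scale. Since ε² = H / ((n² − 1)/(n + 1)) simplifies to H / (n − 1), it can be computed directly from the statistic reported above (the California housing dataset has n = 20640 rows):

```python
# Epsilon-squared effect size for the Kruskal–Wallis result above
H = 1063.8261103745444   # Kruskal–Wallis statistic from this section
n = 20640                # rows in the California housing dataset
epsilon_sq = H / (n - 1) # epsilon² = H / ((n² - 1)/(n + 1)) = H / (n - 1)
print("Epsilon-squared:", epsilon_sq)
```

A value around 0.05 matches the mutual-information reading: the group differences are statistically unmistakable but practically small.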