Categorical X#
Categorical (Nominal) X → Continuous Y#
Nominal X: species
Continuous Y: petal length
from sklearn.datasets import load_iris
from scipy.stats import f_oneway, kruskal
from sklearn.feature_selection import mutual_info_regression
import pandas as pd
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target # nominal categorical
X = df["species"]
y = df["petal length (cm)"]
# ANOVA (k groups)
groups = [y[X == g] for g in X.unique()]
anova_f, _ = f_oneway(*groups)
# Kruskal–Wallis
kw, _ = kruskal(*groups)
# Eta-squared (effect size for ANOVA)
ss_between = sum(len(g)*(g.mean()-y.mean())**2 for g in groups)
ss_total = sum((y - y.mean())**2)
eta_sq = ss_between / ss_total
# Mutual Information
mi = mutual_info_regression(X.values.reshape(-1,1), y)[0]
print("ANOVA F:", anova_f)
print("Kruskal–Wallis:", kw)
print("Eta-squared:", eta_sq)
print("Mutual Information:", mi)
ANOVA F: 1180.161182252981
Kruskal–Wallis: 130.41104857977163
Eta-squared: 0.941371719057367
Mutual Information: 0.982281157165767
| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
|---|---|---|---|---|
| ANOVA F | 1180.161 | Difference in group means (parametric) | Extremely large → highly significant | Group means differ dramatically; categories separate the continuous variable almost perfectly. |
| Kruskal–Wallis H | 130.411 | Difference in rank distributions (non-parametric) | Very large → highly significant | Even without normality assumptions, distributions differ strongly; confirms the ANOVA finding. |
| Eta-squared (η²) | 0.9414 | Effect size: variance explained by the groups | Very large (near 1.0) | About 94% of the total variance in the continuous variable is explained by the categorical groups, an extremely strong effect. |
| Mutual Information | 0.9823 | Overall dependency (linear + nonlinear) | Very strong | The categorical groups contain almost all the information needed to predict the continuous outcome. |
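Because eta-squared and the one-way ANOVA F statistic are built from the same between-group and within-group sums of squares, each can be recovered from the other. A minimal sketch, plugging in the values from the iris run above (k = 3 species, N = 150 rows; the helper name is mine):

```python
# eta^2 = F*(k-1) / (F*(k-1) + (N-k)), because F = (SSB/(k-1)) / (SSW/(N-k))
# and eta^2 = SSB / (SSB + SSW).
def eta_sq_from_f(f_stat, k, n):
    num = f_stat * (k - 1)
    return num / (num + (n - k))

# F from the iris run above, k = 3 species, N = 150 samples
print(eta_sq_from_f(1180.161182252981, 3, 150))  # ≈ 0.9414
```

This is a useful cross-check: it reproduces the eta-squared computed directly from the sums of squares.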
Categorical (Nominal) X → Binary Y#
Nominal X: mean texture binned into Low/Medium/High categories
Binary Y: cancer class (malignant/benign)
from sklearn.datasets import load_breast_cancer
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target # binary
# Nominal X: bin mean texture into three equal-frequency categories
df["X_cat"] = pd.qcut(df["mean texture"], q=3, labels=["Low","Medium","High"])
table = pd.crosstab(df["X_cat"], df["target"])
# Chi-square
chi2, _, _, _ = chi2_contingency(table)
# Cramér’s V
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n * (k - 1)))
# Mutual Information
mi = mutual_info_classif(df["X_cat"].cat.codes.values.reshape(-1,1), df["target"])[0]
print("Chi-square:", chi2)
print("Cramér’s V:", cramers_v)
print("Mutual Information:", mi)
Chi-square: 111.45982789673303
Cramér’s V: 0.44259148150655475
Mutual Information: 0.10478676277429888
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Chi-square | 111.4598 | Test of independence between two categorical variables | Statistically significant (large χ²) | The observed category frequencies differ strongly from expected; variables are not independent. |
| Cramér’s V | 0.4426 | Effect size for association (0 → none, 1 → perfect) | Moderate | There is a moderate association; categories of one variable provide useful information about the other. |
| Mutual Information | 0.1048 | Shared dependency (linear + nonlinear) | Weak–Moderate | Variables share some predictive information; the dependency is notable but not large. |
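Plain Cramér's V is known to be biased upward, especially for small or sparse tables; a common remedy is the Bergsma (2013) bias correction. A minimal sketch reusing the chi-square, sample size, and 3×2 table shape from the run above (the function name is mine):

```python
from math import sqrt

def cramers_v_corrected(chi2, n, r, c):
    """Bias-corrected Cramér's V (Bergsma 2013) for an r x c table."""
    phi2 = chi2 / n
    # subtract the expected inflation of phi^2 under independence
    phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    c_corr = c - (c - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(r_corr - 1, c_corr - 1))

# chi2 and n from the breast-cancer run above; the crosstab is 3 x 2
print(cramers_v_corrected(111.45982789673303, 569, 3, 2))  # ≈ 0.439
```

With n = 569 the correction barely moves the estimate (0.4426 → about 0.439), which is itself a sign the uncorrected value was trustworthy here.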
Categorical (Nominal) X → Ordinal Y#
Nominal X: AveOccup binned into 4 nominal groups (A, B, C, D)
Ordinal Y: median income (MedInc) → low/med/high
from sklearn.datasets import fetch_california_housing
from scipy.stats import chi2_contingency, kendalltau
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd
import numpy as np
# Load and create synthetic nominal X
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Create categorical nominal X
df["X_cat"] = pd.qcut(df["AveOccup"], q=4, labels=["A","B","C","D"])
# Ordinal Y
df["Y_ord"] = pd.qcut(df["MedInc"], q=3, labels=[1,2,3]).astype(int)
table = pd.crosstab(df["X_cat"], df["Y_ord"])
# Chi-square
chi2, _, _, _ = chi2_contingency(table)
# Cramér's V
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n * (k - 1)))
# Goodman–Kruskal gamma (ordinal association), approximated here by Kendall's tau on the category codes
kendall, _ = kendalltau(df["X_cat"].cat.codes, df["Y_ord"])
# Mutual Information
mi = mutual_info_classif(df["X_cat"].astype('category').cat.codes.values.reshape(-1,1), df["Y_ord"])[0]
print("Chi-square:", chi2)
print("Cramér's V:", cramers_v)
print("Kendall (proxy for Gamma):", kendall)
print("Mutual Information:", mi)
Chi-square: 1044.098687453175
Cramér's V: 0.1590380091639929
Kendall (proxy for Gamma): -0.030783163605000836
Mutual Information: 0.028348024735318944
| Metric | Value | What It Measures | Strength | Interpretation |
|---|---|---|---|---|
| Chi-square | 1044.099 | Statistical test of independence for categorical variables | Very large → statistically significant | With large sample sizes, even weak associations become significant; χ² alone does not imply a strong relationship. |
| Cramér’s V | 0.1590 | Effect size of association (0 → none, 1 → perfect) | Weak | Only weak association; categories share limited information. |
| Kendall (proxy for Gamma) | -0.0308 | Directional ordinal association (if variables are ordinal) | Very weak | Almost no directional/ordinal relationship; the slight negative sign is negligible. |
| Mutual Information | 0.0283 | Overall dependency (nonlinear + linear) | Extremely weak | Variables share almost no information; dependency is minimal. |
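Kendall's tau is only a proxy here; Goodman–Kruskal gamma proper is defined from concordant and discordant pairs and can be computed directly from the contingency table. A minimal sketch on a toy table (the helper name and the toy counts are illustrative, not from the housing data):

```python
import numpy as np

def goodman_kruskal_gamma(table):
    """Gamma from an r x c contingency table whose row and column
    categories are both listed in ascending ordinal order."""
    t = np.asarray(table, dtype=float)
    r, c = t.shape
    concordant = discordant = 0.0
    for i in range(r):
        for j in range(c):
            # cells strictly below and to the right agree in direction
            concordant += t[i, j] * t[i + 1:, j + 1:].sum()
            # cells strictly below and to the left disagree
            discordant += t[i, j] * t[i + 1:, :j].sum()
    return (concordant - discordant) / (concordant + discordant)

# toy table with a visible positive trend along the diagonal
toy = [[30, 10, 5],
       [10, 30, 10],
       [5, 10, 30]]
print(goodman_kruskal_gamma(toy))  # ≈ 0.699
```

Unlike tau, gamma ignores tied pairs entirely, so it tends to run larger than tau on the same table.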
Categorical (Nominal) X → Categorical (Nominal) Y#
Nominal X: petal width category
Nominal Y: species
from sklearn.datasets import load_iris
from scipy.stats import chi2_contingency
from math import sqrt
from sklearn.feature_selection import mutual_info_classif
import pandas as pd
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target # nominal
# Nominal X: bin petal width
df["X_cat"] = pd.qcut(df["petal width (cm)"], q=3, labels=["Low","Med","High"])
table = pd.crosstab(df["X_cat"], df["species"])
# Chi-square test
chi2, _, _, _ = chi2_contingency(table)
# Cramér’s V
n = table.sum().sum()
k = min(table.shape)
cramers_v = sqrt(chi2 / (n*(k-1)))
# Mutual information
mi = mutual_info_classif(df["X_cat"].astype('category').cat.codes.values.reshape(-1,1),
df["species"])[0]
print("Chi-square:", chi2)
print("Cramér’s V:", cramers_v)
print("Mutual Information:", mi)
Chi-square: 266.3461538461538
Cramér’s V: 0.9422422792575764
Mutual Information: 0.959229763967931
| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
|---|---|---|---|---|
| Chi-square | 266.346 | Statistical test of independence between categorical variables | Very large → highly significant | Strong evidence that the variables are not independent. |
| Cramér’s V | 0.9422 | Effect size of association (0 = none, 1 = perfect) | Very strong | Almost perfect association; categories of one variable almost fully determine the other. |
| Mutual Information | 0.9592 | Shared dependency (linear + nonlinear) | Very strong | Variables share nearly all information; one variable almost encodes the other. |
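Raw mutual information is reported in nats, so its ceiling depends on the entropies of the two variables; for a pair of categorical variables, a normalized variant bounded in [0, 1] is easier to compare across tables. A sketch using the same petal-width binning as above:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import normalized_mutual_info_score
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
# same tertile binning of petal width as above, but with integer codes
x_codes = pd.qcut(df["petal width (cm)"], q=3, labels=False)
# symmetric and bounded in [0, 1]; 1 means one variable determines the other
nmi = normalized_mutual_info_score(x_codes, data.target)
print("Normalized MI:", nmi)
```

A value near 1 here says the same thing as the high Cramér's V: knowing the petal-width bin nearly pins down the species.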
Categorical (Nominal) X → Discrete Numeric Y#
Nominal X: AveOccup → 3 groups
Discrete Numeric Y: Population (integer)
from sklearn.datasets import fetch_california_housing
from scipy.stats import f_oneway, kruskal
from sklearn.feature_selection import mutual_info_regression
import pandas as pd
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Nominal X
df["X_cat"] = pd.qcut(df["AveOccup"], q=3, labels=["Low","Mid","High"])
X = df["X_cat"]
y = df["Population"]
# ANOVA
groups = [y[X == g] for g in X.unique()]
anova_f, _ = f_oneway(*groups)
# Kruskal–Wallis
kw, _ = kruskal(*groups)
# Mutual Information
mi = mutual_info_regression(X.cat.codes.values.reshape(-1,1), y)[0]
print("ANOVA:", anova_f)
print("Kruskal–Wallis:", kw)
print("Mutual Information:", mi)
ANOVA: 336.8764325136195
Kruskal–Wallis: 1063.8261103745444
Mutual Information: 0.025269047744596485
| Metric | Value | What It Measures | Strength / Meaning | Interpretation |
|---|---|---|---|---|
| ANOVA F | 336.876 | Difference in group means (parametric test) | Statistically large | Indicates that at least one group mean differs, but F alone does not indicate effect size. |
| Kruskal–Wallis H | 1063.826 | Difference in rank distributions across groups | Statistically very large | Shows that group distributions differ, but large H can occur with large samples even if the effect is small. |
| Mutual Information | 0.02527 | Amount of shared dependency (linear + nonlinear) | Extremely weak | Groups explain almost none of the variation in the continuous variable; the practical effect is minimal. |
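The point that a large H can coexist with a small practical effect can be quantified; one common effect-size convention for Kruskal–Wallis is epsilon-squared, ε² = H / (n - 1). A minimal sketch plugging in the values from this run (n = 20640 California-housing rows; the function name is mine):

```python
def kruskal_epsilon_sq(h_stat, n):
    """Epsilon-squared effect size for a Kruskal-Wallis H statistic
    computed on n total observations: H / (n - 1)."""
    return h_stat / (n - 1)

# H from the run above; California housing has n = 20640 rows
print(kruskal_epsilon_sq(1063.8261103745444, 20640))  # ≈ 0.0515
```

So despite H > 1000, the groups account for only about 5% of the rank variation, consistent with the near-zero mutual information.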