Variants of NB#
1. Gaussian Naïve Bayes#
Assumes that continuous features follow a normal (Gaussian) distribution within each class.
Likelihood:
\[ P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right) \]
where \(\mu_{y,i}\) is the mean and \(\sigma_{y,i}^2\) the variance of feature \(i\) for class \(y\).
Use case: Continuous numeric data (e.g., medical measurements, sensor data).
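To make the likelihood concrete, here is a minimal sketch (toy numbers, illustrative only; the arrays and the `gaussian_likelihood` helper are assumptions, not scikit-learn API) that estimates the per-class mean and variance of a single feature and evaluates the density at a query value:

```python
import numpy as np

# Toy data: one continuous feature and two classes (made-up values).
x = np.array([4.9, 5.1, 6.3, 6.7])
y = np.array([0, 0, 1, 1])

def gaussian_likelihood(xi, mu, var):
    """Per-class Gaussian density P(x_i | y) from the formula above."""
    return np.exp(-(xi - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for cls in np.unique(y):
    mu = x[y == cls].mean()    # mu_{y,i}: per-class mean of the feature
    var = x[y == cls].var()    # sigma^2_{y,i}: per-class variance
    print(f"class {cls}: P(x_i=5.0 | y={cls}) = {gaussian_likelihood(5.0, mu, var):.4f}")
```

GaussianNB in scikit-learn estimates these per-class means and variances during `fit` (adding a small variance-smoothing term for numerical stability).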
2. Multinomial Naïve Bayes#
Assumes features are discrete counts (e.g., word counts in text).
Likelihood:
\[ P(x \mid y) = \frac{\left( \sum_i x_i \right)!}{\prod_i x_i!} \prod_{i=1}^{n} P(x_i \mid y)^{x_i} \]
Use case: Text classification (spam detection, sentiment analysis) with bag-of-words or TF-IDF counts.
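In practice, the per-class feature probabilities \(P(x_i \mid y)\) are estimated from training counts with additive (Laplace/Lidstone) smoothing; the smoothed estimate documented for scikit-learn's MultinomialNB, with smoothing parameter \(\alpha\), is
\[ \hat{P}(x_i \mid y) = \frac{N_{y,i} + \alpha}{N_y + \alpha n} \]
where \(N_{y,i}\) is the total count of feature \(i\) in class \(y\), \(N_y = \sum_i N_{y,i}\), and \(n\) is the number of features.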
3. Bernoulli Naïve Bayes#
Features are binary (0 or 1: present/absent).
Likelihood:
\[ P(x_i \mid y) = P_{i,y}^{x_i} \, (1 - P_{i,y})^{1 - x_i} \]
Use case: Text classification where only word presence matters (not frequency).
4. Complement Naïve Bayes#
A variation of Multinomial NB designed for imbalanced datasets.
Uses the complement of each class to estimate likelihoods, reducing bias toward the majority class.
Use case: Text classification with severe class imbalance.
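For reference, the complement estimate behind ComplementNB (from Rennie et al., 2003, as described in the scikit-learn user guide) computes each class's feature weights from the counts of all other classes:
\[ \hat{\theta}_{c,i} = \frac{\alpha_i + \sum_{j : y_j \neq c} x_{j,i}}{\alpha + \sum_{j : y_j \neq c} \sum_{k} x_{j,k}} \]
where the sums run over training documents that do not belong to class \(c\) and \(\alpha_i\), \(\alpha\) are smoothing terms.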
5. Categorical Naïve Bayes (CategoricalNB in scikit-learn ≥ 0.22)#
Handles categorical features with multiple categories (not just binary).
Uses category probabilities per feature.
Use case: Datasets with categorical variables (e.g., “color = red/green/blue”).
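CategoricalNB expects each feature to be encoded as non-negative integer category codes, so string-valued columns are typically passed through OrdinalEncoder first. A minimal sketch on made-up toy data (the values below are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data: two categorical features (color, size) and binary labels.
X_raw = np.array([["red", "small"], ["green", "large"], ["blue", "small"], ["red", "large"]])
y = [0, 1, 0, 1]

enc = OrdinalEncoder()          # maps each category to an integer code per feature
X = enc.fit_transform(X_raw)

clf = CategoricalNB().fit(X, y)
print(clf.predict(enc.transform([["red", "large"]])))
```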
6. Kernel Density Estimation (KDE) Naïve Bayes#
Instead of assuming a Gaussian distribution, it estimates feature likelihoods with non-parametric kernel density estimation.
More flexible but computationally heavier.
Use case: Continuous features whose distributions are not well approximated by a Gaussian (e.g., multimodal or skewed).
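scikit-learn has no built-in KDE Naive Bayes estimator, so the sketch below rolls one by hand: one KernelDensity per (class, feature) pair preserves the naive independence assumption, and prediction adds the per-feature log-densities to the log-prior. The class name KDENaiveBayes and the bandwidth of 0.5 are illustrative assumptions, not library defaults.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity
from sklearn.metrics import accuracy_score

class KDENaiveBayes:
    """Naive Bayes with per-class, per-feature kernel density estimates."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        # One univariate KDE per (class, feature) keeps the naive assumption.
        self.kdes_ = {
            c: [KernelDensity(bandwidth=self.bandwidth).fit(X[y == c, i:i + 1])
                for i in range(X.shape[1])]
            for c in self.classes_
        }
        return self

    def predict(self, X):
        # Log-posterior (up to a constant): log prior + sum of per-feature log-densities.
        log_post = np.column_stack([
            np.log(self.priors_[c])
            + sum(kde.score_samples(X[:, i:i + 1]) for i, kde in enumerate(self.kdes_[c]))
            for c in self.classes_
        ])
        return self.classes_[np.argmax(log_post, axis=1)]

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
kde_nb = KDENaiveBayes(bandwidth=0.5).fit(X_tr, y_tr)
print(accuracy_score(y_te, kde_nb.predict(X_te)))
```

The bandwidth controls the smoothness of the estimated densities and would normally be tuned (for example, by cross-validation).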
| Variant | Data Type | Distribution Assumption | Common Use Case |
|---|---|---|---|
| Gaussian NB | Continuous | Normal (Gaussian) | Medical, sensor data |
| Multinomial NB | Count-based | Multinomial | Text (word counts, TF-IDF) |
| Bernoulli NB | Binary | Bernoulli | Text (word presence/absence) |
| Complement NB | Count-based | Multinomial (complement class) | Imbalanced text datasets |
| Categorical NB | Categorical | Categorical distribution | Tabular categorical data |
| KDE NB | Continuous | Non-parametric (KDE) | Complex continuous features |
# Demonstration of Naive Bayes Variants in scikit-learn
import numpy as np
from sklearn.datasets import load_iris, make_classification
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, CategoricalNB, ComplementNB
from sklearn.metrics import accuracy_score
results = {}
# 1. Gaussian Naive Bayes (Iris dataset)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
results['GaussianNB'] = accuracy_score(y_test, y_pred)
# 2. Multinomial Naive Bayes (text classification)
docs = ["I love Python", "Python is great for machine learning", "I dislike bugs", "Bugs are annoying"]
labels = [1, 1, 0, 0] # 1=positive, 0=negative
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.5, random_state=42)
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred = mnb.predict(X_test)
results['MultinomialNB'] = accuracy_score(y_test, y_pred)
# 3. Bernoulli Naive Bayes (binary presence/absence; BernoulliNB binarizes the counts at 0 by default)
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred = bnb.predict(X_test)
results['BernoulliNB'] = accuracy_score(y_test, y_pred)
# 4. Complement Naive Bayes (good for imbalanced data)
cnb = ComplementNB()
cnb.fit(X_train, y_train)
y_pred = cnb.predict(X_test)
results['ComplementNB'] = accuracy_score(y_test, y_pred)
# 5. Categorical Naive Bayes (on synthetic categorical dataset)
# Generate continuous features, then discretize into ordinal bins (category codes 0-4)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3, n_redundant=0, random_state=42)
X = np.digitize(X, bins=[-1, 0, 1, 2])  # bin indices serve as integer category codes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
catnb = CategoricalNB()
catnb.fit(X_train, y_train)
y_pred = catnb.predict(X_test)
results['CategoricalNB'] = accuracy_score(y_test, y_pred)
results
{'GaussianNB': 0.9777777777777777,
'MultinomialNB': 1.0,
'BernoulliNB': 1.0,
'ComplementNB': 1.0,
'CategoricalNB': 0.85}
| Variant | Use Case | Dataset Used | Accuracy |
|---|---|---|---|
| GaussianNB | Continuous features (normally distributed) | Iris dataset | 97.8% |
| MultinomialNB | Discrete counts (text classification, word counts) | Small text dataset | 100% |
| BernoulliNB | Binary features (word presence/absence) | Same text dataset | 100% |
| ComplementNB | Handles imbalanced text data better | Same text dataset | 100% |
| CategoricalNB | Purely categorical data | Synthetic categorical dataset | 85% |