Variants of Naïve Bayes#

1. Gaussian Naïve Bayes#

  • Assumes that continuous features follow a normal (Gaussian) distribution within each class.

  • Likelihood:

    \[ P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right) \]
    • \(\mu_{y,i}\): mean of feature \(i\) for class \(y\)

    • \(\sigma_{y,i}^2\): variance of feature \(i\) for class \(y\)

  • Use case: Continuous numeric data (e.g., medical measurements, sensor data).


2. Multinomial Naïve Bayes#

  • Assumes features are discrete counts (e.g., word counts in text).

  • Likelihood:

    \[ P(x \mid y) = \frac{( \sum_i x_i )!}{\prod_i x_i!} \prod_{i=1}^n P(x_i \mid y)^{x_i} \]
  • Use case: Text classification (spam detection, sentiment analysis) with bag-of-words or TF-IDF counts.
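
The per-word probabilities \(P(x_i \mid y)\) are estimated from word counts with Laplace smoothing (`alpha=1.0` by default in `MultinomialNB`). A quick check against `feature_log_prob_` on a tiny illustrative corpus:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["spam spam offer", "offer now", "meeting today", "project meeting"]
labels = np.array([1, 1, 0, 0])
X = CountVectorizer().fit_transform(docs).toarray()

mnb = MultinomialNB(alpha=1.0).fit(X, labels)

# Manual estimate for class 1: (count of word i + alpha) / (total count + alpha * vocab size)
counts = X[labels == 1].sum(axis=0)
manual = np.log((counts + 1.0) / (counts.sum() + X.shape[1]))
print(np.allclose(manual, mnb.feature_log_prob_[1]))
```

Smoothing keeps a word that never appeared in a class from driving the whole class probability to zero.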


3. Bernoulli Naïve Bayes#

  • Features are binary (0 or 1: present/absent).

  • Likelihood:

    \[ P(x_i \mid y) = P_{i,y}^{x_i} (1-P_{i,y})^{1-x_i} \]
  • Use case: Text classification where only word presence matters (not frequency).
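
In scikit-learn, `BernoulliNB` binarizes its input by default (`binarize=0.0`), so it can be fed raw count matrices: any positive count becomes "present". A small sketch showing that pre-binarizing by hand (with the illustrative count matrix below) gives an identical model:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy count features (e.g., word counts); labels are illustrative
X_counts = np.array([[3, 0, 1], [0, 2, 0], [1, 1, 0], [0, 0, 4]])
y = [1, 0, 0, 1]

bnb_counts = BernoulliNB().fit(X_counts, y)                        # binarize=0.0 by default
X_binary = (X_counts > 0).astype(int)
bnb_binary = BernoulliNB(binarize=None).fit(X_binary, y)           # input already binary

print(np.allclose(bnb_counts.feature_log_prob_, bnb_binary.feature_log_prob_))
print(np.array_equal(bnb_counts.predict(X_counts), bnb_binary.predict(X_binary)))
```

Unlike Multinomial NB, the Bernoulli likelihood also penalizes the *absence* of a feature via the \((1-P_{i,y})^{1-x_i}\) term.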


4. Complement Naïve Bayes#

  • A variation of Multinomial NB designed for imbalanced datasets.

  • Estimates each class's feature likelihoods from the *complement* of that class (all training samples not belonging to it), which reduces bias toward the majority class.

  • Use case: Text classification with severe class imbalance.


5. Categorical Naïve Bayes (`CategoricalNB`, added in scikit-learn 0.22)#

  • Handles categorical features with multiple categories (not just binary).

  • Uses category probabilities per feature.

  • Use case: Datasets with categorical variables (e.g., “color = red/green/blue”).


6. Kernel Density Estimation (KDE) Naïve Bayes#

  • Instead of assuming a Gaussian distribution, estimates each feature's class-conditional likelihood with non-parametric kernel density estimation.

  • More flexible but computationally heavier.

  • Use case: Continuous features that are not Gaussian-shaped.
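
scikit-learn has no built-in KDE Naïve Bayes estimator, but one can be assembled from `KernelDensity`. A minimal sketch (the `KDENaiveBayes` class and its fixed bandwidth are illustrative choices, not a library API):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

class KDENaiveBayes:
    """Naive Bayes with a non-parametric (KDE) likelihood per class and feature."""
    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        # The "naive" assumption: one univariate KDE per (class, feature) pair
        self.kdes_ = [[KernelDensity(bandwidth=self.bandwidth).fit(X[y == c][:, [j]])
                       for j in range(X.shape[1])]
                      for c in self.classes_]
        return self

    def predict(self, X):
        # Joint log-likelihood: log prior + sum of per-feature log densities
        jll = np.stack([self.log_priors_[i]
                        + sum(kde.score_samples(X[:, [j]])
                              for j, kde in enumerate(self.kdes_[i]))
                        for i in range(len(self.classes_))])
        return self.classes_[np.argmax(jll, axis=0)]

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
kde_nb = KDENaiveBayes(bandwidth=0.5).fit(X_tr, y_tr)
acc = np.mean(kde_nb.predict(X_te) == y_te)
print(f"KDE NB accuracy: {acc:.3f}")
```

In practice the bandwidth should be tuned (e.g., by cross-validation); too small overfits the training densities, too large washes out their shape.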


| Variant | Data Type | Distribution Assumption | Common Use Case |
|---|---|---|---|
| Gaussian NB | Continuous | Normal (Gaussian) | Medical, sensor data |
| Multinomial NB | Count-based | Multinomial | Text (word counts, TF-IDF) |
| Bernoulli NB | Binary | Bernoulli | Text (word presence/absence) |
| Complement NB | Count-based | Multinomial (complement class) | Imbalanced text datasets |
| Categorical NB | Categorical | Categorical distribution | Tabular categorical data |
| KDE NB | Continuous | Non-parametric (KDE) | Complex continuous features |

```python
# Demonstration of Naive Bayes variants in scikit-learn
import numpy as np
from sklearn.datasets import load_iris, make_classification
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, CategoricalNB, ComplementNB
from sklearn.metrics import accuracy_score

results = {}

# 1. Gaussian Naive Bayes (Iris dataset, continuous features)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
results['GaussianNB'] = accuracy_score(y_test, gnb.predict(X_test))

# 2. Multinomial Naive Bayes (text classification on word counts)
docs = ["I love Python", "Python is great for machine learning", "I dislike bugs", "Bugs are annoying"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.5, random_state=42)
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
results['MultinomialNB'] = accuracy_score(y_test, mnb.predict(X_test))

# 3. Bernoulli Naive Bayes (BernoulliNB binarizes the counts internally: binarize=0.0 by default)
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
results['BernoulliNB'] = accuracy_score(y_test, bnb.predict(X_test))

# 4. Complement Naive Bayes (designed for imbalanced data)
cnb = ComplementNB()
cnb.fit(X_train, y_train)
results['ComplementNB'] = accuracy_score(y_test, cnb.predict(X_test))

# 5. Categorical Naive Bayes (on a synthetic categorical dataset)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3, n_redundant=0, random_state=42)
X = np.digitize(X, bins=[-1, 0, 1, 2])  # discretize each feature into 5 bins (categories 0-4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
catnb = CategoricalNB()
catnb.fit(X_train, y_train)
results['CategoricalNB'] = accuracy_score(y_test, catnb.predict(X_test))

results
```
```
{'GaussianNB': 0.9777777777777777,
 'MultinomialNB': 1.0,
 'BernoulliNB': 1.0,
 'ComplementNB': 1.0,
 'CategoricalNB': 0.85}
```

| Variant | Use Case | Dataset Used | Accuracy |
|---|---|---|---|
| GaussianNB | Continuous features (normally distributed) | Iris dataset | 97.8% |
| MultinomialNB | Discrete counts (text classification, word counts) | Small text dataset | 100% |
| BernoulliNB | Binary features (word presence/absence) | Same text dataset | 100% |
| ComplementNB | Handles imbalanced text data better | Same text dataset | 100% |
| CategoricalNB | Purely categorical data | Synthetic categorical dataset | 85% |