Naive Bayes#

Naive Bayes is a supervised machine learning algorithm for classification, based on Bayes' theorem combined with a strong (naive) independence assumption between features.


Bayes’ Theorem#

\[ P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)} \]
  • \(P(y|X)\): Probability of class \(y\) given features \(X\) (posterior).

  • \(P(X|y)\): Probability of features given class \(y\) (likelihood).

  • \(P(y)\): Probability of class \(y\) (prior).

  • \(P(X)\): Probability of features (evidence, same for all classes).
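Plugging hypothetical numbers into the theorem makes the role of each term concrete (all values below are illustrative, not estimated from any dataset):

```python
# Bayes' theorem: P(y|X) = P(X|y) * P(y) / P(X), with made-up values.
p_x_given_y = 0.2   # likelihood P(X|y)  (illustrative)
p_y = 0.3           # prior P(y)         (illustrative)
p_x = 0.25          # evidence P(X)      (illustrative)

posterior = p_x_given_y * p_y / p_x
print(posterior)  # 0.24
```

Because \(P(X)\) is the same for every class, it only rescales the posteriors; in practice one often compares the unnormalized products \(P(X|y) \cdot P(y)\).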


Naive Assumption#

Naive Bayes assumes the features are conditionally independent given the class, so the likelihood factorizes:

\[ P(X|y) = \prod_{i=1}^n P(x_i | y) \]

This makes computation fast and simple.
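Under this assumption the class-conditional likelihood is just a product of per-feature terms, as a minimal sketch shows (the per-feature likelihoods below are made up for illustration):

```python
import math

# Hypothetical per-feature likelihoods P(x_i | y) for three observed
# feature values (illustrative numbers, not from real data).
feature_likelihoods = [0.8, 0.6, 0.5]

# Under the naive (conditional independence) assumption,
# P(X | y) is the product of the individual feature likelihoods.
likelihood = math.prod(feature_likelihoods)
print(likelihood)  # 0.24
```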


Types#

  • Gaussian Naive Bayes → continuous features, assumes normal distribution.

  • Multinomial Naive Bayes → discrete counts (e.g., word counts in text).

  • Bernoulli Naive Bayes → binary features (e.g., presence/absence of a word).
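For the Gaussian variant, each class-conditional likelihood \(P(x_i|y)\) is a normal density with per-class mean and variance. A minimal sketch (the mean and variance below are assumed, not fitted to data):

```python
import math

def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Likelihood P(x_i | y) under Gaussian Naive Bayes: a normal
    density with the class-conditional mean and variance."""
    coeff = 1.0 / math.sqrt(2 * math.pi * var)
    return coeff * math.exp(-((x - mean) ** 2) / (2 * var))

# A feature value at the class mean gets the highest density.
print(gaussian_pdf(5.0, mean=5.0, var=2.0))
```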


Workflow#

  1. Compute prior probabilities \(P(y)\) from data.

  2. Estimate conditional probabilities \(P(x_i|y)\) for each feature.

  3. Apply Bayes’ theorem to classify a new instance into the class with the highest posterior.
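The three steps above can be sketched as a from-scratch Bernoulli-style classifier on a toy binary-feature dataset (the data, smoothing constant, and helper names are all illustrative choices, not a reference implementation):

```python
import math
from collections import Counter

def train(X, y, alpha=1.0):
    """Steps 1-2: priors P(y) and smoothed conditionals P(x_i=1 | y)
    for binary features, with Laplace smoothing `alpha`."""
    n = len(y)
    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    cond = {}
    for c in priors:
        rows = [x for x, label in zip(X, y) if label == c]
        n_c = len(rows)
        cond[c] = [
            (sum(r[i] for r in rows) + alpha) / (n_c + 2 * alpha)
            for i in range(len(X[0]))
        ]
    return priors, cond

def predict(x, priors, cond):
    """Step 3: pick the class with the highest log-posterior."""
    def log_post(c):
        lp = math.log(priors[c])
        for xi, p in zip(x, cond[c]):
            lp += math.log(p if xi else 1 - p)
        return lp
    return max(priors, key=log_post)

# Toy data: features = [contains "lottery", contains "win"].
X = [[1, 1], [1, 0], [0, 0], [0, 1], [0, 0]]
y = ["spam", "spam", "ham", "ham", "ham"]
priors, cond = train(X, y)
print(predict([1, 1], priors, cond))  # spam
```

Working in log-space avoids numerical underflow when many small probabilities are multiplied together.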


Example#

Email classification:

  • Features: words like lottery, win, money.

  • If these words appear often in spam, \(P(\text{spam} | X)\) becomes high.

  • Classify as spam if the posterior for spam exceeds the posterior for ham.
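That comparison can be sketched with made-up priors and word likelihoods (the numbers below are illustrative only):

```python
# Hypothetical quantities for a message containing "lottery" and "win";
# the values are illustrative, not estimated from real email data.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {"spam": 0.7 * 0.5, "ham": 0.1 * 0.2}  # product over the two words

# P(X) is identical for every class, so comparing the unnormalized
# posteriors P(X|y) * P(y) is enough to pick a label.
scores = {c: likelihood[c] * prior[c] for c in prior}
label = max(scores, key=scores.get)
print(label)  # spam
```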


Pros#

  • Very fast to train and predict; scales to large datasets.

  • Performs well in text/NLP tasks (spam, sentiment).

  • Easy to implement.

Cons#

  • Independence assumption rarely holds.

  • Performance degrades with highly correlated features.

  • Zero-frequency problem (fixed by Laplace smoothing).
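The zero-frequency fix mentioned above can be sketched as follows (the counts are hypothetical):

```python
def smoothed_prob(count, total, n_values=2, alpha=1.0):
    """Laplace-smoothed estimate of P(x_i = v | y): add `alpha`
    pseudo-counts so unseen feature values never get probability zero."""
    return (count + alpha) / (total + alpha * n_values)

# A word that never appears in the 10 ham training emails:
# the raw estimate 0/10 = 0 would zero out the whole posterior product,
# while the smoothed estimate stays strictly positive.
print(smoothed_prob(0, 10))  # 1/12 ≈ 0.0833
```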


Naive Bayes = Bayes’ theorem + independence assumption, used for fast probabilistic classification.
