Naive Bayes#

Naive Bayes is a supervised machine learning algorithm for classification, based on Bayes' theorem combined with a strong (naive) independence assumption between features.


Bayes’ Theorem#

\[ P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)} \]
  • \(P(y|X)\): Probability of class \(y\) given features \(X\) (posterior).

  • \(P(X|y)\): Probability of features given class \(y\) (likelihood).

  • \(P(y)\): Probability of class \(y\) (prior).

  • \(P(X)\): Probability of features (evidence, same for all classes).
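Plugging hypothetical numbers into the theorem makes the role of each term concrete (all values below are illustrative, not estimated from any dataset):

```python
# Bayes' theorem: P(y|X) = P(X|y) * P(y) / P(X), with made-up values.
p_x_given_y = 0.2   # likelihood P(X|y)  (illustrative)
p_y = 0.3           # prior P(y)         (illustrative)
p_x = 0.25          # evidence P(X)      (illustrative)

posterior = p_x_given_y * p_y / p_x
print(posterior)  # 0.24
```

Because \(P(X)\) is the same for every class, it only rescales the posteriors; in practice one often compares the unnormalized products \(P(X|y) \cdot P(y)\).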


Naive Assumption#

Naive Bayes assumes the features are conditionally independent given the class, so the likelihood factorizes:

\[ P(X|y) = \prod_{i=1}^n P(x_i | y) \]

This makes computation fast and simple.
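Under this assumption the class-conditional likelihood is just a product of per-feature terms, as a minimal sketch shows (the per-feature likelihoods below are made up for illustration):

```python
import math

# Hypothetical per-feature likelihoods P(x_i | y) for three observed
# feature values (illustrative numbers, not from real data).
feature_likelihoods = [0.8, 0.6, 0.5]

# Under the naive (conditional independence) assumption,
# P(X | y) is the product of the individual feature likelihoods.
likelihood = math.prod(feature_likelihoods)
print(likelihood)  # 0.24
```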


Types#

  • Gaussian Naive Bayes → continuous features, assumes normal distribution.

  • Multinomial Naive Bayes → discrete counts (e.g., word counts in text).

  • Bernoulli Naive Bayes → binary features (e.g., presence/absence of a word).
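For the Gaussian variant, each class-conditional likelihood \(P(x_i|y)\) is a normal density with per-class mean and variance. A minimal sketch (the mean and variance below are assumed, not fitted to data):

```python
import math

def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Likelihood P(x_i | y) under Gaussian Naive Bayes: a normal
    density with the class-conditional mean and variance."""
    coeff = 1.0 / math.sqrt(2 * math.pi * var)
    return coeff * math.exp(-((x - mean) ** 2) / (2 * var))

# A feature value at the class mean gets the highest density.
print(gaussian_pdf(5.0, mean=5.0, var=2.0))
```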


Workflow#

  1. Compute prior probabilities \(P(y)\) from data.

  2. Estimate conditional probabilities \(P(x_i|y)\) for each feature.

  3. Apply Bayes’ theorem to classify a new instance into the class with the highest posterior.
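The three steps above can be sketched as a from-scratch Bernoulli-style classifier on a toy binary-feature dataset (the data, smoothing constant, and helper names are all illustrative choices, not a reference implementation):

```python
import math
from collections import Counter

def train(X, y, alpha=1.0):
    """Steps 1-2: priors P(y) and smoothed conditionals P(x_i=1 | y)
    for binary features, with Laplace smoothing `alpha`."""
    n = len(y)
    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    cond = {}
    for c in priors:
        rows = [x for x, label in zip(X, y) if label == c]
        n_c = len(rows)
        cond[c] = [
            (sum(r[i] for r in rows) + alpha) / (n_c + 2 * alpha)
            for i in range(len(X[0]))
        ]
    return priors, cond

def predict(x, priors, cond):
    """Step 3: pick the class with the highest log-posterior."""
    def log_post(c):
        lp = math.log(priors[c])
        for xi, p in zip(x, cond[c]):
            lp += math.log(p if xi else 1 - p)
        return lp
    return max(priors, key=log_post)

# Toy data: features = [contains "lottery", contains "win"].
X = [[1, 1], [1, 0], [0, 0], [0, 1], [0, 0]]
y = ["spam", "spam", "ham", "ham", "ham"]
priors, cond = train(X, y)
print(predict([1, 1], priors, cond))  # spam
```

Working in log-space avoids numerical underflow when many small probabilities are multiplied together.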


Example#

Email classification:

  • Features: words like lottery, win, money.

  • If these words appear often in spam, \(P(\text{spam} | X)\) becomes high.

  • Classify as spam if the posterior for spam exceeds the posterior for ham.
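That comparison can be sketched with made-up priors and word likelihoods (the numbers below are illustrative only):

```python
# Hypothetical quantities for a message containing "lottery" and "win";
# the values are illustrative, not estimated from real email data.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {"spam": 0.7 * 0.5, "ham": 0.1 * 0.2}  # product over the two words

# P(X) is identical for every class, so comparing the unnormalized
# posteriors P(X|y) * P(y) is enough to pick a label.
scores = {c: likelihood[c] * prior[c] for c in prior}
label = max(scores, key=scores.get)
print(label)  # spam
```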


Pros#

  • Very fast to train and predict; scales to large datasets.

  • Performs well in text/NLP tasks (spam, sentiment).

  • Easy to implement.

Cons#

  • Independence assumption rarely holds.

  • Performance degrades with highly correlated features.

  • Zero-frequency problem (fixed by Laplace smoothing).
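The zero-frequency fix mentioned above can be sketched as follows (the counts are hypothetical):

```python
def smoothed_prob(count, total, n_values=2, alpha=1.0):
    """Laplace-smoothed estimate of P(x_i = v | y): add `alpha`
    pseudo-counts so unseen feature values never get probability zero."""
    return (count + alpha) / (total + alpha * n_values)

# A word that never appears in the 10 ham training emails:
# the raw estimate 0/10 = 0 would zero out the whole posterior product,
# while the smoothed estimate stays strictly positive.
print(smoothed_prob(0, 10))  # 1/12 ≈ 0.0833
```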


Naive Bayes = Bayes’ theorem + independence assumption, used for fast probabilistic classification.
