Naive Bayes#
Naive Bayes is a supervised machine learning algorithm used for classification.
Bayes’ Theorem#
\[
P(y|X) = \frac{P(X|y)\,P(y)}{P(X)}
\]
\(P(y|X)\): Probability of class \(y\) given features \(X\) (posterior).
\(P(X|y)\): Probability of features given class \(y\) (likelihood).
\(P(y)\): Probability of class \(y\) (prior).
\(P(X)\): Probability of features (evidence, same for all classes).
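The four terms can be put together in a small worked computation. The numbers below are made up for illustration: a 30% spam prior, and a word that appears in 40% of spam but only 1% of ham.

```python
# Hypothetical numbers: P(spam) = 0.30, and the feature X is
# "the email contains the word 'lottery'".
p_spam = 0.30                # prior P(y = spam)
p_word_given_spam = 0.40     # likelihood P(X | spam)
p_word_given_ham = 0.01      # likelihood P(X | ham)

# Evidence P(X) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(spam | X) by Bayes' theorem.
posterior = p_word_given_spam * p_spam / p_word
print(round(posterior, 3))  # → 0.945
```

Even with a modest prior, a feature that is far more likely under spam pushes the posterior close to 1.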
Naive Assumption#
It assumes the features are conditionally independent given the class, so the likelihood factorizes:
\[
P(X|y) = P(x_1|y)\,P(x_2|y)\cdots P(x_n|y) = \prod_{i=1}^{n} P(x_i|y)
\]
This makes computation fast and simple: each \(P(x_i|y)\) can be estimated separately.
Types#
Gaussian Naive Bayes → continuous features, assumes normal distribution.
Multinomial Naive Bayes → discrete counts (e.g., word counts in text).
Bernoulli Naive Bayes → binary features (e.g., presence/absence of a word).
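The three variants can be tried side by side with scikit-learn (assumed installed here); the toy arrays below are invented to match each variant's expected input.

```python
# Toy comparison of the three Naive Bayes variants (assumes scikit-learn).
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # two made-up classes

# GaussianNB: continuous measurements.
X_cont = np.array([[1.2, 3.4], [0.8, 2.9], [5.1, 7.2], [4.9, 6.8]])
print(GaussianNB().fit(X_cont, y).predict([[1.0, 3.0]]))      # → [0]

# MultinomialNB: non-negative counts, e.g. word counts.
X_counts = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [0, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[2, 0, 1]]))  # → [0]

# BernoulliNB: binary presence/absence of each feature.
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 1]]))       # → [0]
```

The choice of variant is really a choice of likelihood model \(P(x_i|y)\): normal density, multinomial counts, or Bernoulli trials.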
Workflow#
Compute prior probabilities \(P(y)\) from data.
Estimate conditional probabilities \(P(x_i|y)\) for each feature.
Apply Bayes’ theorem to classify a new instance into the class with the highest posterior.
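The three workflow steps can be sketched from scratch for categorical features. Everything below (the data, the `train`/`predict` names, smoothing over seen values only) is a simplified illustration, not a production implementation.

```python
import math
from collections import Counter, defaultdict

def train(samples, labels, alpha=1.0):
    """Step 1 + 2: count class priors and per-feature conditional counts."""
    priors = Counter(labels)                # step 1: class frequencies P(y)
    cond = defaultdict(Counter)             # step 2: counts for P(x_i | y)
    for x, y in zip(samples, labels):
        for i, value in enumerate(x):
            cond[(y, i)][value] += 1
    return priors, cond, alpha, len(samples)

def predict(x, model):
    """Step 3: pick the class with the highest log posterior."""
    priors, cond, alpha, n = model
    best, best_score = None, -math.inf
    for y, count in priors.items():
        score = math.log(count / n)         # log prior
        for i, value in enumerate(x):
            c = cond[(y, i)]
            # add-one smoothing over the values seen for this feature/class
            score += math.log((c[value] + alpha) / (count + alpha * len(c)))
        if score > best_score:
            best, best_score = y, score
    return best

samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(samples, labels)
print(predict(("rainy", "mild"), model))  # → yes
```

Working in log space avoids underflow: the product of many small probabilities becomes a sum of log-probabilities.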
Example#
Email classification:
Features: words like lottery, win, money.
If these words appear often in spam, \(P(\text{spam} | X)\) becomes high.
Classify as spam if the posterior for spam exceeds the posterior for ham.
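The email example above can be run end to end with scikit-learn (assumed available); the four-email corpus is invented for the demo.

```python
# Toy spam filter sketch: bag-of-words counts + Multinomial Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win money in the lottery now",      # spam
    "claim your lottery win money",      # spam
    "meeting agenda for monday",         # ham
    "lunch with the project team",       # ham
]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()                  # words -> count features
X = vec.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)

new_email = vec.transform(["free lottery money"])
print(clf.predict(new_email)[0])  # → spam
```

Because "lottery" and "money" appear only in spam training emails, the spam posterior dominates for the new message.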
Pros#
Very fast, works on large datasets.
Performs well in text/NLP tasks (spam, sentiment).
Easy to implement.
Cons#
Independence assumption rarely holds.
Performs poorly when features are highly correlated, since correlated features get counted multiple times.
Zero-frequency problem: a feature value never seen with a class gets zero probability, zeroing out the whole product (mitigated by Laplace smoothing).
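The zero-frequency fix is a one-line change to the likelihood estimate. The counts below are invented to show the arithmetic of add-one (Laplace) smoothing.

```python
# Hypothetical counts: the word "refund" never appears in spam training data.
count_word_in_spam = 0
total_words_in_spam = 100
vocab_size = 50  # number of distinct words in the vocabulary

# Unsmoothed estimate: zero, which zeroes the entire product of likelihoods.
unsmoothed = count_word_in_spam / total_words_in_spam

# Laplace smoothing: add 1 to every count, and the vocabulary size to the total.
smoothed = (count_word_in_spam + 1) / (total_words_in_spam + vocab_size)

print(unsmoothed)           # → 0.0
print(round(smoothed, 4))   # → 0.0067
```

The unseen word now contributes a small but nonzero likelihood instead of vetoing the class outright.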
Naive Bayes = Bayes’ theorem + independence assumption, used for fast probabilistic classification.