# One-vs-Rest (OVR) Logistic Regression
## 1. The Problem
- Standard logistic regression handles binary classification: two classes only.
- Many real-world problems are multi-class (3 or more classes), e.g., classifying animals as Cat, Dog, or Rabbit.
- We need a way to extend logistic regression to handle multiple classes.
## 2. One-vs-Rest (OVR) Strategy
OVR (also called One-vs-All) converts a multi-class problem into multiple binary classification problems:
Suppose there are \(K\) classes: \(C_1, C_2, \ldots, C_K\). For each class \(C_k\), train a binary logistic regression classifier \(M_k\):

- Treat \(C_k\) as the positive class (1).
- Treat all other classes as the negative class (0).
Example with 3 classes (Cat, Dog, Rabbit):

| Classifier | Positive | Negative |
|---|---|---|
| M1 | Cat | Dog, Rabbit |
| M2 | Dog | Cat, Rabbit |
| M3 | Rabbit | Cat, Dog |
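As a minimal sketch (plain NumPy, using the toy labels above), the \(K\) binary target vectors come from comparing the label array against each positive class in turn:

```python
import numpy as np

# Toy multi-class labels from the example above
labels = np.array(['Cat', 'Dog', 'Rabbit', 'Cat', 'Dog'])

# One binary target vector per class: 1 for the positive class, 0 for "the rest"
for k in ['Cat', 'Dog', 'Rabbit']:
    print(k, (labels == k).astype(int))
# Cat    [1 0 0 1 0]
# Dog    [0 1 0 0 1]
# Rabbit [0 0 1 0 0]
```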
## 3. Training Phase
- Each binary model \(M_k\) is trained independently.
- Input features remain the same for all models.

Use one-hot encoding for the targets; each model \(M_k\) takes its class's column as the binary label:

| Class | One-hot |
|---|---|
| Cat | [1, 0, 0] |
| Dog | [0, 1, 0] |
| Rabbit | [0, 0, 1] |
Each model predicts the probability that a sample belongs to its respective class.
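Concretely, each \(M_k\) is an ordinary logistic regression: with learned weights \(w_k\) and bias \(b_k\), it scores a sample \(x\) with the sigmoid

\[
P(y = C_k \mid x) = \sigma(w_k^\top x + b_k) = \frac{1}{1 + e^{-(w_k^\top x + b_k)}}.
\]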
## 4. Prediction Phase
For a new data point:

- Pass the input to all \(K\) models.
- Each model outputs a probability that the point belongs to its positive class.
Example probabilities:

| Model | Probability |
|---|---|
| M1 | 0.25 |
| M2 | 0.20 |
| M3 | 0.55 |
Choose the class with the highest probability → here, Rabbit (class 3).
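A minimal sketch of that decision rule (the probabilities are hard-coded from the table above):

```python
import numpy as np

classes = ['Cat', 'Dog', 'Rabbit']
# Positive-class probability from each binary model (values from the table above)
ovr_probs = np.array([0.25, 0.20, 0.55])

# The predicted class is the one whose model is most confident
predicted = classes[int(np.argmax(ovr_probs))]
print(predicted)  # Rabbit
```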
## 5. Advantages of OVR
- Simple to implement.
- Works with any binary classifier (logistic regression, SVM, etc.).
- Efficient when the number of classes is not very large (only \(K\) models to train).
## 6. Disadvantages
- Probabilities from different classifiers may not be well-calibrated, since each is trained on a different binary problem (the raw scores need not sum to 1).
- Can be biased if one class is much smaller than the "rest," because each binary problem becomes class-imbalanced.
- Less accurate than One-vs-One in some cases.
## ✅ Summary
OVR Logistic Regression works by:
1. Splitting a multi-class problem into \(K\) binary problems.
2. Training a separate logistic regression for each class.
3. Predicting the class with the highest probability across all models.
## Example Problem Statement
Problem: You are building a model to classify types of fruits based on two features:

- f1 = Weight (grams)
- f2 = Color Score (0–10 scale)
Classes:

- Apple
- Banana
- Cherry
Training Data:

| Fruit | f1 (Weight) | f2 (Color Score) |
|---|---|---|
| Apple | 150 | 8 |
| Apple | 170 | 7 |
| Banana | 120 | 4 |
| Banana | 130 | 5 |
| Cherry | 10 | 9 |
| Cherry | 15 | 8 |
We want to predict the fruit type given f1 and f2.
## Step 1: One-vs-Rest (OVR) Setup
We have 3 classes, so we create 3 binary classifiers:
- M1 (Apple vs Rest): Positive = Apple; Negative = Banana, Cherry
- M2 (Banana vs Rest): Positive = Banana; Negative = Apple, Cherry
- M3 (Cherry vs Rest): Positive = Cherry; Negative = Apple, Banana
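A minimal sketch of this relabeling on the fruit data (NumPy only; the `y_m1`, `y_m2`, `y_m3` names are illustrative):

```python
import numpy as np

y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])

# Binary targets, one per classifier
y_m1 = (y == 'Apple').astype(int)   # [1 1 0 0 0 0] -> M1: Apple vs Rest
y_m2 = (y == 'Banana').astype(int)  # [0 0 1 1 0 0] -> M2: Banana vs Rest
y_m3 = (y == 'Cherry').astype(int)  # [0 0 0 0 1 1] -> M3: Cherry vs Rest
```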
## Step 2: One-hot Encoding of the Target
| Fruit | One-hot (Apple, Banana, Cherry) |
|---|---|
| Apple | [1, 0, 0] |
| Banana | [0, 1, 0] |
| Cherry | [0, 0, 1] |
Each classifier uses its corresponding column as the target.
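scikit-learn's `LabelBinarizer` builds exactly this one-hot matrix, so each classifier's target is just one of its columns; a quick sketch:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])
Y = LabelBinarizer().fit_transform(y)  # columns ordered: Apple, Banana, Cherry

print(Y[0])     # [1 0 0]  (an Apple row)
print(Y[:, 2])  # [0 0 0 0 1 1]  (the Cherry column = M3's binary target)
```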
## Step 3: Training Binary Models
Each binary logistic regression model is trained independently:
- Input: [f1, f2]
- Output: probability of being the positive class
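A sketch of this step with scikit-learn (the full end-to-end version appears at the bottom of this page; `max_iter=1000` is an assumption added to be safe on this tiny, unscaled dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[150, 8], [170, 7], [120, 4], [130, 5], [10, 9], [15, 8]])
y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])

# Train one independent binary model per class (OVR by hand)
models = {}
for fruit in ['Apple', 'Banana', 'Cherry']:
    m = LogisticRegression(max_iter=1000)  # max_iter raised as a precaution
    models[fruit] = m.fit(X, (y == fruit).astype(int))
```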
## Step 4: Prediction Example
Test data:
f1 = 140, f2 = 6
### Step 4a: Predict probabilities using each classifier
| Model | Class | Probability |
|---|---|---|
| M1 | Apple | 0.4 |
| M2 | Banana | 0.35 |
| M3 | Cherry | 0.25 |
### Step 4b: Choose the class with the highest probability
Max probability = 0.4 → Apple
So the predicted class is Apple.
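Continuing the Step 3 sketch, the decision rule is an argmax over the positive-class probabilities (the 0.4 / 0.35 / 0.25 values in the table are illustrative; actually trained models give different numbers, as the output at the end of this page shows):

```python
import numpy as np

x_test = np.array([[140, 6]])

# Each model's probability for its positive class, then pick the largest
probs = {fruit: m.predict_proba(x_test)[0, 1] for fruit, m in models.items()}
predicted = max(probs, key=probs.get)
print(probs, "->", predicted)
```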
## Step 5: Summary
- OVR breaks multi-class classification into multiple binary logistic regressions.
- Each model outputs a probability for its class.
- Final prediction = class with the highest probability.
```python
# Import libraries
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings("ignore")

# Sample data (Fruit dataset)
X = np.array([
    [150, 8],  # Apple
    [170, 7],  # Apple
    [120, 4],  # Banana
    [130, 5],  # Banana
    [10, 9],   # Cherry
    [15, 8]    # Cherry
])
y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])

# Encode labels to integers
le = LabelEncoder()
y_encoded = le.fit_transform(y)  # Apple=0, Banana=1, Cherry=2

# Create OVR Logistic Regression model
model = LogisticRegression(multi_class='ovr', solver='lbfgs')
model.fit(X, y_encoded)

# Test data
X_test = np.array([
    [140, 6],  # Expected: Apple
    [12, 8],   # Expected: Cherry
    [125, 5]   # Expected: Banana
])

# Predict probabilities for each class
probs = model.predict_proba(X_test)
predictions = model.predict(X_test)

# Convert predicted labels back to original class names
predicted_classes = le.inverse_transform(predictions)

# Print results
for i, x in enumerate(X_test):
    print(f"Test Data: {x}")
    print(f"Predicted Probabilities: {probs[i]}")
    print(f"Predicted Class: {predicted_classes[i]}\n")
```
```
Test Data: [140 6]
Predicted Probabilities: [5.19151260e-01 4.80777962e-01 7.07778904e-05]
Predicted Class: Apple

Test Data: [12 8]
Predicted Probabilities: [3.41751899e-22 6.85793166e-02 9.31420683e-01]
Predicted Class: Cherry

Test Data: [125 5]
Predicted Probabilities: [3.84060828e-03 9.95473944e-01 6.85447453e-04]
Predicted Class: Banana
```
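Two closing notes. First, `predict_proba` with `multi_class='ovr'` normalizes the per-model sigmoid outputs so each row above sums to 1; the raw OVR scores would not. Second, recent scikit-learn releases deprecate the `multi_class` parameter of `LogisticRegression`; if you see a `FutureWarning`, the explicit `OneVsRestClassifier` wrapper gives the same one-vs-rest behavior. A minimal sketch, reusing `X`, `y_encoded`, `X_test`, and `le` from the code above:

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# Explicit OVR: fits one independent binary LogisticRegression per class
ovr = OneVsRestClassifier(LogisticRegression(solver='lbfgs'))
ovr.fit(X, y_encoded)

# Should match the predictions above: Apple, Cherry, Banana
print(le.inverse_transform(ovr.predict(X_test)))
```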