One-vs-Rest (OVR) Logistic Regression#


1. The Problem#

  • Standard logistic regression handles binary classification: two classes only.

  • Many real-world problems are multi-class (3 or more classes), e.g., classifying animals as Cat, Dog, or Rabbit.

  • We need a way to extend logistic regression to handle multiple classes.


2. One-vs-Rest (OVR) Strategy#

OVR (also called One-vs-All) converts a multi-class problem into multiple binary classification problems:

  1. Suppose there are \(K\) classes: \(C_1, C_2, ..., C_K\).

  2. For each class \(C_k\), train a binary logistic regression classifier:

    • Treat \(C_k\) as the positive class (1).

  • Treat all other classes as the negative class (0).
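
In the standard formulation, the \(k\)-th classifier models the probability of class \(C_k\) with a sigmoid applied to a linear score:

\[
P(y = C_k \mid x) = \sigma(w_k^\top x + b_k),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}},
\]

where \(w_k\) and \(b_k\) are the weight vector and bias learned by the \(k\)-th binary model.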

Example with 3 classes (Cat, Dog, Rabbit):

| Classifier | Positive | Negative    |
|------------|----------|-------------|
| M1         | Cat      | Dog, Rabbit |
| M2         | Dog      | Cat, Rabbit |
| M3         | Rabbit   | Cat, Dog    |


3. Training Phase#

  • Each binary model \(M_k\) is trained independently.

  • Input features remain the same for all models.

  • Use one-hot encoding for outputs:

| Class  | One-hot   |
|--------|-----------|
| Cat    | [1, 0, 0] |
| Dog    | [0, 1, 0] |
| Rabbit | [0, 0, 1] |

  • Each model predicts the probability that a sample belongs to its respective class.
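
As a minimal sketch (using the running animal example), the one-hot columns can be built directly in NumPy, and each column then serves as the binary target for one model:

import numpy as np

labels = np.array(['Cat', 'Dog', 'Cat', 'Rabbit', 'Dog'])  # toy label vector
classes = np.array(['Cat', 'Dog', 'Rabbit'])

# one_hot[i, k] is 1 if sample i belongs to class k, else 0
one_hot = (labels[:, None] == classes[None, :]).astype(int)
print(one_hot[0])  # [1 0 0]  (Cat)

# Binary target for the "Dog vs Rest" model is simply column 1
y_dog = one_hot[:, 1]
print(y_dog)       # [0 1 0 0 1]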


4. Prediction Phase#

For a new data point:

  1. Pass the input to all \(K\) models.

  2. Each model outputs a probability that the point belongs to its positive class.

  3. Example probabilities:

| Model | Probability |
|-------|-------------|
| M1    | 0.25        |
| M2    | 0.20        |
| M3    | 0.55        |

  4. Choose the class with the highest probability → here, Rabbit (class 3).
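
The decision rule is just an argmax over the per-model outputs. A tiny sketch using the probabilities from the table above:

import numpy as np

classes = ['Cat', 'Dog', 'Rabbit']
probs = np.array([0.25, 0.20, 0.55])    # outputs of M1, M2, M3

print(classes[int(np.argmax(probs))])   # Rabbit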


5. Advantages of OVR#

  • Simple to implement.

  • Works with any binary classifier (logistic regression, SVM, etc.).

  • Efficient when the number of classes is not very large.


6. Disadvantages#

  • Probabilities from different classifiers may not be well-calibrated.

  • Can be biased by class imbalance: in each binary subproblem, the positive class is typically much smaller than the combined “rest.”

  • Less accurate than One-vs-One in some cases.
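
To see the calibration issue concretely: the \(K\) sigmoid outputs come from independently trained models, so they need not sum to 1. scikit-learn, for example, simply renormalizes them, as in this sketch:

import numpy as np

raw = np.array([0.40, 0.35, 0.45])  # independent sigmoid outputs; they sum to 1.20
print(raw / raw.sum())              # [0.3333 0.2917 0.375] after renormalization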


Summary#

OVR Logistic Regression works by:

  1. Splitting a multi-class problem into \(K\) binary problems.

  2. Training a separate logistic regression for each class.

  3. Predicting the class with the highest probability across all models.

Example Problem Statement#

Problem: You are building a model to classify types of fruits based on two features:

  • f1 = Weight (grams)

  • f2 = Color Score (0–10 scale)

Classes:

  1. Apple

  2. Banana

  3. Cherry

Training Data:

| Fruit  | f1 (Weight, grams) | f2 (Color Score) |
|--------|--------------------|------------------|
| Apple  | 150                | 8                |
| Apple  | 170                | 7                |
| Banana | 120                | 4                |
| Banana | 130                | 5                |
| Cherry | 10                 | 9                |
| Cherry | 15                 | 8                |

We want to predict the fruit type given f1 and f2.


Step 1: One-vs-Rest (OVR) Setup#

We have 3 classes, so we create 3 binary classifiers:

  1. M1 (Apple vs Rest):

    • Positive: Apple

    • Negative: Banana, Cherry

  2. M2 (Banana vs Rest):

    • Positive: Banana

    • Negative: Apple, Cherry

  3. M3 (Cherry vs Rest):

    • Positive: Cherry

    • Negative: Apple, Banana


Step 2: One-hot Encoding of Target#

| Fruit  | One-hot (Apple, Banana, Cherry) |
|--------|---------------------------------|
| Apple  | [1, 0, 0]                       |
| Banana | [0, 1, 0]                       |
| Cherry | [0, 0, 1]                       |

  • Each classifier uses its corresponding column as the target.
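
scikit-learn's LabelBinarizer produces exactly this encoding; a quick sketch on the fruit labels:

import numpy as np
from sklearn.preprocessing import LabelBinarizer

y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])

lb = LabelBinarizer()
Y = lb.fit_transform(y)     # columns follow lb.classes_ (alphabetical order)
print(lb.classes_)          # ['Apple' 'Banana' 'Cherry']
print(Y[0], Y[2], Y[4])     # [1 0 0] [0 1 0] [0 0 1]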


Step 3: Training Binary Models#

  • Each binary logistic regression model is trained independently:

    • Input: [f1, f2]

    • Output: probability of being the positive class


Step 4: Prediction Example#

Test data:

  • f1 = 140, f2 = 6

Step 4a: Predict probabilities using each classifier

| Model | Class  | Probability |
|-------|--------|-------------|
| M1    | Apple  | 0.40        |
| M2    | Banana | 0.35        |
| M3    | Cherry | 0.25        |

Step 4b: Choose class with highest probability

  • Max probability = 0.4 → Apple

So the predicted class is Apple.
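
Putting Steps 3 and 4 together, here is a minimal from-scratch sketch that trains one binary LogisticRegression per fruit and picks the argmax for the test point. (The exact probabilities depend on the solver and regularization, so they will not match the illustrative numbers above.)

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[150, 8], [170, 7], [120, 4], [130, 5], [10, 9], [15, 8]])
y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])
classes = ['Apple', 'Banana', 'Cherry']

# Step 3: train one binary model per class (class k vs rest)
models = {c: LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int))
          for c in classes}

# Step 4: score the test point with every model, then take the argmax
x_test = np.array([[140, 6]])
scores = [models[c].predict_proba(x_test)[0, 1] for c in classes]
print(classes[int(np.argmax(scores))])  # expected: Apple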


Step 5: Summary#

  • OVR breaks multi-class classification into multiple binary logistic regressions.

  • Each model outputs a probability for its class.

  • Final prediction = class with highest probability.

# Import libraries
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
import warnings

# Silence convergence/deprecation warnings from this small demo
warnings.filterwarnings("ignore")
# Sample data (Fruit dataset)
X = np.array([
    [150, 8],   # Apple
    [170, 7],   # Apple
    [120, 4],   # Banana
    [130, 5],   # Banana
    [10, 9],    # Cherry
    [15, 8]     # Cherry
])

y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])

# Encode labels to integers
le = LabelEncoder()
y_encoded = le.fit_transform(y)  # Apple=0, Banana=1, Cherry=2

# Create OVR Logistic Regression model
# NOTE: multi_class='ovr' is deprecated in scikit-learn >= 1.5;
# on newer versions, wrap the estimator in OneVsRestClassifier instead
model = LogisticRegression(multi_class='ovr', solver='lbfgs')
model.fit(X, y_encoded)

# Test data
X_test = np.array([
    [140, 6],  # Expected: Apple
    [12, 8],   # Expected: Cherry
    [125, 5]   # Expected: Banana
])

# Predict probabilities for each class
probs = model.predict_proba(X_test)
predictions = model.predict(X_test)

# Convert predicted labels back to original class names
predicted_classes = le.inverse_transform(predictions)

# Print results
for i, x in enumerate(X_test):
    print(f"Test Data: {x}")
    print(f"Predicted Probabilities: {probs[i]}")
    print(f"Predicted Class: {predicted_classes[i]}\n")
Test Data: [140   6]
Predicted Probabilities: [5.19151260e-01 4.80777962e-01 7.07778904e-05]
Predicted Class: Apple

Test Data: [12  8]
Predicted Probabilities: [3.41751899e-22 6.85793166e-02 9.31420683e-01]
Predicted Class: Cherry

Test Data: [125   5]
Predicted Probabilities: [3.84060828e-03 9.95473944e-01 6.85447453e-04]
Predicted Class: Banana