One-vs-Rest (OVR) Logistic Regression#


1. The Problem#

  • Standard logistic regression handles binary classification: two classes only.

  • Many real-world problems are multi-class (3 or more classes), e.g., classifying animals as Cat, Dog, or Rabbit.

  • We need a way to extend logistic regression to handle multiple classes.


2. One-vs-Rest (OVR) Strategy#

OVR (also called One-vs-All) converts a multi-class problem into multiple binary classification problems:

  1. Suppose there are \(K\) classes: \(C_1, C_2, ..., C_K\).

  2. For each class \(C_k\), train a binary logistic regression classifier:

    • Treat \(C_k\) as the positive class (1).

  • Treat all other classes as the negative class (0).
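
In the standard formulation, the \(k\)-th classifier models the probability of class \(C_k\) with a sigmoid applied to a linear score:

\[
P(y = C_k \mid x) = \sigma(w_k^\top x + b_k),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}},
\]

where \(w_k\) and \(b_k\) are the weight vector and bias learned by the \(k\)-th binary model.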

Example with 3 classes (Cat, Dog, Rabbit):

| Classifier | Positive | Negative    |
|------------|----------|-------------|
| M1         | Cat      | Dog, Rabbit |
| M2         | Dog      | Cat, Rabbit |
| M3         | Rabbit   | Cat, Dog    |


3. Training Phase#

  • Each binary model \(M_k\) is trained independently.

  • Input features remain the same for all models.

  • Use one-hot encoding for outputs:

| Class  | One-hot   |
|--------|-----------|
| Cat    | [1, 0, 0] |
| Dog    | [0, 1, 0] |
| Rabbit | [0, 0, 1] |

  • Each model predicts the probability that a sample belongs to its respective class.
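
As a minimal sketch (using the running animal example), the one-hot columns can be built directly in NumPy, and each column then serves as the binary target for one model:

import numpy as np

labels = np.array(['Cat', 'Dog', 'Cat', 'Rabbit', 'Dog'])  # toy label vector
classes = np.array(['Cat', 'Dog', 'Rabbit'])

# one_hot[i, k] is 1 if sample i belongs to class k, else 0
one_hot = (labels[:, None] == classes[None, :]).astype(int)
print(one_hot[0])  # [1 0 0]  (Cat)

# Binary target for the "Dog vs Rest" model is simply column 1
y_dog = one_hot[:, 1]
print(y_dog)       # [0 1 0 0 1]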


4. Prediction Phase#

For a new data point:

  1. Pass the input to all \(K\) models.

  2. Each model outputs a probability that the point belongs to its positive class.

  3. Example probabilities:

| Model | Probability |
|-------|-------------|
| M1    | 0.25        |
| M2    | 0.20        |
| M3    | 0.55        |

  4. Choose the class with the highest probability → here, Rabbit (class 3).
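
The decision rule is just an argmax over the per-model outputs. A tiny sketch using the probabilities from the table above:

import numpy as np

classes = ['Cat', 'Dog', 'Rabbit']
probs = np.array([0.25, 0.20, 0.55])    # outputs of M1, M2, M3

print(classes[int(np.argmax(probs))])   # Rabbit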


5. Advantages of OVR#

  • Simple to implement.

  • Works with any binary classifier (logistic regression, SVM, etc.).

  • Efficient when the number of classes is not very large.


6. Disadvantages#

  • Probabilities from different classifiers may not be well-calibrated.

  • Can be biased by class imbalance: in each binary subproblem, the positive class is typically much smaller than the combined “rest.”

  • Less accurate than One-vs-One in some cases.
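
To see the calibration issue concretely: the \(K\) sigmoid outputs come from independently trained models, so they need not sum to 1. scikit-learn, for example, simply renormalizes them, as in this sketch:

import numpy as np

raw = np.array([0.40, 0.35, 0.45])  # independent sigmoid outputs; they sum to 1.20
print(raw / raw.sum())              # [0.3333 0.2917 0.375] after renormalization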


Summary#

OVR Logistic Regression works by:

  1. Splitting a multi-class problem into \(K\) binary problems.

  2. Training a separate logistic regression for each class.

  3. Predicting the class with the highest probability across all models.

Example Problem Statement#

Problem: You are building a model to classify types of fruits based on two features:

  • f1 = Weight (grams)

  • f2 = Color Score (0–10 scale)

Classes:

  1. Apple

  2. Banana

  3. Cherry

Training Data:

| Fruit  | f1 (Weight, grams) | f2 (Color Score) |
|--------|--------------------|------------------|
| Apple  | 150                | 8                |
| Apple  | 170                | 7                |
| Banana | 120                | 4                |
| Banana | 130                | 5                |
| Cherry | 10                 | 9                |
| Cherry | 15                 | 8                |

We want to predict the fruit type given f1 and f2.


Step 1: One-vs-Rest (OVR) Setup#

We have 3 classes, so we create 3 binary classifiers:

  1. M1 (Apple vs Rest):

    • Positive: Apple

    • Negative: Banana, Cherry

  2. M2 (Banana vs Rest):

    • Positive: Banana

    • Negative: Apple, Cherry

  3. M3 (Cherry vs Rest):

    • Positive: Cherry

    • Negative: Apple, Banana


Step 2: One-hot Encoding of Target#

| Fruit  | One-hot (Apple, Banana, Cherry) |
|--------|---------------------------------|
| Apple  | [1, 0, 0]                       |
| Banana | [0, 1, 0]                       |
| Cherry | [0, 0, 1]                       |

  • Each classifier uses its corresponding column as the target.
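
scikit-learn's LabelBinarizer produces exactly this encoding; a quick sketch on the fruit labels:

import numpy as np
from sklearn.preprocessing import LabelBinarizer

y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])

lb = LabelBinarizer()
Y = lb.fit_transform(y)     # columns follow lb.classes_ (alphabetical order)
print(lb.classes_)          # ['Apple' 'Banana' 'Cherry']
print(Y[0], Y[2], Y[4])     # [1 0 0] [0 1 0] [0 0 1]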


Step 3: Training Binary Models#

  • Each binary logistic regression model is trained independently:

    • Input: [f1, f2]

    • Output: probability of being the positive class


Step 4: Prediction Example#

Test data:

  • f1 = 140, f2 = 6

Step 4a: Predict probabilities using each classifier

| Model | Class  | Probability |
|-------|--------|-------------|
| M1    | Apple  | 0.40        |
| M2    | Banana | 0.35        |
| M3    | Cherry | 0.25        |

Step 4b: Choose class with highest probability

  • Max probability = 0.4 → Apple

So the predicted class is Apple.
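
Putting Steps 3 and 4 together, here is a minimal from-scratch sketch that trains one binary LogisticRegression per fruit and picks the argmax for the test point. (The exact probabilities depend on the solver and regularization, so they will not match the illustrative numbers above.)

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[150, 8], [170, 7], [120, 4], [130, 5], [10, 9], [15, 8]])
y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])
classes = ['Apple', 'Banana', 'Cherry']

# Step 3: train one binary model per class (class k vs rest)
models = {c: LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int))
          for c in classes}

# Step 4: score the test point with every model, then take the argmax
x_test = np.array([[140, 6]])
scores = [models[c].predict_proba(x_test)[0, 1] for c in classes]
print(classes[int(np.argmax(scores))])  # expected: Apple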


Step 5: Summary#

  • OVR breaks multi-class classification into multiple binary logistic regressions.

  • Each model outputs a probability for its class.

  • Final prediction = class with highest probability.

# Import libraries
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
import warnings

# Silence convergence/deprecation warnings from this small demo
warnings.filterwarnings("ignore")
# Sample data (Fruit dataset)
X = np.array([
    [150, 8],   # Apple
    [170, 7],   # Apple
    [120, 4],   # Banana
    [130, 5],   # Banana
    [10, 9],    # Cherry
    [15, 8]     # Cherry
])

y = np.array(['Apple', 'Apple', 'Banana', 'Banana', 'Cherry', 'Cherry'])

# Encode labels to integers
le = LabelEncoder()
y_encoded = le.fit_transform(y)  # Apple=0, Banana=1, Cherry=2

# Create OVR Logistic Regression model
# NOTE: multi_class='ovr' is deprecated in scikit-learn >= 1.5;
# on newer versions, wrap the estimator in OneVsRestClassifier instead
model = LogisticRegression(multi_class='ovr', solver='lbfgs')
model.fit(X, y_encoded)

# Test data
X_test = np.array([
    [140, 6],  # Expected: Apple
    [12, 8],   # Expected: Cherry
    [125, 5]   # Expected: Banana
])

# Predict probabilities for each class
probs = model.predict_proba(X_test)
predictions = model.predict(X_test)

# Convert predicted labels back to original class names
predicted_classes = le.inverse_transform(predictions)

# Print results
for i, x in enumerate(X_test):
    print(f"Test Data: {x}")
    print(f"Predicted Probabilities: {probs[i]}")
    print(f"Predicted Class: {predicted_classes[i]}\n")
Test Data: [140   6]
Predicted Probabilities: [5.19151260e-01 4.80777962e-01 7.07778904e-05]
Predicted Class: Apple

Test Data: [12  8]
Predicted Probabilities: [3.41751899e-22 6.85793166e-02 9.31420683e-01]
Predicted Class: Cherry

Test Data: [125   5]
Predicted Probabilities: [3.84060828e-03 9.95473944e-01 6.85447453e-04]
Predicted Class: Banana