Overview#
Before diving into SVM, it is important to have a good grasp of Logistic Regression. If you haven’t already studied Logistic Regression, I recommend going through its concepts, including the mathematical intuition behind it, before proceeding further.
Support Vector Machine#
SVM builds on the idea of a linear decision boundary, familiar from Logistic Regression, but adds an important enhancement: rather than merely separating the classes, it maximizes the margin between them.
Support Vector Classifier (SVC):
Geometric Intuition
Let’s consider a 2D example where we have two categories of points. SVM creates a best-fit line to separate the categories.
In addition to this line, SVM also creates two additional marginal planes on either side of the line.
Maximizing the Margin
The distance between these two marginal planes is called the margin.
SVM ensures that this margin is maximized, which makes the classifier more robust.
For example:
If we compare two possible separating lines, the one with the larger margin is chosen because it is less likely to misclassify new, unseen data points.
Support Vectors
The data points closest to the marginal planes are called support vectors.
These points play a crucial role in defining the position of the best-fit line and the marginal planes.
SVM in 3D and Higher Dimensions#
In a 3D space, SVM creates:
A plane as the decision boundary.
Two marginal planes on either side of the boundary, ensuring the margin is maximized.
Similarly, for n-dimensional data, the decision boundary becomes a hyperplane, and marginal planes are adjusted accordingly.
Key Takeaways#
SVM focuses on finding a decision boundary (line, plane, or hyperplane) with the maximum margin between categories.
The points that influence this decision boundary are called support vectors.
SVM can also handle multi-class classification problems, typically by combining several binary classifiers (one-vs-rest or one-vs-one). A minimal sketch of fitting a linear SVC is shown below.
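To make these ideas concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel. The tiny dataset is made up purely for illustration; a large C is used so the fit behaves like a near-hard-margin classifier on this cleanly separable data.
import numpy as np
from sklearn.svm import SVC

# A tiny, illustrative 2D dataset: two linearly separable clusters (made-up values)
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],   # class 0
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin on separable data
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

# The support vectors are the points closest to the marginal planes;
# they alone determine the position of the best-fit line.
print("Support vectors:\n", clf.support_vectors_)
print("Support vectors per class:", clf.n_support_)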
Support Vector Machine (SVM) - Soft Margin vs Hard Margin#
In this session, we have covered the fundamental concept of the Support Vector Machine (SVM) for solving classification problems using a Support Vector Classifier (SVC). To recap, the main goal of SVM is to find the best-fit line (or hyperplane in higher dimensions) that separates data points belonging to different classes, while maximizing the margin between the marginal planes.
Now, let’s dive into an important aspect of SVM: Soft Margin and Hard Margin.
Hard Margin
Definition: A hard margin assumes that the data is perfectly separable, meaning that all data points are classified without any errors.
Characteristics:
The classes are clearly separated.
There is no overlap between data points of different classes.
The margin is maximized without allowing any misclassifications.
Limitations:
In real-world scenarios, data is rarely perfectly separable.
Noise, outliers, and overlapping points make it impractical to achieve a hard margin in most cases.
Soft Margin
Definition: A soft margin allows for some misclassifications or errors, recognizing that data in real-world problems often overlaps or is noisy.
Characteristics:
It introduces slack variables (one per data point) to account for points that fall on the wrong side of the margin or are misclassified.
The aim is to balance maximizing the margin and minimizing classification errors.
Soft margin optimization adjusts the trade-off between the width of the margin and classification accuracy.
Advantages:
It is more flexible and works well with real-world, noisy, and overlapping data.
Provides a way to handle outliers without compromising the entire model.
Illustrative Example#
Let’s consider a 2D plane with two classes of points:
In a hard margin scenario, the points are cleanly separable, and we can draw a hyperplane (best-fit line) with clear marginal planes on either side.
In a soft margin scenario, overlapping data points make it impossible to draw a hyperplane that separates all points perfectly. Here, SVM tolerates some misclassification to achieve the optimal margin.
Real-World Implications#
Hard Margin: Works well when:
Data is clean and well-separated.
No or minimal overlap exists between classes.
Soft Margin: Is preferred when:
Data is noisy, with overlapping points.
Outliers or errors are expected in the dataset.
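In scikit-learn's SVC, the regularization parameter C controls this trade-off: a small C gives a softer margin (more tolerance for violations), while a very large C approximates a hard margin. The sketch below, on a deliberately noisy synthetic dataset (all parameter values are illustrative), is one way to observe the effect numerically.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Noisy, overlapping 2D data; class_sep and flip_y are chosen to force some overlap
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, n_clusters_per_class=1,
                           class_sep=0.8, flip_y=0.05, random_state=0)

for C in [0.1, 1e6]:  # soft margin vs (approximately) hard margin
    clf = SVC(kernel='linear', C=C).fit(X, y)
    w = clf.coef_[0]                      # weight vector of the linear boundary
    print(f"C={C}: margin width = {2 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_vectors_)}, "
          f"training accuracy = {clf.score(X, y):.2f}")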
1. Equation of the Decision Boundary#
The best-fit line (decision boundary) is represented as: \( w^\top x + b = 0 \)
w: The weight vector, perpendicular to the decision boundary (the normal vector).
b: Bias term.
If the line passes through the origin, \(b = 0\), simplifying the equation to \(w^\top x = 0\).
2. Distance of Points from the Decision Boundary#
Points are categorized based on their position relative to the decision boundary:
Below the line: Negative distance, \(w^\top x + b < 0\).
Above the line: Positive distance, \(w^\top x + b > 0\).
On the line: Zero distance, \(w^\top x + b = 0\).
Key Insight:
The sign of the distance indicates classification.
The magnitude of the distance determines how far a point is from the decision boundary.
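As a small numeric illustration (the values of w, b, and the points below are made up), the sign of \(w^\top x + b\) tells us which side of the boundary a point lies on, and dividing by \(\|w\|\) converts it into a geometric distance.
import numpy as np

# Illustrative values: a normal vector w, bias b, and three test points
w = np.array([2.0, 1.0])
b = -4.0
points = np.array([[3.0, 1.0],    # above the line -> positive value
                   [0.5, 1.0],    # below the line -> negative value
                   [1.5, 1.0]])   # exactly on the line -> zero

for x in points:
    value = w @ x + b                      # sign indicates the predicted side
    distance = value / np.linalg.norm(w)   # signed geometric distance to the boundary
    print(f"x={x}, w^T x + b = {value:+.2f}, signed distance = {distance:+.2f}")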
3. Marginal Planes and Support Vectors#
Two marginal planes are defined parallel to the decision boundary: \( w^\top x + b = +1 \quad \text{and} \quad w^\top x + b = -1 \)
Points lying on these planes are the support vectors.
The distance between these planes is: \( \text{Margin} = \frac{2}{\|w\|} \)
Goal: Maximize this margin to achieve better generalization.
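To see where this formula comes from: take any support vector \(x_+\) lying on the plane \(w^\top x + b = +1\). Its distance to the opposite marginal plane \(w^\top x + b = -1\) (rewritten as \(w^\top x + (b+1) = 0\)) follows from the point-to-hyperplane distance formula: \( \frac{|w^\top x_+ + b + 1|}{\|w\|} = \frac{|1 + 1|}{\|w\|} = \frac{2}{\|w\|} \).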
4. Cost Function and Constraints#
Objective: Maximize the margin, or equivalently minimize: \( \frac{1}{2} \|w\|^2 \)
The factor \(\frac{1}{2}\) simplifies derivatives during optimization.
Constraints: For all correctly classified points: \( y_i \left( w^\top x_i + b \right) \geq 1 \)
\(y_i = +1\): Positive points must satisfy \(w^\top x_i + b \geq 1\).
\(y_i = -1\): Negative points must satisfy \(w^\top x_i + b \leq -1\).
5. Simplified Optimization Problem#
The SVM optimization problem can be summarized as: \( \min_{w, b} \frac{1}{2} \|w\|^2 \) Subject to: \( y_i \left( w^\top x_i + b \right) \geq 1 \quad \forall i \)
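For the soft-margin case discussed earlier, the standard formulation adds slack variables \(\xi_i \geq 0\) and a penalty weight \(C\): \( \min_{w, b, \xi} \frac{1}{2} \|w\|^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i \left( w^\top x_i + b \right) \geq 1 - \xi_i \). A large \(C\) penalizes margin violations heavily (approaching a hard margin), while a small \(C\) tolerates violations in exchange for a wider margin.
The code below is a visualization sketch using scikit-learn and Plotly: it fits a linear SVC with C=0.1 (soft margin) and C=1e6 (hard-margin-like) on the same synthetic dataset and plots the decision boundary, marginal planes, and support vectors side by side.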
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Generate a synthetic 2D dataset
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, n_samples=100, random_state=42)

# Train SVM with soft margin (low C) and hard margin (very high C)
clf_soft = SVC(kernel='linear', C=0.1)
clf_soft.fit(X, y)
clf_hard = SVC(kernel='linear', C=1e6)
clf_hard.fit(X, y)

# Create grid for decision boundary visualization
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))

# Decision function values on the grid
Z_soft = clf_soft.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
Z_hard = clf_hard.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Create Plotly figure with subplots
fig = make_subplots(rows=1, cols=2,
                    subplot_titles=("Soft Margin (C=0.1)", "Hard Margin (C=1e6)"))

# --- SOFT MARGIN ---
fig.add_trace(go.Scatter(x=X[y == 0, 0], y=X[y == 0, 1], mode='markers',
                         marker=dict(color='red', size=8), name='Class 0'), row=1, col=1)
fig.add_trace(go.Scatter(x=X[y == 1, 0], y=X[y == 1, 1], mode='markers',
                         marker=dict(color='blue', size=8), name='Class 1'), row=1, col=1)

# Decision boundary (solid) and margins (dashed) for the soft-margin model
fig.add_trace(go.Contour(x=xx[0], y=yy[:, 0], z=Z_soft,
                         contours=dict(start=0, end=0, size=1, coloring='lines'),
                         line=dict(color='black', width=2), showscale=False,
                         name='Decision Boundary'), row=1, col=1)
fig.add_trace(go.Contour(x=xx[0], y=yy[:, 0], z=Z_soft,
                         contours=dict(start=-1, end=1, size=2, coloring='lines'),
                         line=dict(color='gray', dash='dash', width=2), showscale=False,
                         name='Margins'), row=1, col=1)
fig.add_trace(go.Scatter(x=clf_soft.support_vectors_[:, 0], y=clf_soft.support_vectors_[:, 1],
                         mode='markers', marker=dict(color='yellow', size=12, symbol='x'),
                         name='Support Vectors'), row=1, col=1)

# --- HARD MARGIN ---
fig.add_trace(go.Scatter(x=X[y == 0, 0], y=X[y == 0, 1], mode='markers',
                         marker=dict(color='red', size=8), showlegend=False), row=1, col=2)
fig.add_trace(go.Scatter(x=X[y == 1, 0], y=X[y == 1, 1], mode='markers',
                         marker=dict(color='blue', size=8), showlegend=False), row=1, col=2)

# Decision boundary (solid) and margins (dashed) for the hard-margin model
fig.add_trace(go.Contour(x=xx[0], y=yy[:, 0], z=Z_hard,
                         contours=dict(start=0, end=0, size=1, coloring='lines'),
                         line=dict(color='black', width=2), showscale=False), row=1, col=2)
fig.add_trace(go.Contour(x=xx[0], y=yy[:, 0], z=Z_hard,
                         contours=dict(start=-1, end=1, size=2, coloring='lines'),
                         line=dict(color='gray', dash='dash', width=2), showscale=False), row=1, col=2)
fig.add_trace(go.Scatter(x=clf_hard.support_vectors_[:, 0], y=clf_hard.support_vectors_[:, 1],
                         mode='markers', marker=dict(color='yellow', size=12, symbol='x'),
                         showlegend=False), row=1, col=2)

# Layout
fig.update_layout(title_text="SVM: Soft Margin vs Hard Margin", height=600, width=1000)
fig.show()
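In the resulting plots, the small-C (soft margin) model typically shows a wider margin with more support vectors, some of which sit inside the margin or on the wrong side of the boundary, while the large-C (hard-margin-like) model narrows the margin to avoid violations, making it more sensitive to outliers.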