Hyper-parameter Tuning Intuition#
Let’s go deeper into the mathematical intuition behind the key hyperparameters of SVC.
We’ll focus on the three most important ones: C, γ (gamma), and kernel.
Objective Function of SVC#
The primal optimization problem of SVM is:
\[ \min_{w,\, b,\, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \]
subject to:
\[ y_i \left( w^T \phi(x_i) + b \right) \geq 1 - \xi_i, \qquad \xi_i \geq 0, \qquad i = 1, \dots, n \]
Where:
\(w\): weight vector
\(b\): bias term
\(\xi_i\): slack variables (allow misclassifications)
\(C\): regularization parameter (controls penalty for misclassifications)
\(\phi(x)\): feature mapping (depends on kernel)
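At the optimum, each slack variable equals the hinge loss, \(\xi_i = \max(0,\, 1 - y_i(w^T \phi(x_i) + b))\), so the constrained problem can be written in the equivalent unconstrained form:
\[ \min_{w,\, b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \max\left(0,\; 1 - y_i \left( w^T \phi(x_i) + b \right)\right) \]
This form makes the role of \(C\) explicit: it weights the data-fitting (hinge-loss) term against the margin (regularization) term.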
Role of C (Regularization)#
From the objective function:
The term \(\frac{1}{2} \|w\|^2\) → minimizing it maximizes the margin.
The term \(C \sum_i \xi_i\) → penalizes margin violations (misclassifications).
👉 Intuition:
Small C → margin maximization dominates (tolerates some errors).
Simpler decision boundary.
Prevents overfitting.
Large C → error penalty dominates (forces correct classification of training data).
Narrow margin.
Risk of overfitting (see the sketch below).
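A minimal sketch of this effect, assuming a noisy two-class toy dataset (make_moons) rather than the iris data used later; the exact numbers will vary, but a larger C should push training accuracy up while the margin narrows:

```python
# Sketch: how C trades margin width against training error (illustrative dataset).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for C in [0.01, 1, 100]:
    model = SVC(kernel='rbf', C=C, gamma='scale').fit(X_tr, y_tr)
    print(f"C={C:>6}: support vectors={model.n_support_.sum():>3}, "
          f"train acc={model.score(X_tr, y_tr):.2f}, "
          f"test acc={model.score(X_te, y_te):.2f}")
```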
Role of γ (Gamma) in RBF Kernel#
The RBF kernel is:
\[ K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2) \]
👉 Interpretation:
If \(\|x_i - x_j\|\) is small → similarity close to 1.
If \(\|x_i - x_j\|\) is large → similarity close to 0.
\(\gamma\) controls the decay rate of similarity.
Small γ:
Kernel is smoother, points far apart are still considered similar.
Leads to a smooth, less complex decision boundary.
Large γ:
Kernel is sharper, only very close neighbors are considered similar.
Leads to a highly complex decision boundary (can overfit); see the sketch below.
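A small sketch of this decay, using sklearn.metrics.pairwise.rbf_kernel to evaluate the similarity of two fixed (arbitrarily chosen) points at different values of γ:

```python
# Sketch: how gamma controls the decay of RBF similarity between two points.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x_i = np.array([[0.0, 0.0]])
x_j = np.array([[1.0, 1.0]])  # squared distance ||x_i - x_j||^2 = 2

for gamma in [0.01, 0.1, 1, 10]:
    similarity = rbf_kernel(x_i, x_j, gamma=gamma)[0, 0]
    print(f"gamma={gamma:>5}: K(x_i, x_j) = {similarity:.6f}")
```

With γ = 0.01 the similarity stays near 1, while with γ = 10 it is essentially 0, matching the smooth-versus-sharp behaviour described above.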
Role of Kernel Choice#
The kernel defines \(\phi(x)\), the transformation of data:
Linear kernel:
\[ K(x_i, x_j) = x_i^T x_j \]
→ Works well if data is linearly separable.
Polynomial kernel:
\[ K(x_i, x_j) = (x_i^T x_j + c)^d \]
→ Captures polynomial relationships; degree \(d\) is a hyperparameter.
RBF kernel:
\[ K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2) \]
→ Very flexible, maps data to infinite-dimensional feature space.
How C and γ Interact#
High C + High γ:
Very complex model, tries to classify everything correctly.
Risk of overfitting.
Low C + Low γ:
Very smooth decision boundary, high bias.
Risk of underfitting.
Balanced values:
Trade-off between margin size, misclassification, and flexibility (see the sketch below).
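A quick sketch of these regimes on a held-out split (same iris data as below; the specific values of C and γ are illustrative extremes, not recommendations):

```python
# Sketch: train vs. test accuracy at extreme and moderate (C, gamma) settings.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for C, gamma in [(0.01, 0.001), (100, 10), (1, 0.1)]:
    model = SVC(kernel='rbf', C=C, gamma=gamma).fit(X_tr, y_tr)
    print(f"C={C:>6}, gamma={gamma:>6}: "
          f"train acc={model.score(X_tr, y_tr):.2f}, "
          f"test acc={model.score(X_te, y_te):.2f}")
```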
Decision Function#
The final decision function of SVC is:
\[ f(x) = \text{sign}\left( \sum_{i} \alpha_i y_i K(x_i, x) + b \right) \]
Where:
\(\alpha_i\): learned weights (nonzero only for support vectors).
\(K(x_i, x)\): similarity function (depends on γ and kernel).
\(C\): influences how many support vectors remain (a smaller C widens the margin, so more points violate it and become support vectors); see the sketch below.
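To connect this to scikit-learn's API: for a fitted binary SVC, dual_coef_ stores \(\alpha_i y_i\) for the support vectors and intercept_ stores \(b\), so the sum above can be recomputed by hand (a sketch on an illustrative binary dataset; the multi-class iris case uses one-vs-one pairs and is less direct):

```python
# Sketch: recomputing f(x) = sum_i alpha_i * y_i * K(x_i, x) + b from a fitted binary SVC.
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
model = SVC(kernel='rbf', C=1.0, gamma=0.1).fit(X, y)

x_new = X[:1]                                               # one query point
K = rbf_kernel(model.support_vectors_, x_new, gamma=0.1)    # K(x_i, x) for each support vector
manual = (model.dual_coef_ @ K).ravel() + model.intercept_  # sum_i alpha_i y_i K(x_i, x) + b
print(manual, model.decision_function(x_new))               # the two values should agree
```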
Summary (Mathematical Intuition):
C → controls penalty on misclassified points (\(\xi_i\)).
γ → controls how similarity decays in RBF kernel.
Kernel → defines feature space transformation.
Together, they shape the decision boundary: wide vs narrow, smooth vs complex.
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report
# Load dataset
X, y = datasets.load_iris(return_X_y=True)
# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Base SVC model
svc = SVC()
# Hyperparameter grid
param_grid = {
'C': [0.1, 1, 10, 100],
'gamma': [1, 0.1, 0.01, 0.001],
'kernel': ['linear', 'rbf', 'poly']
}
# GridSearchCV
grid = GridSearchCV(
estimator=svc,
param_grid=param_grid,
refit=True, # keep best model
cv=5, # 5-fold cross-validation
verbose=2,
n_jobs=-1 # use all CPUs
)
# Fit
grid.fit(X_train, y_train)
# Best hyperparameters
print("Best Parameters:", grid.best_params_)
# Use best model for predictions
y_pred = grid.predict(X_test)
# Evaluation
print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))
Fitting 5 folds for each of 48 candidates, totalling 240 fits
Best Parameters: {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45