# Cost Function
## 1. Hard-margin SVM (ideal separable case)
Goal: maximize the margin

\[ \frac{2}{\|w\|} \]

which is equivalent to minimizing

\[ \frac{1}{2} \|w\|^2 \]
Constraints:

\[ y_i (w^T x_i + b) \geq 1, \quad \forall i \]

This means each point is correctly classified and lies on or outside the margin.
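scikit-learn has no explicit hard-margin mode, but a very large `C` approximates one. A minimal sketch on hand-made separable toy data (the data and the `1e10` value are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy data (an assumption for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 1.0],
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard margin: violations are
# penalized so heavily that none are tolerated on separable data.
clf = SVC(kernel="linear", C=1e10).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
# Every point should satisfy y_i (w^T x_i + b) >= 1 (up to numerics).
print("min y_i (w^T x_i + b) =", np.min(y * (X @ w + b)))
```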
## 2. Soft-margin SVM (real-world case, with overlap)
Real data are noisy and not perfectly separable, so we add slack variables \(\xi_i \geq 0\) and relax the constraints to

\[ y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \forall i \]

The value of \(\xi_i\) says where point \(i\) sits (see the sketch after this list):
- If \(\xi_i = 0\): correctly classified, outside the margin.
- If \(0 < \xi_i < 1\): correctly classified, but inside the margin.
- If \(\xi_i > 1\): misclassified.
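A minimal sketch of reading the slack values off a fitted linear `SVC`; the toy data are an assumption for illustration, and each \(\xi_i\) is computed directly from its definition rather than taken from any library attribute:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping toy data (an assumption for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.2],
              [3.0, 2.8], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Slack by definition: xi_i = max(0, 1 - y_i (w^T x_i + b)).
xi = np.maximum(0, 1 - y * (X @ w + b))
for xi_i in xi:
    if xi_i == 0:
        print(f"xi = {xi_i:.3f}: outside the margin")
    elif xi_i < 1:
        print(f"xi = {xi_i:.3f}: inside the margin, still correct")
    else:
        print(f"xi = {xi_i:.3f}: misclassified")
```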
## 3. Cost function for soft margin

\[ \min_{w,\, b,\, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i} \xi_i \]

subject to \( y_i (w^T x_i + b) \geq 1 - \xi_i \) and \( \xi_i \geq 0 \).
- First term: keeps the margin large.
- Second term: penalizes violations (misclassified or margin-crossing points).
- \(C\): hyperparameter that controls the tradeoff, as the sketch after this list shows:
  - Large \(C\): less tolerance for violations → narrower margin.
  - Small \(C\): more tolerance → wider margin, often better generalization.
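A minimal sketch of the tradeoff, reusing the overlapping toy data above (still an illustrative assumption): a larger \(C\) should shrink the margin width \(2/\|w\|\):

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping toy data (an assumption for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.2],
              [3.0, 2.8], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:<6} margin={2 / np.linalg.norm(w):.3f} "
          f"support vectors={len(clf.support_)}")
```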
## 4. Hinge loss interpretation
The penalty for each point is given by the hinge loss:

\[ \ell_i = \max\bigl(0,\; 1 - y_i (w^T x_i + b)\bigr) \]
- If correctly classified with margin ≥ 1 → loss = 0.
- If close to the boundary or misclassified → positive loss.
So the full objective becomes:

\[ \min_{w,\, b} \; \frac{1}{2} \|w\|^2 + C \sum_{i} \max\bigl(0,\; 1 - y_i (w^T x_i + b)\bigr) \]

which is the soft-margin problem with each \(\xi_i\) replaced by its optimal value, the hinge loss itself.
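A minimal sketch that evaluates this objective by hand with NumPy; `hinge_objective` and all the values are illustrative assumptions, not library API:

```python
import numpy as np

def hinge_objective(w, b, X, y, C):
    """Soft-margin SVM objective: 0.5 ||w||^2 + C * sum of hinge losses."""
    margins = y * (X @ w + b)          # y_i (w^T x_i + b) for every point
    hinge = np.maximum(0, 1 - margins)  # per-point hinge loss
    return 0.5 * w @ w + C * hinge.sum()

# Illustrative values only.
X = np.array([[1.0, 1.0], [4.0, 4.0]])
y = np.array([-1, 1])
w = np.array([0.5, 0.5])
b = -2.5
print(hinge_objective(w, b, X, y, C=1.0))
```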
## 5. Summary
- Hard margin → perfect separation, no errors.
- Soft margin → allows errors with a penalty, controlled by \(C\).
- Slack variables \(\xi_i\) measure violations.
- Hinge loss is the error function for the soft-margin SVC.