Cost Function#

1. Hard-margin SVM (ideal separable case)#

  • Goal: maximize the margin

\[ \max_{w,b} \frac{2}{\|w\|} \]
  • Equivalent to minimizing (maximizing \(\frac{2}{\|w\|}\) is the same as minimizing \(\|w\|\), and squaring gives a smooth, convex quadratic objective)

\[ \min_{w,b} \frac{1}{2}\|w\|^2 \]
  • Constraints:

    \[ y_i (w^T x_i + b) \geq 1, \quad \forall i \]

    This means every training point lies on the correct side of the hyperplane, on or outside the margin boundary.
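
To make this concrete, here is a minimal sketch (the toy data and variable names are illustrative assumptions, not from the text) that approximates the hard-margin solution with scikit-learn's `SVC` by using a very large \(C\), then reads off the margin width \(2/\|w\|\):

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data: two well-separated blobs, labels in {-1, +1}
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A very large C leaves essentially no tolerance for violations,
# so the soft-margin solver behaves like a hard-margin SVM.
clf = SVC(kernel="linear", C=1e10).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("min y_i (w^T x_i + b) =", (y * (X @ w + b)).min())  # ~1 for support vectors
```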


2. Soft-margin SVM (real-world case, with overlap)#

Real data are noisy and rarely perfectly separable, so we relax the constraints with slack variables \(\xi_i \geq 0\), one per training point (see the sketch after this list):

\[ y_i (w^T x_i + b) \geq 1 - \xi_i \]
  • If \(\xi_i = 0\): correctly classified, on or outside the margin.

  • If \(0 < \xi_i < 1\): correctly classified, but inside the margin.

  • If \(\xi_i > 1\): misclassified.
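
Most solvers do not return the slack values directly, but they can be recovered from a fitted model as \(\xi_i = \max(0, 1 - y_i (w^T x_i + b))\). A minimal sketch, again with illustrative toy data:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping toy data: two blobs that are not linearly separable
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# xi_i = max(0, 1 - y_i * f(x_i)), where f is the decision function w^T x + b
xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))

print("xi == 0   (on/outside margin):", np.sum(xi == 0))
print("0 < xi < 1 (inside margin):   ", np.sum((xi > 0) & (xi < 1)))
print("xi > 1    (misclassified):    ", np.sum(xi > 1))
```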


3. Cost function for soft margin#

\[ \min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \xi_i \]
  • First term: keeps the margin large.

  • Second term: penalizes violations (misclassified or margin-crossing points).

  • \(C\): hyperparameter that controls the tradeoff:

    • Large \(C\): less tolerance for violations → narrower margin.

    • Small \(C\): more tolerance → wider margin, often better generalization (the tradeoff is illustrated in the sketch below).
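
A minimal sketch of this tradeoff (illustrative data and settings): refit the same data with several values of \(C\) and report the margin width and the number of margin violations.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))
    print(f"C={C:>6}: margin width = {2 / np.linalg.norm(w):.3f}, "
          f"violations (xi > 0) = {np.sum(xi > 0)}")
```

As \(C\) shrinks, the margin widens and more points are allowed to violate it; as \(C\) grows, the solver trades margin width for fewer violations.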


4. Hinge loss interpretation#

The penalty for each point is given by the hinge loss:

\[ L_i = \max(0, 1 - y_i (w^T x_i + b)) \]
  • If correctly classified with margin ≥ 1 → loss = 0.

  • If close to boundary or misclassified → positive loss.

So the full objective becomes:

\[ \min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \max(0, 1 - y_i (w^T x_i + b)) \]
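
Because this form is unconstrained, it can be minimized directly. A minimal subgradient-descent sketch (the function names, learning rate, and toy data are illustrative assumptions, not a reference implementation):

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * hinge.sum()

def train_svm_subgradient(X, y, C=1.0, lr=0.001, epochs=2000):
    """Minimize the soft-margin objective by (sub)gradient descent.

    Points with y_i (w^T x_i + b) < 1 contribute -C * y_i * x_i to the
    subgradient w.r.t. w (and -C * y_i w.r.t. b); all others contribute 0.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                       # points violating the margin
        grad_w = w - C * (y[active] @ X[active])   # regularizer + hinge terms
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: two Gaussian blobs, labels in {-1, +1}
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

w, b = train_svm_subgradient(X, y, C=1.0)
print("objective:", svm_objective(w, b, X, y, C=1.0))
```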

5. Summary#

  • Hard margin → perfect separation, no errors.

  • Soft margin → allows errors with penalty, controlled by \(C\).

  • Slack variables \(\xi_i\) measure violations.

  • Hinge loss is the per-point error function in the soft-margin SVM (SVC) objective.