3.12 Support vector machine

Racket

3.12 Support vector machine🔗ℹ

A soft-margin support vector machine (SVM) is the classic classification QP. Given labelled points (x_i, y_i) with y_i ∈ {−1, +1}, it fits a separating hyperplane wᵀx + b by solving

minimize ½‖w‖₂² + C·Σ ξ_i
subject to y_i (wᵀx_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0

The slacks ξ_i (the hinge losses) let points fall inside or past the margin at a cost C. This example exposes (train-svm data #:C C), returning the fitted w and b.

<require> ::=

(require racket/list
scs)

<provide> ::=

(provide train-svm run-example)

3.12.1 Variables and standard form🔗ℹ

Stack the unknowns as v = (w, b, ξ) with d weights, one intercept, and m slacks. The objective is quadratic only in w, so P is the identity on the w block (giving ½‖w‖₂²) and zero elsewhere, and c places C on each ξ. Both constraint families become positive-orthant rows once moved to Ax ≤ b form: the margin constraint y_i(wᵀx_i + b) + ξ_i ≥ 1 negates to

−y_i x_iᵀ w − y_i b − ξ_i ≤ −1

and ξ_i ≥ 0 is −ξ_i ≤ 0.

<helpers> ::=

(define (sparse-row n entries)
(for/list ([k (in-range n)])
(cond [(assoc k entries) => cdr] [else 0])))

<train-svm> ::=

(define (train-svm data #:C [C 1.0])
  (define m (length data))
  (define d (length (caar data)))     ; feature dimension
  (define n (+ d 1 m))                ; (w, b, slacks)
  (define b-col d)                    ; column of the intercept b
  (define (slack-col i) (+ d 1 i))    ; column of slack i

  (define P
    (apply scs:sparse-matrix n n
           (for/list ([j (in-range d)]) (list j j 1.0))))

  (define rows
    (append*
     (for/list ([pt (in-list data)] [i (in-naturals)])
       (define xs (car pt))
       (define y (cadr pt))
       (define margin
         (sparse-row n (append (for/list ([f (in-range d)])
                                 (cons f (* (- y) (list-ref xs f))))
                               (list (cons b-col (- y))
                                     (cons (slack-col i) -1)))))
       (define nonneg (sparse-row n (list (cons (slack-col i) -1))))
       (list margin nonneg))))
  (define A (apply scs:matrix (* 2 m) n (append* rows)))
  (define rhs
    (append* (for/list ([_ (in-range m)]) (list -1.0 0.0))))
  (define c
    (list->vector (append (make-list (+ d 1) 0.0) (make-list m C))))
  (define result
    (solve #:A A
           #:b rhs
           #:c c
           #:P P
           #:cone (make-cone #:positive (* 2 m))
           #:settings (make-settings #:eps-abs 1e-9 #:eps-rel 1e-9)))
  (define x (scs-result-x result))
  (values (for/vector ([j (in-range d)]) (vector-ref x j))
          (vector-ref x b-col)))

Running it.

On a small symmetric, linearly separable set the max-margin hyperplane is vertical: w = (0.5, 0), b = 0.

<run-example> ::=

(define dataset
  '(((2.0 1.0)  1) ((2.0 -1.0)  1)
    ((-2.0 1.0) -1) ((-2.0 -1.0) -1)))

(define (run-example)
  (define-values (w b) (train-svm dataset #:C 1.0))
  (list w b))

(run-example) ; (list #(0.5 0.0) 0.0)

<*> ::=

<require>
<provide>
<helpers>
<train-svm>
<run-example>

1	Getting started
2	Guide
3	Examples
4	Reference

3.1	Linear program
3.2	Quadratic program
3.3	Second-order cone
3.4	Exponential cone
3.5	Semidefinite program
3.6	Power cone
3.7	Indirect solver
3.8	Warm starting and re-solving
3.9	Lasso along a regularization path
3.10	Elastic net regression
3.11	Maximum entropy
3.12	Support vector machine
3.13	Portfolio optimization
3.14	Model predictive control