
3.10 Elastic net regression

The elastic net is a linear regressor that blends ridge (L2) and lasso (L1) regularization. Given a design matrix X (one row per sample) and targets y, it fits weights w minimizing

‖X w − y‖₂² + λα‖w‖₂² + λ(1−α)‖w‖₁

The regularization strength λ ≥ 0 controls the overall amount of shrinkage, and the mixing parameter α ∈ [0, 1] interpolates between the two penalties: α = 1 is pure ridge, α = 0 is pure lasso (see Lasso along a regularization path), and intermediate values combine them. This example exposes the fit as a reusable procedure, (make-elastic-net X y #:lambda λ #:alpha α), which returns the fitted w as a vector.
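To make the objective concrete, the following sketch evaluates it for a candidate w. The name elastic-net-objective is a hypothetical helper introduced only for illustration; it is not part of this example's code and takes X as a list of rows with y and w as lists.

```racket
#lang racket/base
;; Hypothetical helper: evaluates
;;   ‖X w − y‖₂² + λα‖w‖₂² + λ(1−α)‖w‖₁
;; for X given as a list of rows, and y, w given as lists.
(define (elastic-net-objective X y w lam alpha)
  (define residual-sq
    (for/sum ([row (in-list X)] [yi (in-list y)])
      (let ([pred (for/sum ([x (in-list row)] [wi (in-list w)]) (* x wi))])
        (expt (- pred yi) 2))))
  (define l2 (for/sum ([wi (in-list w)]) (* wi wi)))    ; ‖w‖₂²
  (define l1 (for/sum ([wi (in-list w)]) (abs wi)))     ; ‖w‖₁
  (+ residual-sq (* lam alpha l2) (* lam (- 1.0 alpha) l1)))

;; At w = 0 both penalties vanish and only the data term remains:
;; 1.0² + 2.0² + 0.5² = 5.25.
(elastic-net-objective '((1.0 0.0) (0.0 1.0) (1.0 1.0))
                       '(1.0 2.0 0.5)
                       '(0.0 0.0)
                       0.2 0.5)
```

A helper like this is handy for sanity checks: any w returned by a solver should score no worse than nearby perturbations of it.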

(require racket/list
         scs)

(provide make-elastic-net run-example)

3.10.1 From the objective to a quadratic program

Expanding the squared residual, ‖X w − y‖₂² = wᵀ(XᵀX)w − 2(Xᵀy)ᵀw + yᵀy, and folding in the ridge term λα‖w‖₂² = λα wᵀw, the smooth part of the objective is wᵀ(XᵀX + λα I)w − 2(Xᵀy)ᵀw (the constant yᵀy is irrelevant to the minimizer).
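The expansion can be checked numerically. The sketch below is standalone and uses names of its own (X-demo, y-demo, w-demo, dot, column are all introduced here for illustration); it compares the row-by-row squared residual with the expanded quadratic form for an arbitrary trial w.

```racket
#lang racket/base
(define X-demo '((1.0 0.0) (0.0 1.0) (1.0 1.0)))
(define y-demo '(1.0 2.0 0.5))
(define w-demo '(0.5 -1.0))          ; arbitrary trial weights

(define (dot u v) (for/sum ([a (in-list u)] [b (in-list v)]) (* a b)))
(define (column X j) (for/list ([row (in-list X)]) (list-ref row j)))

;; Left-hand side: ‖X w − y‖₂², accumulated one sample at a time.
(define lhs
  (for/sum ([row (in-list X-demo)] [yi (in-list y-demo)])
    (expt (- (dot row w-demo) yi) 2)))

;; Right-hand side: wᵀ(XᵀX)w − 2(Xᵀy)ᵀw + yᵀy.
(define n (length w-demo))
(define quad
  (for*/sum ([i (in-range n)] [j (in-range n)])
    (* (list-ref w-demo i)
       (dot (column X-demo i) (column X-demo j))
       (list-ref w-demo j))))
(define lin
  (for/sum ([i (in-range n)])
    (* (dot (column X-demo i) y-demo) (list-ref w-demo i))))
(define rhs (+ quad (* -2.0 lin) (dot y-demo y-demo)))

(list lhs rhs)   ; the two entries should agree up to rounding
```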

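The L1 term uses the standard absolute-value trick: introduce an auxiliary vector t with |w_i| ≤ t_i, written as the two linear inequalities w_i − t_i ≤ 0 and −w_i − t_i ≤ 0, and add λ(1−α)·Σ t_i to the objective. Whenever λ(1−α) > 0, minimization drives each t_i down to |w_i|, so the added term equals λ(1−α)‖w‖₁ at the optimum. Over the stacked variable v = (w, t) this is a quadratic cone program in SCS’s standard form, minimize ½ vᵀP v + cᵀv subject to A v + s = b with s in a cone: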

  • P has 2(XᵀX + λα I) on the w block and zeros on the t block (the factor 2 absorbs SCS’s ½).

  • c = (−2 Xᵀy, λ(1−α)·1).

  • the 2n inequality rows form the constraint matrix A with right-hand side b = 0, and all of them are positive-orthant constraints.
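Collecting the three pieces, the program handed to the solver can be written out in full. This is a restatement of the items above rather than new material, with n the number of features and the inequalities understood as SCS's slack condition s = b − A v ≥ 0 with b = 0:

```latex
\begin{aligned}
\min_{w,\,t \,\in\, \mathbb{R}^n}\quad
  & \tfrac12
    \begin{pmatrix} w \\ t \end{pmatrix}^{\!\top}
    \begin{pmatrix} 2\,(X^\top X + \lambda\alpha I) & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} w \\ t \end{pmatrix}
    +
    \begin{pmatrix} -2\,X^\top y \\ \lambda(1-\alpha)\,\mathbf{1} \end{pmatrix}^{\!\top}
    \begin{pmatrix} w \\ t \end{pmatrix} \\[4pt]
\text{subject to}\quad
  & w_i - t_i \le 0, \qquad -w_i - t_i \le 0, \qquad i = 1, \dots, n .
\end{aligned}
```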

3.10.2 Small matrix helpers

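We take X as a list of rows and y as a list, and compute the needed Gram entries directly. col-dot is the (i, j) entry of XᵀX, and col-y-dot the ith entry of Xᵀy. sparse-row expands an association list of (column . value) pairs into a dense row of length n2, with 0 in every column that is not listed.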

(define (col-dot X i j)
  (for/sum ([row (in-list X)]) (* (list-ref row i) (list-ref row j))))
 
(define (col-y-dot X y i)
  (for/sum ([row (in-list X)] [yi (in-list y)]) (* (list-ref row i) yi)))
 
 
(define (sparse-row n2 entries)
  (for/list ([k (in-range n2)])
    (cond [(assoc k entries) => cdr] [else 0])))
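As an illustration of how sparse-row is meant to be used, the sketch below builds the inequality rows for n = 2 features. It repeats the definition of sparse-row so that it runs on its own; constraint-rows is a name introduced only for this example.

```racket
#lang racket/base
;; Copy of sparse-row from above, repeated so the snippet is standalone.
(define (sparse-row n2 entries)
  (for/list ([k (in-range n2)])
    (cond [(assoc k entries) => cdr] [else 0])))

;; With n = 2 the stacked variable is (w0 w1 t0 t1): column i holds w_i
;; and column n+i holds t_i.
(define n 2)
(define constraint-rows
  (for*/list ([i (in-range n)] [sign (in-list '(1 -1))])
    ;; sign = 1 encodes w_i − t_i ≤ 0, sign = −1 encodes −w_i − t_i ≤ 0.
    (sparse-row (* 2 n) (list (cons i sign) (cons (+ n i) -1)))))

constraint-rows
;; => '((1 0 -1 0) (-1 0 -1 0) (0 1 0 -1) (0 -1 0 -1))
```

Each pair of rows touches only the w_i and t_i columns, which is why an association list is a convenient way to describe them.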

3.10.3 Assembling and solving

(define (make-elastic-net X y #:lambda lam #:alpha alpha)
  (define n (length (car X)))      ; number of features
  (define n2 (* 2 n))              ; variables (w, t)
 
  ;; Nonzero entries of P = 2(XᵀX + λα I) on the w block, emitted only for
  ;; i ≤ j; the t block of P stays zero.
  (define (p-entry i j)
    (+ (* 2.0 (col-dot X i j))
       (if (= i j) (* 2.0 lam alpha) 0.0)))
  (define P-triples
    (for*/list ([i (in-range n)] [j (in-range i n)]
                [v (in-value (p-entry i j))]
                #:unless (zero? v))
      (list i j v)))
  (define P (apply scs:sparse-matrix n2 n2 P-triples))
 
  ;; Two rows per feature: w_i − t_i ≤ 0 and −w_i − t_i ≤ 0.
  (define rows
    (append*
     (for/list ([i (in-range n)])
       (list (sparse-row n2 (list (cons i 1) (cons (+ n i) -1)))
             (sparse-row n2 (list (cons i -1) (cons (+ n i) -1)))))))
  (define A (apply scs:matrix n2 n2 (append* rows)))

  ;; Linear cost c = (−2 Xᵀy, λ(1−α)·1) over the stacked variable (w, t).
  (define c
    (list->vector
     (append (for/list ([i (in-range n)]) (* -2.0 (col-y-dot X y i)))
             (make-list n (* lam (- 1.0 alpha))))))

  (define result
    (solve #:A A
           #:b (make-vector n2 0.0)
           #:c c
           #:P P
           #:cone (make-cone #:positive n2)
           #:settings (make-settings #:eps-abs 1e-9 #:eps-rel 1e-9)))
  ;; The first n entries of the solution are w; the remaining n are the auxiliary t.
  (for/vector ([i (in-range n)]) (vector-ref (scs-result-x result) i)))

Running it.

On a small dataset, α = 1 recovers the closed-form ridge solution. The second call keeps the same ridge weight (λα = 0.1) and adds an L1 weight of λ(1−α) = 0.1, which shrinks the weights further:

(define X '((1.0 0.0) (0.0 1.0) (1.0 1.0)))
(define y '(1.0 2.0 0.5))
 
(define (run-example)
  (list (make-elastic-net X y #:lambda 0.1 #:alpha 1.0)    ; ridge
        (make-elastic-net X y #:lambda 0.2 #:alpha 0.5)))  ; elastic

(car (run-example))   ; ≈ #(0.1906 1.0997), the ridge optimum
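The ridge figures can be cross-checked without a solver. For α = 1 the objective is smooth, and setting its gradient to zero gives the normal equations (XᵀX + λα I) w = Xᵀy. The sketch below solves that 2×2 system with Cramer's rule; the Gram entries are written out by hand for the dataset above, and all names are introduced only for this check.

```racket
#lang racket/base
;; Ridge normal equations (XᵀX + λα I) w = Xᵀy for the two-feature example.
(define lam-alpha (* 0.1 1.0))     ; λα for #:lambda 0.1 #:alpha 1.0
(define a (+ 2.0 lam-alpha))       ; (XᵀX)₀₀ + λα, with (XᵀX)₀₀ = 1² + 0² + 1²
(define b 1.0)                     ; (XᵀX)₀₁ = (XᵀX)₁₀
(define d (+ 2.0 lam-alpha))       ; (XᵀX)₁₁ + λα
(define r0 1.5)                    ; (Xᵀy)₀ = 1.0 + 0.5
(define r1 2.5)                    ; (Xᵀy)₁ = 2.0 + 0.5

;; Cramer's rule for the symmetric system [[a b] [b d]] w = (r0 r1).
(define det (- (* a d) (* b b)))
(define w-ridge
  (vector (/ (- (* d r0) (* b r1)) det)
          (/ (- (* a r1) (* b r0)) det)))

w-ridge   ; approximately #(0.1906 1.0997)
```

Agreement to about four decimal places with the solver output is what one would expect at the tolerances set in make-elastic-net.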

<*> ::=