3.10 Elastic net regression
The elastic net is a linear regressor that blends ridge (L2) and lasso (L1) regularization. Given a design matrix X (one row per sample) and targets y, it fits weights w minimizing
‖X w − y‖₂² + λα‖w‖₂² + λ(1−α)‖w‖₁
The regularization strength λ ≥ 0 controls the overall amount of shrinkage, and the mixing parameter α ∈ [0, 1] interpolates between the two penalties: α = 1 is pure ridge, α = 0 is pure lasso (see Lasso along a regularization path), and intermediate values combine them. This example exposes the fit as a reusable procedure, (make-elastic-net X y #:lambda λ #:alpha α), which returns the fitted w.
(require racket/list scs)
(provide make-elastic-net run-example)
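To make the objective concrete, here is a minimal sketch that evaluates it for a given weight vector. The name elastic-net-objective is hypothetical and is not one of the procedures this example provides. It assumes X is a list of rows, y is a list, and w is a vector, matching the shapes used in the rest of this section.

```racket
;; Hypothetical helper (not part of the original example): evaluates
;; ‖X w − y‖₂² + λα‖w‖₂² + λ(1−α)‖w‖₁ for a given w.
;; X is a list of rows, y a list of targets, w a vector of weights.
(define (elastic-net-objective X y w lam alpha)
  ;; Sum of squared residuals over all samples.
  (define residual-sq
    (for/sum ([row (in-list X)] [yk (in-list y)])
      (define pred
        (for/sum ([x (in-list row)] [wi (in-vector w)])
          (* x wi)))
      (expt (- pred yk) 2)))
  ;; Squared L2 norm and L1 norm of the weights.
  (define l2 (for/sum ([wi (in-vector w)]) (* wi wi)))
  (define l1 (for/sum ([wi (in-vector w)]) (abs wi)))
  (+ residual-sq (* lam alpha l2) (* lam (- 1 alpha) l1)))
```

Such a function is handy for sanity checks: a weight vector returned by the solver should score no worse than nearby perturbations of it.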
3.10.1 From the objective to a quadratic program
Expanding the squared residual, ‖X w − y‖₂² = wᵀ(XᵀX)w − 2(Xᵀy)ᵀw + yᵀy, and folding in the ridge term λα‖w‖₂² = λα wᵀw, the smooth part of the objective is wᵀ(XᵀX + λα I)w − 2(Xᵀy)ᵀw (the constant yᵀy is irrelevant to the minimizer).
The L1 term uses the absolute-value trick: introduce t with |w_i| ≤ t_i, written as the two inequalities w_i − t_i ≤ 0 and −w_i − t_i ≤ 0, and add λ(1−α)·Σ t_i to the objective. Over the stacked variable (w, t) this is a quadratic cone program in SCS’s standard form ½ vᵀP v + cᵀv:
P has 2(XᵀX + λα I) on the w block and zeros on the t block (the factor 2 absorbs SCS’s ½).
c = (−2 Xᵀy, λ(1−α)·1).
The 2n constraint rows all belong to the positive-orthant cone, with right-hand side b = 0.
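The three items above can be collected into block form. This is only a restatement under SCS's convention A v + s = b with s in the cone; the row ordering of A shown here groups the constraints by sign, whereas the code in 3.10.3 interleaves the two rows for each i, which is merely a row permutation.

```latex
\[
\begin{aligned}
&\min_{v=(w,t)}\ \tfrac12\, v^\top P v + c^\top v
\quad\text{s.t.}\quad A v + s = b,\ \ s \ge 0, \\[4pt]
&P = \begin{pmatrix} 2\,(X^\top X + \lambda\alpha I) & 0 \\ 0 & 0 \end{pmatrix},
\qquad
c = \begin{pmatrix} -2\,X^\top y \\ \lambda(1-\alpha)\,\mathbf{1} \end{pmatrix}, \\[4pt]
&A = \begin{pmatrix} I & -I \\ -I & -I \end{pmatrix},
\qquad b = 0 .
\end{aligned}
\]
```

The first block row of A encodes w − t ≤ 0 and the second encodes −w − t ≤ 0, which together give |w_i| ≤ t_i.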
3.10.2 Small matrix helpers
We take X as a list of rows and y as a list, and compute the needed Gram entries directly. col-dot is the (i, j) entry of XᵀX, and col-y-dot the ith entry of Xᵀy.
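The helper definitions are not shown here, so the following is a minimal sketch reconstructed from the description above and from how the helpers are called in 3.10.3. The bodies of col-dot and col-y-dot follow directly from their stated meaning. The sparse-row definition is an assumption: it guesses that (sparse-row n entries) expands a list of (index . value) pairs into a dense row of length n, which is one reading consistent with how its results are flattened before being passed on.

```racket
;; (i, j) entry of XᵀX: dot product of columns i and j of X.
(define (col-dot X i j)
  (for/sum ([row (in-list X)])
    (* (list-ref row i) (list-ref row j))))

;; i-th entry of Xᵀy: dot product of column i of X with y.
(define (col-y-dot X y i)
  (for/sum ([row (in-list X)] [yk (in-list y)])
    (* (list-ref row i) yk)))

;; ASSUMED behavior: build a dense row of length n that is 0.0 everywhere
;; except at the positions named by the (index . value) pairs in entries.
(define (sparse-row n entries)
  (for/list ([k (in-range n)])
    (cond [(assv k entries) => (lambda (p) (exact->inexact (cdr p)))]
          [else 0.0])))
```

Using list-ref makes each entry cost time proportional to the row width, which is fine for small examples like the one in this section; a larger problem would want X stored as vectors.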
3.10.3 Assembling and solving
(define (make-elastic-net X y #:lambda lam #:alpha alpha)
  (define n (length (car X)))   ; number of features
  (define n2 (* 2 n))           ; variables (w, t)
  ;; Nonzero upper-triangular entries of the w block, 2(XᵀX + λα I).
  (define P-triples
    (for*/list ([i (in-range n)]
                [j (in-range n)]
                #:when (<= i j)
                [v (in-value (+ (* 2.0 (col-dot X i j))
                                (if (= i j) (* 2.0 lam alpha) 0.0)))]
                #:unless (zero? v))
      (list i j v)))
  (define P (apply scs:sparse-matrix n2 n2 P-triples))
  ;; Two rows per feature: w_i − t_i ≤ 0 and −w_i − t_i ≤ 0.
  (define rows
    (append*
     (for/list ([i (in-range n)])
       (list (sparse-row n2 (list (cons i 1) (cons (+ n i) -1)))
             (sparse-row n2 (list (cons i -1) (cons (+ n i) -1)))))))
  (define A (apply scs:matrix n2 n2 (append* rows)))
  ;; c = (−2 Xᵀy, λ(1−α)·1)
  (define c
    (list->vector
     (append (for/list ([i (in-range n)]) (* -2.0 (col-y-dot X y i)))
             (make-list n (* lam (- 1.0 alpha))))))
  (define result
    (solve #:A A
           #:b (make-list n2 0.0)
           #:c c
           #:P P
           #:cone (make-cone #:positive n2)
           #:settings (make-settings #:eps-abs 1e-9 #:eps-rel 1e-9)))
  ;; Keep only the w half of the stacked solution (w, t).
  (for/vector ([i (in-range n)])
    (vector-ref (scs-result-x result) i)))
Running it.
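On a small dataset, α = 1 recovers the closed-form ridge solution, while mixing in some L1 (α < 1) shrinks the weights further. The second call doubles λ and halves α, so the ridge coefficient λα = 0.1 is the same in both fits and only the L1 term is new: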
(define X '((1.0 0.0) (0.0 1.0) (1.0 1.0)))
(define y '(1.0 2.0 0.5))
(define (run-example)
  (list (make-elastic-net X y #:lambda 0.1 #:alpha 1.0)    ; ridge
        (make-elastic-net X y #:lambda 0.2 #:alpha 0.5)))  ; elastic
(car (run-example)) ; #(0.1906 1.0997), the ridge optimum
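The ridge value can be checked by hand from the normal equations, and the same calculation gives a reference point for the second fit. The following is a hand derivation, not solver output. For the elastic-net case it assumes both weights are positive at the optimum, so that the L1 term is differentiable there; the result is consistent with that assumption.

```latex
\[
X^\top X = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},\qquad
X^\top y = \begin{pmatrix} 1.5 \\ 2.5 \end{pmatrix},\qquad
M = X^\top X + \lambda\alpha I = \begin{pmatrix} 2.1 & 1 \\ 1 & 2.1 \end{pmatrix},\quad
\det M = 3.41 .
\]
Ridge ($\lambda = 0.1,\ \alpha = 1$):
\[
w = M^{-1} X^\top y
  = \frac{1}{3.41}\begin{pmatrix} 2.1 & -1 \\ -1 & 2.1 \end{pmatrix}
    \begin{pmatrix} 1.5 \\ 2.5 \end{pmatrix}
  = \frac{1}{3.41}\begin{pmatrix} 0.65 \\ 3.75 \end{pmatrix}
  \approx \begin{pmatrix} 0.1906 \\ 1.0997 \end{pmatrix}.
\]
Elastic net ($\lambda = 0.2,\ \alpha = 0.5$, assuming $w > 0$ componentwise):
\[
2Mw - 2X^\top y + \lambda(1-\alpha)\mathbf{1} = 0
\;\Longrightarrow\;
Mw = X^\top y - 0.05\,\mathbf{1} = \begin{pmatrix} 1.45 \\ 2.45 \end{pmatrix},
\qquad
w = \frac{1}{3.41}\begin{pmatrix} 0.595 \\ 3.695 \end{pmatrix}
  \approx \begin{pmatrix} 0.1745 \\ 1.0836 \end{pmatrix}.
\]
```

Both components of the second vector are smaller than their ridge counterparts, which is the extra shrinkage described above.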