BCP-47 compliant language tag predicates
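This module provides a predicate that determines whether a given string or symbol is a valid Language Tag as defined by RFC5646, the format used across HTTP, HTML, XML, RDF, and other standards. It also provides predicates for the individual components of a tag and a procedure that returns the matched components.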
References
BCP-47, RFC5646 Tags for Identifying Languages
IANA Registry of Language Tags (Assigned)
IANA Registry of Language Subtags
IANA Registry of Language Tag Extensions (UCD)
predicate
(language-tag? val) → boolean?
val : (or/c symbol? string?) 
> (require langtag)
> (language-tag? "en")
#t
> (language-tag? "en-US")
#t
> (language-tag? "en-US-boont")
#t
> (language-tag? "en-Latn-US")
#t
> (language-tag? "i-klingon")
#t
> (language-tag? "x-private")
#t
1 Components
predicate
(normal-use? val) → boolean?
val : (or/c symbol? string?) 
predicate
(private-use? val) → boolean?
val : (or/c symbol? string?) 
predicate
(grandfathered? val) → boolean?
val : (or/c symbol? string?) 
> (require langtag) 
> (for-each (lambda (val)
              (displayln (format "~s ~s ~s"
                                 (normal-use? val)
                                 (private-use? val)
                                 (grandfathered? val))))
            '("en-US" "x-private" "i-klingon"))
#t #f #f
#f #t #f
#f #f #t
predicate
(language-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-script-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-region-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-variant-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-extension-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-private-use-part? val) → boolean?
val : (or/c symbol? string?) 
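The subtag shapes behind these component predicates come from the ABNF in the appendix. As a rough illustration only, the sketch below checks three of those shapes with regular expressions. The names `script-shaped?`, `region-shaped?`, and `variant-shaped?` are hypothetical and are not part of langtag. The sketch accepts strings only and checks shape, not registry membership, so it may not agree with the library's predicates in every case.

```racket
#lang racket/base

;; Hypothetical helpers, not the langtag implementation. Each one
;; transcribes a single ABNF production for an isolated subtag.

;; script = 4ALPHA
(define (script-shaped? s)
  (regexp-match? #px"^[A-Za-z]{4}$" s))

;; region = 2ALPHA / 3DIGIT
(define (region-shaped? s)
  (regexp-match? #px"^(?:[A-Za-z]{2}|[0-9]{3})$" s))

;; variant = 5*8alphanum / (DIGIT 3alphanum)
(define (variant-shaped? s)
  (regexp-match? #px"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$" s))

(script-shaped? "Latn")   ; #t
(region-shaped? "US")     ; #t
(region-shaped? "419")    ; #t
(region-shaped? "USA")    ; #f
(variant-shaped? "boont") ; #t
(variant-shaped? "1996")  ; #t
```

Shape alone is ambiguous: "boont" also fits the 5*8ALPHA language alternative, so the position of a subtag within the whole tag decides its role.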
2 Matching
procedure
(language-tag-match val)
 → (list symbol? string? (or/c (listof (cons/c symbol? string?)) none/c))
  val : (or/c symbol? string?)
> (require langtag)
> (language-tag-match "en")
'(lang "en" ((language . "en")))
> (language-tag-match "en-US")
'(lang "en-US" ((language . "en") (region . "US")))
> (language-tag-match "en-US-boont")
'(lang "en-US-boont" ((language . "en") (region . "US") (variant . "boont")))
> (language-tag-match "en-Latn-US")
'(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US")))
> (language-tag-match "i-klingon")
'(grandfathered-i "i-klingon")
> (language-tag-match "x-private")
'(private-use "x-private")
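The results above are plain lists, so a component can be read with ordinary list operations. The following is a minimal sketch that assumes the result shapes are exactly those shown in the examples. The helper `match-component` is hypothetical and is not provided by langtag. It works on literal data here, so it runs without the package installed.

```racket
#lang racket/base

;; Hypothetical helper (not part of langtag): look up one component in a
;; result shaped like the examples above. Two-element results such as
;; '(private-use "x-private") carry no component list, so the lookup
;; returns #f for them.
(define (match-component result key)
  (define parts
    (if (and (list? result) (= (length result) 3))
        (caddr result)
        '()))
  (define entry (and (list? parts) (assq key parts)))
  (and entry (cdr entry)))

(define en-latn-us
  '(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US"))))

(match-component en-latn-us 'script)                    ; "Latn"
(match-component en-latn-us 'variant)                   ; #f
(match-component '(private-use "x-private") 'language)  ; #f
```

Using `assq` works because the keys are symbols. A tag with several variants or extensions might repeat a key, in which case this helper returns only the first entry.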
3 Appendix: Definition
The syntax of the language tag, from [RFC5646], in ABNF [RFC5234] is:
Language-Tag  = langtag             ; normal language tags
              / privateuse          ; private use tag
              / grandfathered       ; grandfathered tags

langtag       = language
                ["-" script]
                ["-" region]
                *("-" variant)
                *("-" extension)
                ["-" privateuse]

language      = 2*3ALPHA            ; shortest ISO 639 code
                ["-" extlang]       ; sometimes followed by
                                    ; extended language subtags
              / 4ALPHA              ; or reserved for future use
              / 5*8ALPHA            ; or registered language subtag

extlang       = 3ALPHA              ; selected ISO 639 codes
                *2("-" 3ALPHA)      ; permanently reserved

script        = 4ALPHA              ; ISO 15924 code

region        = 2ALPHA              ; ISO 3166-1 code
              / 3DIGIT              ; UN M.49 code

variant       = 5*8alphanum         ; registered variants
              / (DIGIT 3alphanum)

extension     = singleton 1*("-" (2*8alphanum))

                                    ; Single alphanumerics
                                    ; "x" reserved for private use
singleton     = DIGIT               ; 0 - 9
              / %x41-57             ; A - W
              / %x59-5A             ; Y - Z
              / %x61-77             ; a - w
              / %x79-7A             ; y - z

privateuse    = "x" 1*("-" (1*8alphanum))

grandfathered = irregular           ; non-redundant tags registered
              / regular             ; during the RFC 3066 era

irregular     = "en-GB-oed"         ; irregular tags do not match
              / "i-ami"             ; the 'langtag' production and
              / "i-bnn"             ; would not otherwise be
              / "i-default"         ; considered 'well-formed'
              / "i-enochian"        ; These tags are all valid,
              / "i-hak"             ; but most are deprecated
              / "i-klingon"         ; in favor of more modern
              / "i-lux"             ; subtags or subtag
              / "i-mingo"           ; combination
              / "i-navajo"
              / "i-pwn"
              / "i-tao"
              / "i-tay"
              / "i-tsu"
              / "sgn-BE-FR"
              / "sgn-BE-NL"
              / "sgn-CH-DE"

regular       = "art-lojban"        ; these tags match the 'langtag'
              / "cel-gaulish"       ; production, but their subtags
              / "no-bok"            ; are not extended language
              / "no-nyn"            ; or variant subtags: their meaning
              / "zh-guoyu"          ; is defined by their registration
              / "zh-hakka"          ; and all of these are deprecated
              / "zh-min"            ; in favor of a more modern
              / "zh-min-nan"        ; subtag or sequence of subtags
              / "zh-xiang"

alphanum      = (ALPHA / DIGIT)     ; letters and numbers
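To show how the productions compose, the sketch below transcribes the grammar into a regular expression, one string per production. It is an illustration under stated assumptions, not the langtag implementation. The name `well-formed-tag?` is hypothetical, only strings are accepted, and the check is purely syntactic with no lookup in the IANA registries. The irregular grandfathered tags are listed explicitly because they do not match the `langtag` production. The regular ones need no list because they do.

```racket
#lang racket/base

;; Hypothetical sketch: each string transcribes one ABNF production.
(define language   "(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4}|[A-Za-z]{5,8})")
(define script     "(?:-[A-Za-z]{4})?")
(define region     "(?:-(?:[A-Za-z]{2}|[0-9]{3}))?")
(define variants   "(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*")
;; singleton is any alphanumeric except "x" / "X"
(define extensions "(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*")
(define privateuse "[xX](?:-[A-Za-z0-9]{1,8})+")

(define langtag-rx
  (pregexp (string-append "^" language script region variants extensions
                          "(?:-" privateuse ")?$")))
(define privateuse-rx
  (pregexp (string-append "^" privateuse "$")))

;; Irregular grandfathered tags, lower-cased for case-insensitive lookup.
(define irregular
  '("en-gb-oed" "i-ami" "i-bnn" "i-default" "i-enochian" "i-hak"
    "i-klingon" "i-lux" "i-mingo" "i-navajo" "i-pwn" "i-tao" "i-tay"
    "i-tsu" "sgn-be-fr" "sgn-be-nl" "sgn-ch-de"))

(define (well-formed-tag? s)
  (and (string? s)
       (or (regexp-match? langtag-rx s)
           (regexp-match? privateuse-rx s)
           (and (member (string-downcase s) irregular) #t))))

(well-formed-tag? "en-Latn-US")      ; #t
(well-formed-tag? "en-u-co-phonebk") ; #t
(well-formed-tag? "en-GB-oed")       ; #t, via the irregular list
(well-formed-tag? "en-a")            ; #f, an extension needs a subtag
```

Every optional part begins with "-", and the pattern is anchored at both ends, so the regexp engine backtracks until the subtags fall into their grammatical positions. In "en-Latn-US", for example, "Latn" cannot be an extlang and is therefore matched as the script.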