BCP-47 compliant language tag predicates
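This module provides a predicate, language-tag?, that determines whether a given string or symbol is a valid language tag as defined by RFC 5646 (BCP 47), the format used to identify languages in HTTP, HTML, XML, RDF, and many other standards. Further predicates test the individual categories and components of a tag, and language-tag-match reports how a tag matched the grammar.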
References
BCP-47, RFC5646 Tags for Identifying Languages
IANA Registry of Language Tags (Assigned)
IANA Registry of Language Subtags
IANA Registry of Language Tag Extensions (UCD)
predicate
(language-tag? val) → boolean?
val : (or/c symbol? string?)
> (require langtag)
> (language-tag? "en")
#t
> (language-tag? "en-US")
#t
> (language-tag? "en-US-boont")
#t
> (language-tag? "en-Latn-US")
#t
> (language-tag? "i-klingon")
#t
> (language-tag? "x-private")
#t
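To make the grammar concrete, here is a minimal sketch of a syntactic well-formedness check transcribed from the ABNF in the appendix. It is not the langtag library's implementation: the name well-formed-tag? and the pattern definitions are hypothetical, and the sketch checks only the ABNF syntax, not the IANA registries or RFC 5646's rules against duplicate variants and singletons.

```racket
#lang racket/base
;; Sketch only: an ABNF-derived check for RFC 5646 tags.
;; All names here are hypothetical, not the langtag library's API.

;; One pattern string per ABNF production. ABNF literals are
;; case-insensitive, so both letter cases are accepted throughout.
(define language   "(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4,8})")
(define script     "[A-Za-z]{4}")
(define region     "(?:[A-Za-z]{2}|[0-9]{3})")
(define variant    "(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")
(define singleton  "[0-9A-WY-Za-wy-z]")           ; any alphanumeric except x/X
(define extension  (string-append singleton "(?:-[A-Za-z0-9]{2,8})+"))
(define privateuse "[xX](?:-[A-Za-z0-9]{1,8})+")

;; langtag = language ["-" script] ["-" region] *("-" variant)
;;           *("-" extension) ["-" privateuse]
(define langtag
  (string-append language
                 "(?:-" script ")?"
                 "(?:-" region ")?"
                 "(?:-" variant ")*"
                 "(?:-" extension ")*"
                 "(?:-" privateuse ")?"))

;; Anchor both patterns so the whole input must match.
(define langtag-rx    (pregexp (string-append "^" langtag "$")))
(define privateuse-rx (pregexp (string-append "^" privateuse "$")))

;; grandfathered = irregular / regular, stored lower-cased for
;; case-insensitive lookup.
(define grandfathered-tags
  '("en-gb-oed" "i-ami" "i-bnn" "i-default" "i-enochian" "i-hak"
    "i-klingon" "i-lux" "i-mingo" "i-navajo" "i-pwn" "i-tao" "i-tay"
    "i-tsu" "sgn-be-fr" "sgn-be-nl" "sgn-ch-de"
    "art-lojban" "cel-gaulish" "no-bok" "no-nyn" "zh-guoyu" "zh-hakka"
    "zh-min" "zh-min-nan" "zh-xiang"))

;; Accepts a string or a symbol, mirroring the (or/c symbol? string?) contract.
(define (well-formed-tag? val)
  (define s (if (symbol? val) (symbol->string val) val))
  (or (regexp-match? langtag-rx s)
      (regexp-match? privateuse-rx s)
      (and (member (string-downcase s) grandfathered-tags) #t)))
```

Under this sketch, "en-Latn-US" and 'i-klingon are accepted while "en-" and "x" are rejected. Regular expressions are enough here because the langtag grammar has no recursion, so every production describes a regular language.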
1 Components
predicate
(normal-use? val) → boolean?
val : (or/c symbol? string?)
predicate
(private-use? val) → boolean?
val : (or/c symbol? string?)
predicate
(grandfathered? val) → boolean?
val : (or/c symbol? string?)
> (require langtag)
> (for-each (lambda (val) (displayln (format "~s ~s ~s" (normal-use? val) (private-use? val) (grandfathered? val)))) '("en-US" "x-private" "i-klingon"))
#t #f #f
#f #t #f
#f #f #t
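The three categories correspond to the three alternatives of the Language-Tag production in the appendix. Two of them are simple: private-use tags follow a short pattern, and grandfathered tags form a fixed list. The following sketch shows both; the names private-use-tag? and grandfathered-kind are hypothetical, and this is an illustration of the grammar rather than the library's code.

```racket
#lang racket/base
;; Sketch only: the two simpler Language-Tag alternatives from RFC 5646.
;; Helper names are hypothetical.

;; privateuse = "x" 1*("-" (1*8alphanum)); ABNF literals ignore case.
(define privateuse-rx #px"^[xX](?:-[A-Za-z0-9]{1,8})+$")

(define (private-use-tag? s)
  (regexp-match? privateuse-rx s))

;; The two grandfathered productions, stored lower-cased.
(define irregular-tags
  '("en-gb-oed" "i-ami" "i-bnn" "i-default" "i-enochian" "i-hak"
    "i-klingon" "i-lux" "i-mingo" "i-navajo" "i-pwn" "i-tao" "i-tay"
    "i-tsu" "sgn-be-fr" "sgn-be-nl" "sgn-ch-de"))
(define regular-tags
  '("art-lojban" "cel-gaulish" "no-bok" "no-nyn" "zh-guoyu" "zh-hakka"
    "zh-min" "zh-min-nan" "zh-xiang"))

;; Returns 'irregular, 'regular, or #f (case-insensitive lookup).
(define (grandfathered-kind s)
  (define t (string-downcase s))
  (cond [(member t irregular-tags) 'irregular]
        [(member t regular-tags)   'regular]
        [else #f]))
```

One caveat when classifying: the regular grandfathered tags, such as "art-lojban", also fit the normal langtag shape (language "art" followed by variant "lojban"), so a classifier that wants the categories to be mutually exclusive should consult the grandfathered lists first. This sketch does not attempt to reproduce how normal-use? treats such tags.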
predicate
(language-part? val) → boolean?
val : (or/c symbol? string?)
predicate
(language-script-part? val) → boolean?
val : (or/c symbol? string?)
predicate
(language-region-part? val) → boolean?
val : (or/c symbol? string?)
predicate
(language-variant-part? val) → boolean?
val : (or/c symbol? string?)
predicate
(language-extension-part? val) → boolean?
val : (or/c symbol? string?)
predicate
(language-private-use-part? val) → boolean?
val : (or/c symbol? string?)
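Assuming each component predicate tests a single component string against the corresponding ABNF production from the appendix (an assumption; the exact behavior is not specified here), one way to sketch them is a table of anchored patterns, one per production. The names component? and component-kinds are hypothetical.

```racket
#lang racket/base
;; Sketch only: one anchored pattern per ABNF component production.
;; Assumes each predicate checks one component string; names are hypothetical.

(define component-patterns
  (hash 'language    #px"^(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4,8})$"
        'script      #px"^[A-Za-z]{4}$"
        'region      #px"^(?:[A-Za-z]{2}|[0-9]{3})$"
        'variant     #px"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$"
        'extension   #px"^[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+$"
        'private-use #px"^[xX](?:-[A-Za-z0-9]{1,8})+$"))

;; Does string s match the production named by kind?
(define (component? kind s)
  (regexp-match? (hash-ref component-patterns kind) s))

;; Every production s could belong to, in ABNF order.
(define (component-kinds s)
  (for/list ([kind '(language script region variant extension private-use)]
             #:when (component? kind s))
    kind))
```

Running component-kinds over a few subtags shows why the grammar relies on position: "US" satisfies both the language and region productions, and "boont" both language and variant, so a subtag's role is fixed only by where it appears in the tag.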
2 Matching
procedure
(language-tag-match val)
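→ (list symbol? string? (or/c (listof (cons/c symbol? string?)) none/c))
val : (or/c symbol? string?)
Returns a list describing how val matched the Language-Tag grammar. As the examples show, the first element is a symbol naming the matching alternative (such as lang, private-use, or grandfathered-i) and the second is the tag as a string; for normal language tags a third element follows, an association list pairing component names (language, script, region, variant) with their subtags, while private-use and grandfathered results omit it.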
> (require langtag)
> (language-tag-match "en")
'(lang "en" ((language . "en")))
> (language-tag-match "en-US")
'(lang "en-US" ((language . "en") (region . "US")))
> (language-tag-match "en-US-boont")
'(lang "en-US-boont" ((language . "en") (region . "US") (variant . "boont")))
> (language-tag-match "en-Latn-US")
'(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US")))
> (language-tag-match "i-klingon")
'(grandfathered-i "i-klingon")
> (language-tag-match "x-private")
'(private-use "x-private")
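The association lists in these results pair each component name with its subtag. A sketch of one way to produce that shape: capture each component with a group in a regular expression, then split the repeated variant and extension runs. The name decompose-langtag and the extension and private-use keys are hypothetical; the examples above show only language, script, region, and variant.

```racket
#lang racket/base
(require racket/string)
;; Sketch only: split a normal (langtag) tag into an association list
;; shaped like the third element of the results above.
;; decompose-langtag and the 'extension / 'private-use keys are hypothetical.

(define extension-pat "[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+")
(define extension-rx  (pregexp extension-pat))

(define langtag-rx
  (pregexp
   (string-append
    "^((?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4,8}))" ; 1 language
    "(?:-([A-Za-z]{4}))?"                                     ; 2 script
    "(?:-([A-Za-z]{2}|[0-9]{3}))?"                            ; 3 region
    "((?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"        ; 4 all variants
    "((?:-" extension-pat ")*)"                               ; 5 all extensions
    "(?:-([xX](?:-[A-Za-z0-9]{1,8})+))?$")))                  ; 6 private use

;; Returns an association list, or #f if tag does not match langtag.
(define (decompose-langtag tag)
  (define m (regexp-match langtag-rx tag))
  (and m
       (let ([language (list-ref m 1)]
             [script   (list-ref m 2)]                       ; #f if absent
             [region   (list-ref m 3)]                       ; #f if absent
             [variants (string-split (list-ref m 4) "-")]    ; "" -> '()
             [exts     (regexp-match* extension-rx (list-ref m 5))]
             [private  (list-ref m 6)])                      ; #f if absent
         (append
          (list (cons 'language language))
          (if script (list (cons 'script script)) '())
          (if region (list (cons 'region region)) '())
          (for/list ([v variants]) (cons 'variant v))
          (for/list ([e exts]) (cons 'extension e))
          (if private (list (cons 'private-use private)) '())))))
```

Because a repeated group such as (?:-variant)* remembers only its last repetition, the sketch captures each whole run in one outer group and splits it afterwards.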
3 Appendix: Definition
The syntax of a language tag, as given in [RFC5646] using ABNF [RFC5234], is:
    Language-Tag  = langtag             ; normal language tags
                  / privateuse          ; private use tag
                  / grandfathered       ; grandfathered tags

    langtag       = language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse]

    language      = 2*3ALPHA            ; shortest ISO 639 code
                    ["-" extlang]       ; sometimes followed by
                                        ; extended language subtags
                  / 4ALPHA              ; or reserved for future use
                  / 5*8ALPHA            ; or registered language subtag

    extlang       = 3ALPHA              ; selected ISO 639 codes
                    *2("-" 3ALPHA)      ; permanently reserved

    script        = 4ALPHA              ; ISO 15924 code

    region        = 2ALPHA              ; ISO 3166-1 code
                  / 3DIGIT              ; UN M.49 code

    variant       = 5*8alphanum         ; registered variants
                  / (DIGIT 3alphanum)

    extension     = singleton 1*("-" (2*8alphanum))

                                        ; Single alphanumerics
                                        ; "x" reserved for private use
    singleton     = DIGIT               ; 0 - 9
                  / %x41-57             ; A - W
                  / %x59-5A             ; Y - Z
                  / %x61-77             ; a - w
                  / %x79-7A             ; y - z

    privateuse    = "x" 1*("-" (1*8alphanum))

    grandfathered = irregular           ; non-redundant tags registered
                  / regular             ; during the RFC 3066 era

    irregular     = "en-GB-oed"         ; irregular tags do not match
                  / "i-ami"             ; the 'langtag' production and
                  / "i-bnn"             ; would not otherwise be
                  / "i-default"         ; considered 'well-formed'
                  / "i-enochian"        ; These tags are all valid,
                  / "i-hak"             ; but most are deprecated
                  / "i-klingon"         ; in favor of more modern
                  / "i-lux"             ; subtags or subtag
                  / "i-mingo"           ; combination
                  / "i-navajo"
                  / "i-pwn"
                  / "i-tao"
                  / "i-tay"
                  / "i-tsu"
                  / "sgn-BE-FR"
                  / "sgn-BE-NL"
                  / "sgn-CH-DE"

    regular       = "art-lojban"        ; these tags match the 'langtag'
                  / "cel-gaulish"       ; production, but their subtags
                  / "no-bok"            ; are not extended language
                  / "no-nyn"            ; or variant subtags: their meaning
                  / "zh-guoyu"          ; is defined by their registration
                  / "zh-hakka"          ; and all of these are deprecated
                  / "zh-min"            ; in favor of a more modern
                  / "zh-min-nan"        ; subtag or sequence of subtags
                  / "zh-xiang"

    alphanum      = (ALPHA / DIGIT)     ; letters and numbers
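The comments on the irregular and regular productions make a testable claim: irregular tags do not match langtag, while regular ones do. The sketch below checks that claim against a pregexp transcription of langtag; langtag-rx and matches-langtag? are hypothetical names, not part of the langtag library.

```racket
#lang racket/base
;; Sketch only: verify the grammar comments on grandfathered tags.
;; langtag-rx is a hypothetical transcription of the langtag production.

(define langtag-rx
  (pregexp
   (string-append
    "^(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4,8})" ; language
    "(?:-[A-Za-z]{4})?"                                     ; script
    "(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"                        ; region
    "(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"        ; variant
    "(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*"          ; extension
    "(?:-[xX](?:-[A-Za-z0-9]{1,8})+)?$")))                  ; privateuse

(define (matches-langtag? s) (regexp-match? langtag-rx s))

(define irregular
  '("en-GB-oed" "i-ami" "i-bnn" "i-default" "i-enochian" "i-hak"
    "i-klingon" "i-lux" "i-mingo" "i-navajo" "i-pwn" "i-tao" "i-tay"
    "i-tsu" "sgn-BE-FR" "sgn-BE-NL" "sgn-CH-DE"))
(define regular
  '("art-lojban" "cel-gaulish" "no-bok" "no-nyn" "zh-guoyu" "zh-hakka"
    "zh-min" "zh-min-nan" "zh-xiang"))

;; Tags that contradict the grammar comments; both should be empty.
(define irregular-that-match (filter matches-langtag? irregular))
(define regular-that-fail
  (filter (lambda (s) (not (matches-langtag? s))) regular))
```

Both lists come back empty: "en-GB-oed" fails because "oed" is too short to be a variant, the i- tags fail because "i" is too short to be a language subtag, and the sgn- tags fail because a second two-letter subtag after the region fits no production. The regular tags pass as a language plus either extlang subtags ("no-bok", "zh-min-nan") or a five-to-eight-character variant ("art-lojban").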