3 Token Parsing
The tokens used for grouping and indentation are distinct from other categories:
( ) [ ] { } ' ; , : | « » \
Other tokens are described by the grammar below, where a star (★) in the left column indicates the productions that correspond to terms or comments.
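Numbers are supported directly in simple forms: decimal integers, decimal floating-point numbers, fractions, and hexadecimal, octal, and binary integers, in each case allowing _ between digits. The special floating-point values are written #inf, #neginf, and #nan.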
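As a rough illustration of the number forms, the following Python sketch recognizes a complete number token and converts it to a Python value. It follows the ‹number› productions in the grammar; the function name parse_number and the mapping to Python's int, float, and Fraction types are assumptions made for this example and are not part of the shrubbery specification.

```python
import re
from fractions import Fraction

# Digit runs with optional single underscores between digits,
# mirroring the decimal, hex, octal, and binary digit productions.
DEC = r"[0-9](?:_?[0-9])*"
HEX = r"[0-9a-fA-F](?:_?[0-9a-fA-F])*"
OCT = r"[0-7](?:_?[0-7])*"
BIN = r"[01](?:_?[01])*"
EXP = rf"[eE][+-]?{DEC}"

INTEGER = re.compile(rf"[+-]?{DEC}")
FRACTION = re.compile(rf"[+-]?{DEC}/{DEC}")
FLOAT = re.compile(
    rf"[+-]?{DEC}\.(?:{DEC})?(?:{EXP})?"  # 1.  1.5  1.5e3
    rf"|[+-]?\.{DEC}(?:{EXP})?"           # .5  .5e3
    rf"|[+-]?{DEC}{EXP}"                  # 1e3
)
PREFIXED = {"0x": (HEX, 16), "0o": (OCT, 8), "0b": (BIN, 2)}
SPECIAL_FLOATS = {
    "#inf": float("inf"),
    "#neginf": float("-inf"),
    "#nan": float("nan"),
}


def parse_number(text: str):
    """Convert a complete number token to a Python value, or raise ValueError."""
    if text in SPECIAL_FLOATS:
        return SPECIAL_FLOATS[text]
    digits = text.replace("_", "")  # underscores only separate digits
    for prefix, (pattern, base) in PREFIXED.items():
        if re.fullmatch(prefix + pattern, text):
            return int(digits[2:], base)
    if INTEGER.fullmatch(text):
        return int(digits)
    if FRACTION.fullmatch(text):
        numerator, denominator = digits.split("/")
        if int(denominator) == 0:
            raise ValueError("fraction denominator must not be 0")
        return Fraction(int(numerator), int(denominator))
    if FLOAT.fullmatch(text):
        return float(digits)
    raise ValueError(f"not a number token: {text!r}")
```

The sketch validates the token shape with the regular expressions first and only then strips underscores, so a doubled underscore such as 1__0 is rejected rather than silently accepted.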
Boolean literals are #true and #false. The void value is #void.
Identifiers are formed from Unicode alphanumeric characters plus _ and emoji sequences, where the initial character must not be a numeric character (unless that numeric character starts an emoji sequence, as in 1 followed by U+FE0F and U+20E3). An identifier can also be prefixed with #%; such identifiers are intended for “internal” names that are not normally visible. An identifier prefixed with ~ (and without #%) forms a keyword, analogous to prefixing an identifier with #: in Racket.
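The identifier and keyword rules can be sketched in Python as follows. This is a simplified illustration only: it ignores emoji sequences entirely, and it uses Python's str.isalpha and str.isnumeric as stand-ins for the alphabetic and numeric Unicode classes, which may not match the actual reader exactly. The function names are hypothetical.

```python
def is_plain_identifier(text: str) -> bool:
    """Approximate a plain identifier: one alpha, then any number of alphanums.

    Assumption: emoji sequences are not handled; only alphabetic characters,
    numeric characters, and _ are considered.
    """

    def is_alpha(ch: str) -> bool:
        return ch == "_" or ch.isalpha()

    def is_alphanum(ch: str) -> bool:
        return is_alpha(ch) or ch.isnumeric()

    return (
        bool(text)
        and is_alpha(text[0])
        and all(is_alphanum(ch) for ch in text[1:])
    )


def classify_name(text: str) -> str:
    """Return 'identifier', 'keyword', or 'other' for a candidate token."""
    if text.startswith("#%") and is_plain_identifier(text[2:]):
        return "identifier"  # internal-name form
    if text.startswith("~") and is_plain_identifier(text[1:]):
        return "keyword"  # ~ prefix, with no #% allowed after it
    if is_plain_identifier(text):
        return "identifier"
    return "other"
```

Because the keyword branch requires a plain identifier after ~, a token such as ~#%x is not classified as a keyword, matching the "without #%" condition in the text.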
Operators are formed from Unicode symbolic and punctuation characters other than the ones listed above as distinct tokens (plus a few more, like ", ', and single-character emoji sequences), but | or : is also allowed in an operator name as long as it is not by itself, and some # combinations like #' and #, are also operators. A multi-character operator cannot end in +, -, or . to avoid ambiguity in cases like 1+-2 (which is 1 plus -2, not 1 and 2 combined with a +- operator), unless the operator contains only +, -, or . (so ++, --, and ... are allowed). Also, a multi-character operator cannot end in :, since that creates an ambiguity with an operator just before a block, except that a sequence containing only : is allowed. A multi-character operator cannot end with / or contain // or /*, because that can create ambiguities with comments.
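The operator rules are easier to check against examples when written as a predicate. The Python sketch below follows the prose rules for multi-character operators and the note in the grammar that |, :, and ~ alone are not operators. It is an approximation under stated assumptions: Unicode general categories S and P stand in for "symbolic" and "punctuation", emoji sequences are not handled, and the names SPECIAL, HASH_OPERATORS, and is_operator are hypothetical.

```python
import unicodedata

# Characters that are distinct tokens or otherwise excluded from operators.
SPECIAL = set("()[]{}'«»\";,#\\_@")
# The # combinations that count as operators.
HASH_OPERATORS = {"#'", "#,", "#;", "#:", "#|"}


def is_opchar(ch: str) -> bool:
    """Symbolic or punctuation character that is not in SPECIAL."""
    if ch in SPECIAL:
        return False
    return unicodedata.category(ch)[0] in ("S", "P")


def is_operator(text: str) -> bool:
    """Check whether text is a well-formed operator name (approximation)."""
    if text in HASH_OPERATORS:
        return True
    if not text or not all(is_opchar(ch) for ch in text):
        return False
    if text in ("|", ":", "~"):
        return False  # not operators when they stand alone
    if "//" in text or "/*" in text:
        return False  # would be ambiguous with comments
    if len(text) == 1:
        return True
    last = text[-1]
    if last in "+-.:":
        # Allowed only when the operator repeats that single character.
        return set(text) == {last}
    return last != "/"
```

For example, is_operator("+-") is false, which is why 1+-2 reads as 1 plus -2, while is_operator("++") and is_operator("...") are true.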
Implicit in the grammar is the usual convention of choosing the largest possible match at the start of a stream. Not reflected in the grammar is a set of delimiter requirements: numbers, #true, and #false must be followed by a delimiter. For example, 1x is a lexical error, because the x after 1 is not a delimiter. Non-alphanumeric characters other than _ are delimiters.
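A minimal Python sketch of the longest-match convention and the delimiter requirement is shown here. It covers only a small subset of the token kinds (unsigned decimal numbers, booleans, and ASCII-style identifiers), treats the end of input as a delimiter, and uses hypothetical names (PATTERNS, next_token); those choices are assumptions for the example, not a description of the actual reader.

```python
import re

# Simplified token patterns (assumption: a small subset of the grammar).
PATTERNS = {
    "number": re.compile(r"[0-9](?:_?[0-9])*(?:\.[0-9](?:_?[0-9])*)?"),
    "boolean": re.compile(r"#true|#false"),
    "identifier": re.compile(r"[^\W\d]\w*"),
}
# Token kinds that must be followed by a delimiter.
NEEDS_DELIMITER = {"number", "boolean"}


def is_delimiter(ch: str) -> bool:
    """Non-alphanumeric characters other than _ are delimiters."""
    return not (ch.isalnum() or ch == "_")


def next_token(source: str, pos: int = 0):
    """Return (kind, text) for the longest match at pos, or raise ValueError."""
    best = None
    for kind, pattern in PATTERNS.items():
        match = pattern.match(source, pos)
        if match and match.end() > pos:
            if best is None or match.end() > best[1].end():
                best = (kind, match)  # keep the largest possible match
    if best is None:
        raise ValueError(f"no token at position {pos}")
    kind, match = best
    end = match.end()
    if kind in NEEDS_DELIMITER and end < len(source):
        if not is_delimiter(source[end]):
            raise ValueError(f"{match.group()!r} must be followed by a delimiter")
    return kind, match.group()
```

With this sketch, next_token("1x") raises an error because x is not a delimiter, while next_token("12+3") returns the number 12 because + is.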
Certain ambiguities related to number and operator parsing are resolved by special rules. A number ends with a trailing . only if the . cannot be treated as the start of a multi-character operator; also, a . that is not part of a multi-character operator cannot appear after a number. The + and - characters as a number prefix versus an operator are also subject to a special rule: they are parsed as operators when immediately preceded by an alphanumeric character, _, ., ), ], or } with no whitespace in between. For example, 1+2 is 1 plus 2, but 1 +2 is 1 followed by the number +2.
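The rule for + and - can be expressed as a small check on the character that precedes the sign. The following Python sketch is an illustration only; the function name sign_starts_number is hypothetical, and the test for "what can start a number" is simplified to an ASCII digit, or a . followed by an ASCII digit.

```python
def sign_starts_number(source: str, pos: int) -> bool:
    """Decide whether the + or - at pos begins a number rather than an operator.

    The sign is parsed as an operator when the character immediately before it
    is alphanumeric or one of _ . ) ] } with no whitespace in between.
    """
    if source[pos] not in "+-":
        raise ValueError("expected + or - at pos")

    def is_digit(ch: str) -> bool:
        return len(ch) == 1 and ch in "0123456789"

    # The sign must be followed by something that can start a number.
    rest = source[pos + 1 : pos + 3]
    starts_number = is_digit(rest[:1]) or (rest[:1] == "." and is_digit(rest[1:2]))
    if not starts_number:
        return False
    if pos == 0:
        return True
    previous = source[pos - 1]
    return not (previous.isalnum() or previous in "_.)]}")
```

Applied to the examples in the text, sign_starts_number("1+2", 1) is false (so + is an operator), and sign_starts_number("1 +2", 2) is true (so +2 is a number).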
When a #{…} escape describes an identifier S-expression, it is an identifier in the same sense as a shrubbery-notation identifier. The same holds for numbers, booleans, strings, byte strings, and keywords. A #{…} escape must not describe a pair, because pairs are used to represent a parsed shrubbery, and allowing pairs would create ambiguous or ill-formed representations.
For more details on @ parsing, see At-Notation Parsing, but the table below describes the shape of @ forms.
In the productions, ? marks an optional element, * marks zero or more repetitions, and ⁺ marks one or more repetitions. Notes on a production appear to its right.

★ ‹identifier›     ::=  ‹plainident›
                    |   #% ‹plainident›

★ ‹plainident›     ::=  ‹alpha› ‹alphanum›*

  ‹alpha›          ::=  alphabetic Unicode character or _
                    |   Unicode emoji sequence

  ‹alphanum›       ::=  ‹alpha›
                    |   numeric Unicode character

★ ‹keyword›        ::=  ~ ‹plainident›

★ ‹operator›       ::=  ‹opchar›* ‹tailopchar›    not |, :, ~, ...
                    |   .⁺                         ... or containing // ...
                    |   +⁺                         ... or containing /*
                    |   -⁺
                    |   : :⁺
                    |   # ‹hashopchar›

  ‹opchar›         ::=  symbolic Unicode character not in ‹special›
                    |   punctuation Unicode character not in ‹special›
                    |   one of :, |

  ‹tailopchar›     ::=  anything in ‹opchar› except +, -, ., :, /

  ‹hashopchar›     ::=  one of ', ,, ;, :, |

  ‹special›        ::=  one of (, ), [, ], {, }, ', «, »
                    |   one of ", ;, ,, #, \, _, @
                    |   single-character Unicode emoji sequence

★ ‹number›         ::=  ‹integer›
                    |   ‹float›
                    |   ‹hexinteger›
                    |   ‹octalinteger›
                    |   ‹binaryinteger›
                    |   ‹fraction›

  ‹integer›        ::=  ‹sign›? ‹nonneg›

  ‹sign›           ::=  one of + or -

  ‹nonneg›         ::=  ‹decimal› ‹usdecimal›*

  ‹decimal›        ::=  one of 0 through 9

  ‹usdecimal›      ::=  ‹decimal›
                    |   _ ‹decimal›

  ‹float›          ::=  ‹sign›? ‹nonneg› . ‹nonneg›? ‹exp›?
                    |   ‹sign›? . ‹nonneg› ‹exp›?
                    |   ‹sign›? ‹nonneg› ‹exp›
                    |   #inf
                    |   #neginf
                    |   #nan

  ‹exp›            ::=  e ‹sign›? ‹nonneg›
                    |   E ‹sign›? ‹nonneg›

  ‹hexinteger›     ::=  0x ‹hex› ‹ushex›*

  ‹hex›            ::=  one of 0 through 9
                    |   one of a through f
                    |   one of A through F

  ‹ushex›          ::=  ‹hex›
                    |   _ ‹hex›

  ‹octalinteger›   ::=  0o ‹octal› ‹usoctal›*

  ‹octal›          ::=  one of 0 through 7

  ‹usoctal›        ::=  ‹octal›
                    |   _ ‹octal›

  ‹binaryinteger›  ::=  0b ‹bit› ‹usbit›*

  ‹bit›            ::=  one of 0 or 1

  ‹usbit›          ::=  ‹bit›
                    |   _ ‹bit›

  ‹fraction›       ::=  ‹integer› / ‹nonneg›    ‹nonneg› not 0

★ ‹boolean›        ::=  #true
                    |   #false

★ ‹void›           ::=  #void

★ ‹string›         ::=  " ‹strelem›* "

  ‹strelem›        ::=  like Racket, but no literal newline    \U ≤ 6 digits

★ ‹bytestring›     ::=  #" ‹bytestrelem›* "

  ‹bytestrelem›    ::=  like Racket, but no literal newline

★ ‹sexpression›    ::=  #{ ‹racket› }

  ‹racket›         ::=  any non-pair Racket S-expression

★ ‹comment›        ::=  // ‹nonnlchar›*
                    |   /* ‹anychar›* */                     nesting allowed
                    |   @// ‹nonnlchar›*                     only within ‹text›
                    |   @// ‹atopen› ‹anychar›* ‹atclose›    only within ‹text›
                    |   #! ‹nonnlchar›* ‹continue›*

  ‹nonnlchar›      ::=  any character other than newline

  ‹continue›       ::=  \ ‹nonnlchar›*

★ ‹atexpression›   ::=  @ ‹command› ‹arguments›? ‹text›*    no space between parts
                    |   @ ‹text›⁺                            no space between parts
                    |   @ ‹splice›                           no space between parts

  ‹command›        ::=  ‹prefix›* ‹identifier›    no space between parts
                    |   ‹keyword›
                    |   ‹operator›
                    |   ‹number›
                    |   ‹boolean›
                    |   ‹string›
                    |   ‹bytestring›
                    |   ‹racket›
                    |   ( ‹group›* )               usual ,-separated
                    |   [ ‹group›* ]               usual ,-separated
                    |   « ‹group› »

  ‹splice›         ::=  (« ‹group› »)

  ‹prefix›         ::=  ‹identifier› ‹operator›    no space between parts

  ‹arguments›      ::=  ( ‹group›* )    optional ,-separated

  ‹text›           ::=  ‹atopen› ‹text› ‹atclose›    escapes in ‹text›

  ‹atopen›         ::=  {
                    |   | ‹asciisym›* {

  ‹atclose›        ::=  }
                    |   } ‹asciisym›* |    flips opener chars
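Since the ‹comment› production notes that /* ... */ comments nest, a scanner cannot simply stop at the first */. One common way to handle this is a depth counter, sketched below in Python. The function name skip_block_comment is hypothetical, and the sketch does not consider interactions with strings or other tokens inside the comment.

```python
def skip_block_comment(source: str, pos: int) -> int:
    """Return the index just past the block comment that starts at pos.

    Nested /* ... */ pairs are tracked with a depth counter, so the comment
    ends only when every opener has been matched by a closer.
    """
    if not source.startswith("/*", pos):
        raise ValueError("no block comment at pos")
    depth = 0
    i = pos
    while i < len(source):
        if source.startswith("/*", i):
            depth += 1
            i += 2
        elif source.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")
```

For the input "/* a /* b */ c */ x", the inner */ only lowers the depth to 1, so scanning continues to the outer */ and the remaining text is " x".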