8.13.0.1

3 Token Parsing

The tokens used for grouping and indentation are distinct from other categories:

  ( ) [ ] { } '   ; ,   : |   « »  \

Other tokens are described by the grammar below, where a star (★) in the left column indicates the productions that correspond to terms or comments.

Numbers are supported directly in simple forms—decimal integers, decimal floating point, hexadecimal/octal/binary integers, and fractions—in all cases allowing _s between digits. A #{} escape provides access to the full Racket S-expression number grammar. Special floating-point values use a # notation: #inf, #neginf, and #nan.
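As a rough illustration of the simple number forms described above, the following Python sketch classifies a candidate token with regular expressions. It is not the actual lexer: the names DEC, EXP, NUMBER_FORMS, and classify_number are hypothetical, and the sketch ignores the #{} escape and the delimiter requirement.

```python
import re

# Hypothetical sketch of the simple number forms; not the real shrubbery lexer.
DEC = r"[0-9](?:_?[0-9])*"   # decimal digits, a single _ allowed between digits
EXP = rf"[eE][+-]?{DEC}"     # exponent part of a float

NUMBER_FORMS = [
    ("hexinteger", r"0x[0-9a-fA-F](?:_?[0-9a-fA-F])*"),
    ("octalinteger", r"0o[0-7](?:_?[0-7])*"),
    ("binaryinteger", r"0b[01](?:_?[01])*"),
    ("fraction", rf"[+-]?{DEC}/{DEC}"),
    ("float",
     rf"[+-]?(?:{DEC}\.(?:{DEC})?(?:{EXP})?|\.{DEC}(?:{EXP})?|{DEC}{EXP})"
     r"|#inf|#neginf|#nan"),
    ("integer", rf"[+-]?{DEC}"),
]


def classify_number(text):
    """Return the name of the number form that matches all of text, or None."""
    for name, pattern in NUMBER_FORMS:
        if re.fullmatch(pattern, text):
            if name == "fraction":
                # The denominator of a fraction must not be zero.
                denominator = int(text.split("/")[1].replace("_", ""))
                if denominator == 0:
                    return None
            return name
    return None
```

Under these patterns, classify_number("1_000") returns "integer", while "1__0" is rejected because only a single _ may separate two digits.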

Boolean literals are #true and #false. The void value is #void.

Identifiers are formed from Unicode alphanumeric characters plus _ and emoji sequences, where the initial character must not be a numeric character (unless that numeric character starts an emoji sequence, as in 1 followed by U+FE0F and U+20E3). An identifier can also be prefixed with #%; such identifiers are intended for “internal” names that are not normally visible. An identifier prefixed with ~ (and without #%) forms a keyword, analogous to prefixing an identifier with #: in Racket.
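The identifier and keyword rules above can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions: the function names are hypothetical, emoji sequences are not handled at all, and Python's str.isalpha and str.isalnum stand in for the Unicode alphabetic and numeric classes.

```python
def is_plainident(s):
    """Approximate check for a plain identifier (emoji sequences not handled)."""
    if not s:
        return False
    # The first character must be alphabetic or _, never numeric.
    first_ok = s[0].isalpha() or s[0] == "_"
    # Later characters may also be numeric.
    rest_ok = all(c.isalnum() or c == "_" for c in s[1:])
    return first_ok and rest_ok


def classify_name(s):
    """Return "identifier", "keyword", or None for a candidate token."""
    if s.startswith("#%"):
        return "identifier" if is_plainident(s[2:]) else None
    if s.startswith("~"):
        # A keyword is ~ followed by a plain identifier, without #%.
        return "keyword" if is_plainident(s[1:]) else None
    return "identifier" if is_plainident(s) else None
```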

Operators are formed from Unicode symbolic and punctuation characters other than the ones listed above as distinct tokens (plus a few more, like ", ', and single-character emoji sequences). A | or : is also allowed in an operator name as long as it is not by itself, and some # combinations, like #' and #,, are also operators. A multi-character operator cannot end in +, -, or ., which avoids ambiguity in cases like 1+-2 (which is 1 plus -2, not 1 and 2 combined with a +- operator), unless the operator contains only +, -, or . (so ++, --, and ... are allowed). Also, a multi-character operator cannot end in :, since that creates an ambiguity with an operator just before a block, except that a sequence containing only : is allowed. A multi-character operator cannot end with / or contain // or /*, because that can create ambiguities with comments.
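The ending rules for operators are easier to follow when written as a predicate. The Python sketch below follows the prose above and covers ASCII operator characters only; SPECIAL, OPCHARS, and is_operator are hypothetical names, the « » delimiters and emoji sequences are left out, and the exclusion of a lone ~ is taken from the note in the operator grammar.

```python
import string

# ASCII characters that are distinct tokens or otherwise excluded from operators.
SPECIAL = set("()[]{}'\";,#\\_@")
# Remaining ASCII punctuation and symbols, including : and |.
OPCHARS = set(string.punctuation) - SPECIAL


def is_operator(s):
    """Approximate check of the operator rules for ASCII-only input."""
    if s.startswith("#"):
        # Only a few # combinations are operators.
        return len(s) == 2 and s[1] in "',;:|"
    if not s or any(c not in OPCHARS for c in s):
        return False
    if len(s) == 1:
        # | and : are not operators by themselves; ~ is excluded per the grammar note.
        return s not in "|:~"
    if "//" in s or "/*" in s or s.endswith("/"):
        return False  # would be ambiguous with comments
    if s[-1] in "+-.":
        # Allowed only as a run of one character, as in ++, --, or ...
        return len(set(s)) == 1
    if s[-1] == ":":
        # Allowed only when the operator consists solely of colons.
        return set(s) == {":"}
    return True
```

With this predicate, ++ and ... are accepted, while +- is rejected, which is why 1+-2 reads as 1 plus -2.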

Implicit in the grammar is the usual convention of choosing the largest possible match at the start of a stream. Not reflected in the grammar is a set of delimiter requirements: numbers, #true, and #false must be followed by a delimiter. For example, 1x is a lexical error, because the x after 1 is not a delimiter. Non-alphanumeric characters other than _ are delimiters.
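A minimal sketch of the longest-match convention combined with the delimiter requirement, limited to unsigned decimal integers, might look like the following. The names INTEGER, is_delimiter, and lex_integer are hypothetical and not part of any shrubbery implementation.

```python
import re

# Unsigned decimal integer with a single _ allowed between digits.
INTEGER = re.compile(r"[0-9](?:_?[0-9])*")


def is_delimiter(ch):
    """Non-alphanumeric characters other than _ are delimiters."""
    return not (ch.isalnum() or ch == "_")


def lex_integer(src, pos=0):
    """Match the longest integer at pos and enforce the delimiter requirement.

    Returns (token_text, end_position), or None if no integer starts at pos.
    """
    m = INTEGER.match(src, pos)  # greedy, so this is the largest match
    if m is None:
        return None
    end = m.end()
    if end < len(src) and not is_delimiter(src[end]):
        raise SyntaxError(f"number must be followed by a delimiter at {end}")
    return m.group(), end
```

Here lex_integer("1 x") yields the token 1, whereas lex_integer("1x") raises an error because x is not a delimiter.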

Certain ambiguities related to number and operator parsing are resolved by special rules. A number ends with a trailing . only if the . cannot be treated as the start of a multi-character operator; also, a . that is not part of a multi-character operator cannot appear after a number. The + and - characters as a number prefix versus an operator are also subject to a special rule: they are parsed as operators when immediately preceded by an alphanumeric character, _, ., ), ], or } with no whitespace in between. For example, 1+2 is 1 plus 2, but 1 +2 is 1 followed by the number +2.
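The special rule for + and - depends only on the character immediately before the sign, so it can be sketched as a small predicate. The function name sign_is_operator is hypothetical, and the sketch only decides the operator case; when it returns False, the sign is merely a candidate number prefix and still needs a number after it.

```python
def sign_is_operator(src, pos):
    """Return True if the + or - at src[pos] must be lexed as an operator.

    The sign is an operator when it is immediately preceded, with no
    whitespace, by an alphanumeric character, _, ., ), ], or }.
    """
    if src[pos] not in "+-":
        raise ValueError("src[pos] must be + or -")
    if pos == 0:
        return False  # nothing precedes the sign
    prev = src[pos - 1]
    return prev.isalnum() or prev in "_.)]}"
```

For the examples in the text, the + in 1+2 is an operator, while the + in 1 +2 is preceded by whitespace and so can start the number +2.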

When a #{} escape describes an identifier S-expression, it is an identifier in the same sense as a shrubbery-notation identifier. The same holds for numbers, booleans, strings, byte strings, and keywords. A #{} escape must not describe a pair, because pairs are used to represent a parsed shrubbery, and allowing pairs would create ambiguous or ill-formed representations.

The table below describes the shape of @ forms; for more details on @ parsing, see At-Notation Parsing.

 

★ identifier    ::=  plainident
                 |   #% plainident
  plainident    ::=  alpha alphanum*
  alpha         ::=  alphabetic Unicode character or _
                 |   Unicode emoji sequence
  alphanum      ::=  alpha
                 |   numeric Unicode character

 

 

★ keyword       ::=  ~ plainident

 

 

★ operator      ::=  opchar* tailopchar                             not |, :, ~, or containing // or /*
                 |   .⁺
                 |   +⁺
                 |   -⁺
                 |   : :⁺
                 |   # hashopchar
  opchar        ::=  symbolic Unicode character not in special
                 |   punctuation Unicode character not in special
                 |   one of : |
  tailopchar    ::=  anything in opchar except +, -, ., :, /
  hashopchar    ::=  one of ', ,, ;, :, |
  special       ::=  one of (, ), [, ], {, }, ', «, »
                 |   one of ", ;, ,, #, \, _, @
                 |   single-character Unicode emoji sequence

 

 

★ number        ::=  integer
                 |   float
                 |   hexinteger
                 |   octalinteger
                 |   binaryinteger
                 |   fraction
  integer       ::=  sign? nonneg
  sign          ::=  one of + or -
  nonneg        ::=  decimal usdecimal*
  decimal       ::=  one of 0 through 9
  usdecimal     ::=  decimal
                 |   _ decimal
  float         ::=  sign? nonneg . nonneg? exp?
                 |   sign? . nonneg exp?
                 |   sign? nonneg exp
                 |   #inf
                 |   #neginf
                 |   #nan
  exp           ::=  e sign? nonneg
                 |   E sign? nonneg
  hexinteger    ::=  0x hex ushex*
  hex           ::=  one of 0 through 9
                 |   one of a through f
                 |   one of A through F
  ushex         ::=  hex
                 |   _ hex
  octalinteger  ::=  0o octal usoctal*
  octal         ::=  one of 0 through 7
  usoctal       ::=  octal
                 |   _ octal
  binaryinteger ::=  0b bit usbit*
  bit           ::=  one of 0 or 1
  usbit         ::=  bit
                 |   _ bit
  fraction      ::=  integer / nonneg                               nonneg not 0

 

 

★ boolean       ::=  #true
                 |   #false

★ void          ::=  #void

 

 

★ string        ::=  " strelem* "
  strelem       ::=  like Racket, but no literal newline            \U ≤ 6 digits

★ bytestring    ::=  #" bytestrelem* "
  bytestrelem   ::=  like Racket, but no literal newline

 

 

★ sexpression   ::=  #{ racket }
  racket        ::=  any non-pair Racket S-expression

 

 

★ comment       ::=  // nonnlchar*
                 |   /* anychar* */                                 nesting allowed
                 |   @// nonnlchar*                                 only within text
                 |   @// atopen anychar* atclose                    only within text
                 |   #! nonnlchar* continue*
  nonnlchar     ::=  any character other than newline
  continue      ::=  \ nonnlchar*

 

 

★ atexpression  ::=  @ command arguments? text                      no space between parts
                 |   @ text                                         no space between parts
                 |   @ splice                                       no space between parts
  command       ::=  prefix* identifier                             no space between parts
                 |   keyword
                 |   operator
                 |   number
                 |   boolean
                 |   string
                 |   bytestring
                 |   racket
                 |   ( group* )                                     usual ,-separated
                 |   [ group* ]                                     usual ,-separated
                 |   « group »
  splice        ::=  (« group »)
  prefix        ::=  identifier operator                            no space between parts
  arguments     ::=  ( group* )                                     optional ,-separated
  text          ::=  atopen text atclose                            escapes in text
  atopen        ::=  {
                 |   | asciisym {
  atclose       ::=  }
                 |   } asciisym |                                   flips opener chars
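
The "nesting allowed" note on /* */ comments in the grammar above means that a lexer has to track depth rather than stop at the first */. The following Python sketch shows one way to find the end of a nested block comment; the function name block_comment_end is hypothetical, and the sketch does not account for comment markers that might appear inside strings.

```python
def block_comment_end(src, start=0):
    """Return the index just past the block comment that opens at start.

    Nested /* ... */ pairs are tracked with a depth counter.
    """
    if not src.startswith("/*", start):
        raise ValueError("no block comment at start")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1          # entering a (possibly nested) comment
            i += 2
        elif src.startswith("*/", i):
            depth -= 1          # leaving one nesting level
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise SyntaxError("unterminated block comment")
```

Given the input /* a /* b */ c */ x, the returned index points just past the second */, so the inner comment does not end the outer one.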