Lexers
This manual documents the public APIs in the lexers packages.
The library currently provides reusable lexers for multiple consumers. Syntax coloring is the first intended application, but the lexer APIs are also designed to support other consumers.
1 Overview
The public language modules are documented in the per-language sections below.
Each language module currently exposes two related kinds of API:
A projected token API intended for general consumers such as syntax coloring.
A derived-token API intended for richer language-specific inspection and testing.
The projected APIs are intentionally close to parser-tools/lex. They return bare symbols, token? values, and optional position-token? wrappers built from the actual parser-tools/lex structures, so existing parser-oriented tools can consume them more easily.
The current profile split is:
'coloring — keeps trivia, emits 'unknown for recoverable malformed input, and includes source positions by default.
'compiler — skips trivia by default, raises on malformed input, and includes source positions by default.
Across languages, the projected lexer constructors return one-argument port readers. Create the lexer once, call it repeatedly on the same input port, and stop when the result is an end-of-file token. The projected category symbols themselves, such as 'identifier, 'literal, and 'keyword, are intended to be the stable public API.
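This calling convention does not depend on any particular language module: any one-argument port reader can be drained with the same loop. A minimal self-contained sketch, with read-line standing in for a make-*-lexer result:

```racket
#lang racket

;; Create the "lexer" once, call it repeatedly on the same port, and stop
;; at end of input. `read-line` stands in for a projected lexer here, so
;; each "token" is simply a line of text.
(define lexer read-line)

(define (drain in)
  (define tok (lexer in))
  (if (eof-object? tok)
      '()
      (cons tok (drain in))))

(drain (open-input-string "color: #fff;\nbody { }"))
; → '("color: #fff;" "body { }")
```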
1.1 Token Helpers
The helper module lexers/token provides a small public API for inspecting wrapped or unwrapped projected token values without reaching directly into parser-tools/lex.
| (require lexers/token) | package: lexers-lib |
procedure
(lexer-token-name token) → symbol?
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-value token) → any/c
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-has-positions? token) → boolean?
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-start token) → (or/c position? #f)
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-end token) → (or/c position? #f)
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-eof? token) → boolean?
token : (or/c symbol? token? position-token?)
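A sketch of why these helpers exist (assumes the lexers-lib package is installed; the commented results follow the documented contracts rather than a verified run): the same accessor works whether or not the value carries a position-token? wrapper, so consumers do not need to branch on the #:source-positions setting themselves.

```racket
#lang racket

;; Hedged sketch (assumes lexers-lib is installed). The helpers accept
;; wrapped and unwrapped projected values uniformly.
(require lexers/token lexers/css)

(define wrapped (car (css-string->tokens "color" #:source-positions #t)))
(define bare    (car (css-string->tokens "color" #:source-positions #f)))

(lexer-token-name wrapped)            ; same category symbol either way
(lexer-token-name bare)
(lexer-token-has-positions? wrapped)  ; #t: position-token? wrapper present
(lexer-token-has-positions? bare)     ; #f: no wrapper to inspect
```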
1.2 Profiles
Across languages, the public projected APIs currently support the same two profile names:
'coloring
'compiler
The current defaults are:
  Profile  |  Trivia  |  Source Positions  |  Malformed Input
'coloring  |  'keep   |  #t                |  emit unknown tokens
'compiler  |  'skip   |  #t                |  raise an exception
For the keyword arguments accepted by make-css-lexer, css-string->tokens, make-cpp-lexer, cpp-string->tokens, make-html-lexer, html-string->tokens, make-javascript-lexer, javascript-string->tokens, make-json-lexer, json-string->tokens, make-markdown-lexer, markdown-string->tokens, make-objc-lexer, objc-string->tokens, make-python-lexer, python-string->tokens, make-racket-lexer, racket-string->tokens, make-rhombus-lexer, rhombus-string->tokens, make-scribble-lexer, scribble-string->tokens, make-shell-lexer, shell-string->tokens, make-swift-lexer, swift-string->tokens, make-wat-lexer, and wat-string->tokens:
#:profile selects the named default bundle.
#:trivia 'profile-default means “use the trivia policy from the selected profile”.
#:source-positions 'profile-default means “use the source-position setting from the selected profile”.
An explicit #:trivia or #:source-positions value overrides the selected profile default.
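For example (a hedged sketch assuming lexers-lib; any language module's entry point behaves the same way), an explicit #:trivia can keep whitespace and comments while retaining the rest of the 'compiler profile:

```racket
#lang racket

;; Hedged sketch (assumes lexers-lib). 'profile-default defers to the
;; selected bundle; an explicit keyword value wins over it.
(require lexers/css)

;; 'compiler skips trivia by default; #:trivia 'keep overrides only that
;; setting, leaving the profile's strict malformed-input behavior intact.
(css-string->tokens "a { }"
                    #:profile 'compiler
                    #:trivia 'keep)
```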
2 CSS
| (require lexers/css) | package: lexers-lib |
The projected CSS API has two entry points:
make-css-lexer for streaming tokenization from an input port.
css-string->tokens for eager tokenization of an entire string.
procedure
(make-css-lexer [#:profile profile
                 #:trivia trivia
                 #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token? whose payload is either a bare symbol such as 'eof or a token? carrying a projected category such as 'identifier, 'literal, 'comment, or 'unknown.
When #:source-positions is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
> (define lexer (make-css-lexer #:profile 'coloring))
> (define in (open-input-string "color: #fff;"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'identifier "color") (position 1 1 0) (position 6 1 5))
(position-token (token 'delimiter ":") (position 6 1 5) (position 7 1 6))
(position-token (token 'whitespace " ") (position 7 1 6) (position 8 1 7))
(position-token (token 'literal "#fff") (position 8 1 7) (position 12 1 11)))
procedure
(css-string->tokens source
                    [#:profile profile
                     #:trivia trivia
                     #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-css-lexer. It opens a string port, enables line counting, repeatedly calls the port-based lexer until end-of-file, and returns the resulting token list.
2.1 CSS Returned Tokens
The projected CSS API returns values in the same general shape as parser-tools/lex:
The end of input is reported as 'eof, either directly or inside a position-token?.
Most ordinary results are token? values whose token-name is a projected category and whose token-value contains language-specific text or metadata.
When #:source-positions is true, each result is wrapped in a position-token?.
When #:source-positions is false, results are returned without that outer wrapper.
Common projected CSS categories include:
'whitespace
'comment
'identifier
'literal
'delimiter
'unknown
'eof
In 'coloring mode, whitespace and comments are kept, and recoverable malformed input is returned as 'unknown. In 'compiler mode, whitespace and comments are skipped by default, and malformed input raises an exception instead of producing an 'unknown token.
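The difference can be sketched as follows (assumes lexers-lib; the exact exception type raised in 'compiler mode is not specified above, so the catch-all exn:fail? is an assumption, as is "@@@" being recoverable malformed input):

```racket
#lang racket

;; Hedged sketch (assumes lexers-lib). The same malformed input is
;; tolerated in 'coloring and rejected in 'compiler.
(require lexers/css)

(define bad-css "@@@")  ; hypothetical malformed input

;; 'coloring profile: malformed text comes back as 'unknown tokens.
(css-string->tokens bad-css #:profile 'coloring)

;; 'compiler profile: the same text raises; exn:fail? is a conservative
;; catch-all since the precise exception type is not documented here.
(with-handlers ([exn:fail? (lambda (e) 'lex-error)])
  (css-string->tokens bad-css #:profile 'compiler))
```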
For the current CSS scaffold, token-value normally preserves the original source text of the emitted token. In particular:
For 'identifier, the value is the matched identifier text, such as "color" or "--brand-color".
For 'literal, the value is the matched literal text, such as "#fff", "12px", "url(foo.png)", or "rgb(".
For 'comment and 'whitespace, the value is the original comment or whitespace text when those categories are kept.
For 'delimiter, the value is the matched delimiter text, such as ":", ";", or "{".
For 'unknown in tolerant mode, the value is the malformed input text that could not be accepted.
> (define inspect-lexer (make-css-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "color: #fff;"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'identifier
> (lexer-token-value first-token)
"color"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
6
procedure
(make-css-derived-lexer) → (input-port? . -> . (or/c 'eof css-derived-token?))
The result is a procedure of one argument, an input port. Each call reads the next raw CSS token from the port, computes its CSS-specific derived classifications, and returns one derived token value. At end of input, it returns 'eof.
The intended use is the same as for make-css-lexer: create the lexer once, then call it repeatedly on the same port until it returns 'eof.
> (define derived-lexer (make-css-derived-lexer))
> (define derived-in (open-input-string "color: #fff;"))
> (port-count-lines! derived-in)
> (list (derived-lexer derived-in) (derived-lexer derived-in) (derived-lexer derived-in) (derived-lexer derived-in))
(list
(css-derived-token
(css-raw-token 'ident-token "color" (position 1 1 0) (position 6 1 5))
'(property-name-candidate selector-token))
(css-derived-token
(css-raw-token 'colon-token ":" (position 6 1 5) (position 7 1 6))
'())
(css-derived-token
(css-raw-token 'whitespace-token " " (position 7 1 6) (position 8 1 7))
'())
(css-derived-token
(css-raw-token 'hash-token "#fff" (position 8 1 7) (position 12 1 11))
'(color-literal selector-token)))
procedure
(css-string->derived-tokens source)
→ (listof css-derived-token?)
source : string?
This is a convenience wrapper over make-css-derived-lexer. It opens a string port, enables line counting, repeatedly calls the derived lexer until it returns 'eof, and returns the resulting list of derived tokens.
procedure
(css-derived-token? v) → boolean?
v : any/c
procedure
(css-derived-token-tags token) → (listof symbol?)
token : css-derived-token?
procedure
(css-derived-token-has-tag? token tag) → boolean?
token : css-derived-token?
tag : symbol?
procedure
(css-derived-token-text token) → string?
token : css-derived-token?
procedure
(css-derived-token-start token) → position?
token : css-derived-token?
procedure
(css-derived-token-end token) → position?
token : css-derived-token?
2.2 CSS Derived Tokens
A derived CSS token pairs one raw CSS token with a small list of CSS-specific classification tags. This layer is more precise than the projected consumer-facing categories and is meant for inspection, testing, and language-aware tools.
The current CSS scaffold may attach tags such as:
'at-rule-name
'color-literal
'color-function
'selector-token
'property-name
'declaration-value-token
'function-name
'gradient-function
'custom-property-name
'property-name-candidate
'string-literal
'numeric-literal
'length-dimension
'malformed-token
> (define derived-tokens (css-string->derived-tokens ".foo { color: red; background: rgb(1 2 3); }"))
> (map (lambda (token)
         (list (css-derived-token-text token)
               (css-derived-token-tags token)
               (css-derived-token-has-tag? token 'selector-token)
               (css-derived-token-has-tag? token 'property-name)
               (css-derived-token-has-tag? token 'declaration-value-token)
               (css-derived-token-has-tag? token 'color-literal)
               (css-derived-token-has-tag? token 'function-name)
               (css-derived-token-has-tag? token 'color-function)
               (css-derived-token-has-tag? token 'custom-property-name)
               (css-derived-token-has-tag? token 'string-literal)
               (css-derived-token-has-tag? token 'numeric-literal)
               (css-derived-token-has-tag? token 'length-dimension)))
       derived-tokens)
'(("." () #f #f #f #f #f #f #f #f #f #f)
("foo"
(property-name-candidate selector-token)
(selector-token)
#f
#f
#f
#f
#f
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("{" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("color"
(property-name-candidate property-name)
#f
(property-name)
#f
#f
#f
#f
#f
#f
#f
#f)
(":" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("red"
(property-name-candidate declaration-value-token)
#f
#f
(declaration-value-token)
#f
#f
#f
#f
#f
#f
#f)
(";" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("background"
(property-name-candidate property-name)
#f
(property-name)
#f
#f
#f
#f
#f
#f
#f
#f)
(":" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("rgb"
(function-name color-function declaration-value-token)
#f
#f
(declaration-value-token)
#f
(function-name color-function declaration-value-token)
(color-function declaration-value-token)
#f
#f
#f
#f)
("(" () #f #f #f #f #f #f #f #f #f #f)
("1"
(numeric-literal declaration-value-token)
#f
#f
(declaration-value-token)
#f
#f
#f
#f
#f
(numeric-literal declaration-value-token)
#f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("2"
(numeric-literal declaration-value-token)
#f
#f
(declaration-value-token)
#f
#f
#f
#f
#f
(numeric-literal declaration-value-token)
#f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("3"
(numeric-literal declaration-value-token)
#f
#f
(declaration-value-token)
#f
#f
#f
#f
#f
(numeric-literal declaration-value-token)
#f)
(")" () #f #f #f #f #f #f #f #f #f #f)
(";" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("}" () #f #f #f #f #f #f #f #f #f #f))
value
css-profiles : immutable-hash?
3 HTML
| (require lexers/html) | package: lexers-lib |
The projected HTML API has two entry points:
make-html-lexer for streaming tokenization from an input port.
html-string->tokens for eager tokenization of an entire string.
procedure
(make-html-lexer [#:profile profile
                  #:trivia trivia
                  #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
The projected HTML token stream includes ordinary markup tokens and inline delegated tokens from embedded <style> and <script> bodies.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
> (define lexer (make-html-lexer #:profile 'coloring))
> (define in (open-input-string "<section id=main>Hi</section>"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'delimiter "<") (position 1 1 0) (position 2 1 1))
(position-token
(token 'identifier "section")
(position 2 1 1)
(position 9 1 8))
(position-token (token 'whitespace " ") (position 9 1 8) (position 10 1 9))
(position-token
(token 'identifier "id")
(position 10 1 9)
(position 12 1 11)))
procedure
(html-string->tokens source
                     [#:profile profile
                      #:trivia trivia
                      #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-html-lexer.
3.1 HTML Returned Tokens
Common projected HTML categories include:
'comment
'keyword
'identifier
'literal
'operator
'delimiter
'unknown
'eof
For the current HTML scaffold:
tag names and attribute names project as 'identifier
attribute values, text nodes, entities, and delegated CSS/JS literals project as 'literal
punctuation such as <, </, >, />, and embedded interpolation boundaries project as 'delimiter or 'operator
comments project as 'comment
doctype/declaration markup projects as 'keyword
> (define inspect-lexer (make-html-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "<!doctype html><main id=\"app\">Hi &amp; bye</main>"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'keyword
> (lexer-token-value first-token)
"<!doctype html>"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
16
procedure
(make-html-derived-lexer) → (input-port? . -> . (or/c 'eof html-derived-token?))
procedure
(html-string->derived-tokens source)
→ (listof html-derived-token?)
source : string?
procedure
(html-derived-token? v) → boolean?
v : any/c
procedure
(html-derived-token-tags token) → (listof symbol?)
token : html-derived-token?
procedure
(html-derived-token-has-tag? token tag) → boolean?
token : html-derived-token?
tag : symbol?
procedure
(html-derived-token-text token) → string?
token : html-derived-token?
procedure
(html-derived-token-start token) → position?
token : html-derived-token?
procedure
(html-derived-token-end token) → position?
token : html-derived-token?
3.2 HTML Derived Tokens
The current HTML scaffold may attach tags such as:
'html-tag-name
'html-closing-tag-name
'html-attribute-name
'html-attribute-value
'html-text
'html-entity
'html-doctype
'comment
'embedded-css
'embedded-javascript
'malformed-token
Delegated CSS and JavaScript body tokens keep their reusable semantic tags and gain an additional language marker such as 'embedded-css or 'embedded-javascript.
> (define derived-tokens (html-string->derived-tokens "<!doctype html><section id=main class=\"card\">Hi & bye<style>.hero { color: #c33; }</style><script>const root = document.querySelector(\"#app\");</script></section>"))
> (map (lambda (token)
         (list (html-derived-token-text token)
               (html-derived-token-tags token)
               (html-derived-token-has-tag? token 'html-tag-name)
               (html-derived-token-has-tag? token 'html-attribute-name)
               (html-derived-token-has-tag? token 'html-attribute-value)
               (html-derived-token-has-tag? token 'html-text)
               (html-derived-token-has-tag? token 'html-entity)
               (html-derived-token-has-tag? token 'embedded-css)
               (html-derived-token-has-tag? token 'embedded-javascript)))
       derived-tokens)
'(("<!doctype html>" (keyword html-doctype) #f #f #f #f #f #f #f)
("<" (delimiter) #f #f #f #f #f #f #f)
("section" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)
(" " (whitespace) #f #f #f #f #f #f #f)
("id"
(identifier html-attribute-name)
#f
(html-attribute-name)
#f
#f
#f
#f
#f)
("=" (operator) #f #f #f #f #f #f #f)
("main"
(literal html-attribute-value)
#f
#f
(html-attribute-value)
#f
#f
#f
#f)
(" " (whitespace) #f #f #f #f #f #f #f)
("class"
(identifier html-attribute-name)
#f
(html-attribute-name)
#f
#f
#f
#f
#f)
("=" (operator) #f #f #f #f #f #f #f)
("\"card\""
(html-attribute-value literal)
#f
#f
(html-attribute-value literal)
#f
#f
#f
#f)
(">" (delimiter) #f #f #f #f #f #f #f)
("Hi " (literal html-text) #f #f #f (html-text) #f #f #f)
("&" (literal html-entity) #f #f #f #f (html-entity) #f #f)
(" bye" (literal html-text) #f #f #f (html-text) #f #f #f)
("<" (delimiter) #f #f #f #f #f #f #f)
("style" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f)
("." (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
("hero"
(embedded-css identifier property-name-candidate selector-token)
#f
#f
#f
#f
#f
(embedded-css identifier property-name-candidate selector-token)
#f)
(" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)
("{" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
(" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)
("color"
(embedded-css identifier property-name-candidate property-name)
#f
#f
#f
#f
#f
(embedded-css identifier property-name-candidate property-name)
#f)
(":" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
(" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)
("#c33"
(embedded-css literal color-literal declaration-value-token)
#f
#f
#f
#f
#f
(embedded-css literal color-literal declaration-value-token)
#f)
(";" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
(" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)
("}" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
("</" (delimiter) #f #f #f #f #f #f #f)
("style" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f)
("<" (delimiter) #f #f #f #f #f #f #f)
("script" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f)
("const"
(embedded-javascript keyword)
#f
#f
#f
#f
#f
#f
(embedded-javascript keyword))
(" "
(embedded-javascript whitespace)
#f
#f
#f
#f
#f
#f
(embedded-javascript whitespace))
("root"
(embedded-javascript identifier declaration-name)
#f
#f
#f
#f
#f
#f
(embedded-javascript identifier declaration-name))
(" "
(embedded-javascript whitespace)
#f
#f
#f
#f
#f
#f
(embedded-javascript whitespace))
("="
(embedded-javascript operator)
#f
#f
#f
#f
#f
#f
(embedded-javascript operator))
(" "
(embedded-javascript whitespace)
#f
#f
#f
#f
#f
#f
(embedded-javascript whitespace))
("document"
(embedded-javascript identifier)
#f
#f
#f
#f
#f
#f
(embedded-javascript identifier))
("."
(embedded-javascript delimiter)
#f
#f
#f
#f
#f
#f
(embedded-javascript delimiter))
("querySelector"
(embedded-javascript identifier method-name property-name)
#f
#f
#f
#f
#f
#f
(embedded-javascript identifier method-name property-name))
("("
(embedded-javascript delimiter)
#f
#f
#f
#f
#f
#f
(embedded-javascript delimiter))
("\"#app\""
(embedded-javascript literal string-literal)
#f
#f
#f
#f
#f
#f
(embedded-javascript literal string-literal))
(")"
(embedded-javascript delimiter)
#f
#f
#f
#f
#f
#f
(embedded-javascript delimiter))
(";"
(embedded-javascript delimiter)
#f
#f
#f
#f
#f
#f
(embedded-javascript delimiter))
("</" (delimiter) #f #f #f #f #f #f #f)
("script" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f)
("</" (delimiter) #f #f #f #f #f #f #f)
("section" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f))
value
html-profiles : immutable-hash?
4 C
| (require lexers/c) | package: lexers-lib |
The projected C API has two entry points:
make-c-lexer for streaming tokenization from an input port.
c-string->tokens for eager tokenization of an entire string.
The first C implementation is a handwritten streaming lexer grounded primarily in C lexical and preprocessing-token rules. It is preprocessor-aware from the first slice, so directive lines like #include and #define are tokenized directly instead of being flattened into ordinary punctuation and identifiers.
procedure
(make-c-lexer [#:profile profile
               #:trivia trivia
               #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected C categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
Keywords and preprocessor directive names project as 'keyword. Header names such as <stdio.h> and "local.h" project as 'literal.
> (define lexer (make-c-lexer #:profile 'coloring))
> (define in (open-input-string "#include <stdio.h>\nint main(void) { return 0; }\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))
'(delimiter keyword whitespace literal)
procedure
(c-string->tokens source
                  [#:profile profile
                   #:trivia trivia
                   #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived C API provides reusable language-specific structure:
procedure
(make-c-derived-lexer) → (input-port? . -> . (or/c c-derived-token? 'eof))
procedure
(c-string->derived-tokens source) → (listof c-derived-token?)
source : string?
procedure
(c-derived-token? v) → boolean?
v : any/c
procedure
(c-derived-token-tags token) → (listof symbol?)
token : c-derived-token?
procedure
(c-derived-token-has-tag? token tag) → boolean?
token : c-derived-token?
tag : symbol?
procedure
(c-derived-token-text token) → string?
token : c-derived-token?
procedure
(c-derived-token-start token) → position?
token : c-derived-token?
procedure
(c-derived-token-end token) → position?
token : c-derived-token?
The first reusable C-specific derived tags include:
'c-comment
'c-whitespace
'c-keyword
'c-identifier
'c-string-literal
'c-char-literal
'c-numeric-literal
'c-operator
'c-delimiter
'c-preprocessor-directive
'c-header-name
'c-line-splice
'c-error
'malformed-token
Malformed C input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
Markdown fenced code blocks labeled c or h delegate to lexers/c. Wrapped delegated Markdown tokens preserve C-derived tags and gain 'embedded-c.
value
c-profiles : immutable-hash?
5 C++
| (require lexers/cpp) | package: lexers-lib |
The projected C++ API has two entry points:
make-cpp-lexer for streaming tokenization from an input port.
cpp-string->tokens for eager tokenization of an entire string.
The first C++ implementation is a handwritten streaming lexer grounded in C++ lexical structure. It is preprocessor-aware and covers comments, identifiers, keywords, operator words, character and string literals, raw string literals, numeric literals, and punctuators such as :: and ->.
procedure
(make-cpp-lexer [#:profile profile
                 #:trivia trivia
                 #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected C++ categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
> (define lexer (make-cpp-lexer #:profile 'coloring))
> (define in (open-input-string "#include <vector>\nstd::string s;\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))
'(delimiter keyword whitespace literal)
procedure
(cpp-string->tokens source
                    [#:profile profile
                     #:trivia trivia
                     #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived C++ API provides reusable language-specific structure:
procedure
(make-cpp-derived-lexer) → (input-port? . -> . (or/c cpp-derived-token? 'eof))
procedure
(cpp-string->derived-tokens source)
→ (listof cpp-derived-token?)
source : string?
procedure
(cpp-derived-token? v) → boolean?
v : any/c
procedure
(cpp-derived-token-tags token) → (listof symbol?)
token : cpp-derived-token?
procedure
(cpp-derived-token-has-tag? token tag) → boolean?
token : cpp-derived-token?
tag : symbol?
procedure
(cpp-derived-token-text token) → string?
token : cpp-derived-token?
procedure
(cpp-derived-token-start token) → position?
token : cpp-derived-token?
procedure
(cpp-derived-token-end token) → position?
token : cpp-derived-token?
The first reusable C++-specific derived tags include:
'cpp-comment
'cpp-whitespace
'cpp-keyword
'cpp-identifier
'cpp-string-literal
'cpp-char-literal
'cpp-numeric-literal
'cpp-operator
'cpp-delimiter
'cpp-preprocessor-directive
'cpp-header-name
'cpp-line-splice
'cpp-error
'malformed-token
Malformed C++ input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
Markdown fenced code blocks labeled cpp, c++, cc, cxx, hpp, hh, or hxx delegate to lexers/cpp. Wrapped delegated Markdown tokens preserve C++-derived tags and gain 'embedded-cpp.
value
cpp-profiles : immutable-hash?
6 CSV
| (require lexers/csv) | package: lexers-lib |
The projected CSV API has two entry points:
make-csv-lexer for streaming tokenization from an input port.
csv-string->tokens for eager tokenization of an entire string.
The first CSV implementation is a handwritten streaming lexer for comma-separated text. It preserves exact source text, including empty fields and CRLF row separators.
procedure
(make-csv-lexer [#:profile profile
                 #:trivia trivia
                 #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected CSV categories include 'literal, 'delimiter, and 'unknown.
Field contents project as 'literal. Field separators and row separators project as 'delimiter.
> (define lexer (make-csv-lexer #:profile 'coloring))
> (define in (open-input-string "name,age\nAda,37\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))
'(literal delimiter literal delimiter)
procedure
(csv-string->tokens source
                    [#:profile profile
                     #:trivia trivia
                     #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived CSV API provides reusable structure for delimited text:
procedure
(make-csv-derived-lexer) → (input-port? . -> . (or/c csv-derived-token? 'eof))
procedure
(csv-string->derived-tokens source)
→ (listof csv-derived-token?)
source : string?
procedure
(csv-derived-token? v) → boolean?
v : any/c
procedure
(csv-derived-token-tags token) → (listof symbol?)
token : csv-derived-token?
procedure
(csv-derived-token-has-tag? token tag) → boolean?
token : csv-derived-token?
tag : symbol?
procedure
(csv-derived-token-text token) → string?
token : csv-derived-token?
procedure
(csv-derived-token-start token) → position?
token : csv-derived-token?
procedure
(csv-derived-token-end token) → position?
token : csv-derived-token?
The first reusable CSV-specific derived tags include:
'delimited-field
'delimited-quoted-field
'delimited-unquoted-field
'delimited-empty-field
'delimited-separator
'delimited-row-separator
'delimited-error
'csv-field
'csv-separator
'csv-row-separator
'malformed-token
Malformed CSV input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
Markdown fenced code blocks labeled csv delegate to lexers/csv. Wrapped delegated Markdown tokens preserve CSV-derived tags and gain 'embedded-csv.
value
csv-profiles : immutable-hash?
7 JSON
| (require lexers/json) | package: lexers-lib |
The projected JSON API has two entry points:
make-json-lexer for streaming tokenization from an input port.
json-string->tokens for eager tokenization of an entire string.
procedure
(make-json-lexer [#:profile profile
                  #:trivia trivia
                  #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
Projected JSON categories include 'delimiter, 'operator, 'identifier, 'literal, 'whitespace, and 'unknown.
Object keys project as 'identifier, while numbers, ordinary strings, and the JSON keywords true, false, and null project as 'literal.
> (define lexer (make-json-lexer #:profile 'coloring))
> (define in (open-input-string "{\"x\": [1, true]}"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))
'(delimiter identifier operator whitespace)
procedure
(json-string->tokens source
                     [#:profile profile
                      #:trivia trivia
                      #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived JSON API provides reusable language-specific structure:
procedure
(make-json-derived-lexer)
 → (input-port? . -> . (or/c json-derived-token? 'eof))
procedure
(json-string->derived-tokens source)
 → (listof json-derived-token?)
source : string?
procedure
(json-derived-token? v) → boolean?
v : any/c
procedure
(json-derived-token-tags token) → (listof symbol?)
token : json-derived-token?
procedure
(json-derived-token-has-tag? token tag) → boolean?
token : json-derived-token?
tag : symbol?
procedure
(json-derived-token-text token) → string?
token : json-derived-token?
procedure
(json-derived-token-start token) → position?
token : json-derived-token?
procedure
(json-derived-token-end token) → position?
token : json-derived-token?
The first reusable JSON-specific derived tags include:
'json-object-key
'json-string
'json-number
'json-true
'json-false
'json-null
'json-object-start
'json-object-end
'json-array-start
'json-array-end
'json-comma
'json-colon
'json-error
'malformed-token
Malformed JSON input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
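The derived tags above can drive simple structural queries. As a hedged sketch (assuming the tag vocabulary behaves as listed), this collects the text of every token tagged as a JSON object key:

```racket
#lang racket
(require lexers/json)

;; Collect the text of every token carrying the 'json-object-key tag.
;; The exact tokenization of this input is determined by the lexer.
(define tokens (json-string->derived-tokens "{\"x\": 1, \"y\": true}"))
(for ([token (in-list tokens)]
      #:when (json-derived-token-has-tag? token 'json-object-key))
  (displayln (json-derived-token-text token)))
```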
Markdown fenced code blocks labeled json delegate to lexers/json. Wrapped delegated Markdown tokens preserve JSON-derived tags and gain 'embedded-json.
8 Makefile
| (require lexers/makefile) | package: lexers-lib |
The projected Makefile API has two entry points:
make-makefile-lexer for streaming tokenization from an input port.
makefile-string->tokens for eager tokenization of an entire string.
The first Makefile implementation is a handwritten streaming lexer aimed at ordinary Makefile, GNUmakefile, and .mk inputs. It covers comments, directive lines, variable assignments, rule targets, recipe lines, variable references, and delimiters, and it preserves CRLF line endings in token text.
procedure
(make-makefile-lexer [#:profile profile
                      #:trivia trivia
                      #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected Makefile categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
Directive words such as include project as 'keyword. Assignment operators such as := and += project as 'operator. Rule separators such as : project as 'delimiter.
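A streaming sketch along the lines of the other sections (assuming, as elsewhere in this manual, that the end-of-file token's projected name is 'eof):

```racket
#lang racket
(require lexers/makefile
         lexers/token)

;; Create the lexer once and call it repeatedly on the same port,
;; stopping at the end-of-file token.
(define lexer (make-makefile-lexer #:profile 'coloring))
(define in (open-input-string "CC := gcc\n\nall: main.o\n\t$(CC) -o app main.o\n"))
(port-count-lines! in)
(let loop ()
  (define token (lexer in))
  (unless (eq? (lexer-token-name token) 'eof)
    (displayln (lexer-token-name token))
    (loop)))
```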
procedure
(makefile-string->tokens source
                         [#:profile profile
                          #:trivia trivia
                          #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived Makefile API provides reusable language-specific structure:
procedure
(make-makefile-derived-lexer)
 → (input-port? . -> . (or/c makefile-derived-token? 'eof))
procedure
(makefile-string->derived-tokens source)
 → (listof makefile-derived-token?)
source : string?
procedure
(makefile-derived-token? v) → boolean?
v : any/c
procedure
(makefile-derived-token-tags token) → (listof symbol?)
token : makefile-derived-token?
procedure
(makefile-derived-token-has-tag? token tag) → boolean?
token : makefile-derived-token?
tag : symbol?
procedure
(makefile-derived-token-text token) → string?
token : makefile-derived-token?
procedure
(makefile-derived-token-start token) → position?
token : makefile-derived-token?
procedure
(makefile-derived-token-end token) → position?
token : makefile-derived-token?
The first reusable Makefile-specific derived tags include:
'makefile-directive
'makefile-variable
'makefile-assignment-operator
'makefile-rule-target
'makefile-variable-reference
'malformed-token
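As a hedged sketch of the derived API (assuming the tags behave as listed above), this filters the stream down to assignment operators such as := and +=:

```racket
#lang racket
(require lexers/makefile)

;; Keep only tokens tagged as Makefile assignment operators.
(define tokens (makefile-string->derived-tokens "CC := gcc\nCFLAGS += -Wall\n"))
(for ([token (in-list tokens)]
      #:when (makefile-derived-token-has-tag? token 'makefile-assignment-operator))
  (displayln (makefile-derived-token-text token)))
```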
Markdown fenced code blocks labeled make, makefile, or mk delegate to lexers/makefile. Wrapped delegated Markdown tokens preserve Makefile-derived tags and gain 'embedded-makefile.
9 Plist
| (require lexers/plist) | package: lexers-lib |
The projected plist API has two entry points:
make-plist-lexer for streaming tokenization from an input port.
plist-string->tokens for eager tokenization of an entire string.
The first plist implementation is a handwritten streaming lexer for XML property-list files such as Info.plist. The first slice deliberately targets XML plists only; it does not attempt to cover binary bplist files.
procedure
(make-plist-lexer [#:profile profile
                   #:trivia trivia
                   #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected plist categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
XML declarations and plist doctypes project as 'keyword. Element content such as CFBundleName and Lexers projects as 'literal.
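A streaming sketch for a tiny XML plist fragment (assuming, as in the other sections, that the end-of-file token's projected name is 'eof):

```racket
#lang racket
(require lexers/plist
         lexers/token)

;; Tokenize an XML plist fragment and print each projected category.
(define lexer (make-plist-lexer #:profile 'coloring))
(define in
  (open-input-string
   "<dict><key>CFBundleName</key><string>Lexers</string></dict>"))
(port-count-lines! in)
(let loop ()
  (define token (lexer in))
  (unless (eq? (lexer-token-name token) 'eof)
    (displayln (lexer-token-name token))
    (loop)))
```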
procedure
(plist-string->tokens source
                      [#:profile profile
                       #:trivia trivia
                       #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived plist API provides reusable language-specific structure:
procedure
(make-plist-derived-lexer)
 → (input-port? . -> . (or/c plist-derived-token? 'eof))
procedure
(plist-string->derived-tokens source)
 → (listof plist-derived-token?)
source : string?
procedure
(plist-derived-token? v) → boolean?
v : any/c
procedure
(plist-derived-token-tags token) → (listof symbol?)
token : plist-derived-token?
procedure
(plist-derived-token-has-tag? token tag) → boolean?
token : plist-derived-token?
tag : symbol?
procedure
(plist-derived-token-text token) → string?
token : plist-derived-token?
procedure
(plist-derived-token-start token) → position?
token : plist-derived-token?
procedure
(plist-derived-token-end token) → position?
token : plist-derived-token?
The first reusable plist-specific derived tags include:
'plist-processing-instruction
'plist-doctype
'plist-tag-name
'plist-closing-tag-name
'plist-attribute-name
'plist-attribute-value
'plist-key-text
'plist-string-text
'plist-data-text
'plist-date-text
'plist-integer-text
'plist-real-text
'plist-text
'plist-comment
'malformed-token
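As a hedged sketch (assuming the tags behave as listed above), this collects the body text of each &lt;key&gt; element via the 'plist-key-text tag:

```racket
#lang racket
(require lexers/plist)

;; Collect the text of tokens tagged as plist <key> element content.
(define tokens
  (plist-string->derived-tokens
   "<dict><key>CFBundleName</key><string>Lexers</string></dict>"))
(for ([token (in-list tokens)]
      #:when (plist-derived-token-has-tag? token 'plist-key-text))
  (displayln (plist-derived-token-text token)))
```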
Markdown fenced code blocks labeled plist delegate to lexers/plist. Wrapped delegated Markdown tokens preserve plist-derived tags and gain 'embedded-plist.
10 YAML
| (require lexers/yaml) | package: lexers-lib |
The projected YAML API has two entry points:
make-yaml-lexer for streaming tokenization from an input port.
yaml-string->tokens for eager tokenization of an entire string.
The first YAML implementation is a handwritten streaming lexer grounded primarily in the YAML 1.2.2 lexical and structural rules. The first slice is deliberately parser-lite, but it covers practical block mappings, block sequences, flow delimiters, directives, document markers, quoted scalars, plain scalars, comments, and block scalar bodies.
procedure
(make-yaml-lexer [#:profile profile
                  #:trivia trivia
                  #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected YAML categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'delimiter, and 'unknown.
Directive lines such as %YAML 1.2 project as 'keyword. Plain and quoted scalars project as 'literal. Structural markers such as :, -, [, ], {, }, and document markers project as 'delimiter.
> (define lexer (make-yaml-lexer #:profile 'coloring))
> (define in (open-input-string "name: Deploy\non:\n push:\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))
'(literal delimiter whitespace literal)
procedure
(yaml-string->tokens source
                     [#:profile profile
                      #:trivia trivia
                      #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived YAML API provides reusable language-specific structure:
procedure
(make-yaml-derived-lexer)
 → (input-port? . -> . (or/c yaml-derived-token? 'eof))
procedure
(yaml-string->derived-tokens source)
 → (listof yaml-derived-token?)
source : string?
procedure
(yaml-derived-token? v) → boolean?
v : any/c
procedure
(yaml-derived-token-tags token) → (listof symbol?)
token : yaml-derived-token?
procedure
(yaml-derived-token-has-tag? token tag) → boolean?
token : yaml-derived-token?
tag : symbol?
procedure
(yaml-derived-token-text token) → string?
token : yaml-derived-token?
procedure
(yaml-derived-token-start token) → position?
token : yaml-derived-token?
procedure
(yaml-derived-token-end token) → position?
token : yaml-derived-token?
The first reusable YAML-specific derived tags include:
'yaml-comment
'yaml-whitespace
'yaml-directive
'yaml-document-marker
'yaml-sequence-indicator
'yaml-key-indicator
'yaml-value-indicator
'yaml-flow-delimiter
'yaml-anchor
'yaml-alias
'yaml-tag
'yaml-string-literal
'yaml-plain-scalar
'yaml-key-scalar
'yaml-boolean
'yaml-null
'yaml-number
'yaml-block-scalar-header
'yaml-block-scalar-content
'yaml-error
'malformed-token
Malformed YAML input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
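As a hedged sketch of the derived API (assuming the tags behave as listed above), this pulls out mapping keys via the 'yaml-key-scalar tag:

```racket
#lang racket
(require lexers/yaml)

;; Keep only tokens tagged as YAML mapping-key scalars.
(define tokens (yaml-string->derived-tokens "name: Deploy\non:\n  push:\n"))
(for ([token (in-list tokens)]
      #:when (yaml-derived-token-has-tag? token 'yaml-key-scalar))
  (displayln (yaml-derived-token-text token)))
```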
Markdown fenced code blocks labeled yaml or yml delegate to lexers/yaml. Wrapped delegated Markdown tokens preserve YAML-derived tags and gain 'embedded-yaml.
value
yaml-profiles : immutable-hash?
11 Markdown
| (require lexers/markdown) | package: lexers-lib |
The projected Markdown API has two entry points:
make-markdown-lexer for streaming tokenization from an input port.
markdown-string->tokens for eager tokenization of an entire string.
The first Markdown implementation is a handwritten, parser-lite, GitHub-flavored Markdown lexer. It is line-oriented and can delegate raw HTML and known fenced-code languages to the existing C, C++, CSV, HTML, CSS, JavaScript, JSON, Makefile, Objective-C, plist, Python, Racket, Scribble, shell, Swift, TSV, WAT, and YAML lexers.
procedure
(make-markdown-lexer [#:profile profile
                      #:trivia trivia
                      #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next projected Markdown token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
> (define lexer (make-markdown-lexer #:profile 'coloring))
> (define in (open-input-string "# Title\n\n```js\nconst x = 1;\n```\n"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'delimiter "#") (position 1 1 0) (position 2 1 1))
(position-token (token 'whitespace " ") (position 2 1 1) (position 3 1 2))
(position-token (token 'literal "Title") (position 3 1 2) (position 8 1 7))
(position-token (token 'whitespace "\n") (position 8 1 7) (position 9 2 0)))
procedure
(markdown-string->tokens source
                         [#:profile profile
                          #:trivia trivia
                          #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-markdown-lexer.
11.1 Markdown Returned Tokens
Common projected Markdown categories include:
'whitespace
'identifier
'literal
'keyword
'operator
'delimiter
'comment
'unknown
'eof
For the current Markdown scaffold:
ordinary prose, inline code text, code-block text, and link or image payload text project mostly as 'literal
language names and delegated name-like tokens project as 'identifier or 'keyword, depending on the delegated lexer
structural markers such as heading markers, list markers, brackets, pipes, backticks, and fence delimiters project as 'delimiter
comments only appear through delegated embedded HTML
recoverable malformed constructs project as 'unknown in 'coloring mode and raise in 'compiler mode
For source continuity, the derived Markdown stream preserves the newline after a fenced-code info string as an explicit whitespace token before the code body. Incomplete fenced-code blocks are tokenized best-effort instead of raising an internal error.
> (define inspect-lexer (make-markdown-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "# Title\n\nText with <span class=\"x\">hi</span>\n"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'delimiter
> (lexer-token-value first-token)
"#"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
2
procedure
(make-markdown-derived-lexer)
 → (input-port? . -> . (or/c 'eof markdown-derived-token?))
procedure
(markdown-string->derived-tokens source)
 → (listof markdown-derived-token?)
source : string?
procedure
(markdown-derived-token? v) → boolean?
v : any/c
procedure
(markdown-derived-token-tags token) → (listof symbol?)
token : markdown-derived-token?
procedure
(markdown-derived-token-has-tag? token tag) → boolean?
token : markdown-derived-token?
tag : symbol?
procedure
(markdown-derived-token-text token) → string?
token : markdown-derived-token?
procedure
(markdown-derived-token-start token) → position?
token : markdown-derived-token?
procedure
(markdown-derived-token-end token) → position?
token : markdown-derived-token?
11.2 Markdown Derived Tokens
The current Markdown scaffold may attach tags such as:
'markdown-text
'markdown-heading-marker
'markdown-heading-text
'markdown-blockquote-marker
'markdown-list-marker
'markdown-task-marker
'markdown-thematic-break
'markdown-code-span
'markdown-code-fence
'markdown-code-block
'markdown-code-info-string
'markdown-emphasis-delimiter
'markdown-strong-delimiter
'markdown-strikethrough-delimiter
'markdown-link-text
'markdown-link-destination
'markdown-link-title
'markdown-image-marker
'markdown-autolink
'markdown-table-pipe
'markdown-table-alignment
'markdown-table-cell
'markdown-escape
'markdown-hard-line-break
'embedded-html
'embedded-css
'embedded-cpp
'embedded-csv
'embedded-javascript
'embedded-json
'embedded-makefile
'embedded-latex
'embedded-objc
'embedded-pascal
'embedded-plist
'embedded-python
'embedded-racket
'embedded-rust
'embedded-shell
'embedded-scribble
'embedded-swift
'embedded-tex
'embedded-tsv
'embedded-wat
'embedded-yaml
'malformed-token
Delegated raw HTML and recognized fenced-code languages keep their reusable derived tags and gain Markdown embedding markers such as 'embedded-html, 'embedded-cpp, 'embedded-csv, 'embedded-javascript, 'embedded-json, 'embedded-latex, 'embedded-makefile, 'embedded-objc, 'embedded-pascal, 'embedded-plist, 'embedded-python, 'embedded-racket, 'embedded-rust, 'embedded-shell, 'embedded-swift, 'embedded-tex, 'embedded-tsv, 'embedded-wat, or 'embedded-yaml.
> (define derived-tokens (markdown-string->derived-tokens "# Title\n\n- [x] done\n\n```js\nconst x = 1;\n```\n\nText <span class=\"x\">hi</span>\n"))
> (map (lambda (token) (list (markdown-derived-token-text token) (markdown-derived-token-tags token))) derived-tokens)
'(("#" (delimiter markdown-heading-marker))
(" " (whitespace))
("Title" (literal markdown-heading-text))
("\n" (whitespace))
("\n" (whitespace))
("-" (delimiter markdown-list-marker))
(" " (whitespace))
("[x]" (delimiter markdown-task-marker))
(" " (whitespace))
("done" (literal markdown-text))
("\n" (whitespace))
("\n" (whitespace))
("```" (delimiter markdown-code-fence))
("js" (identifier markdown-code-info-string))
("\n" (whitespace))
("const" (keyword embedded-javascript markdown-code-block))
(" " (whitespace embedded-javascript markdown-code-block))
("x" (identifier declaration-name embedded-javascript markdown-code-block))
(" " (whitespace embedded-javascript markdown-code-block))
("=" (operator embedded-javascript markdown-code-block))
(" " (whitespace embedded-javascript markdown-code-block))
("1" (literal numeric-literal embedded-javascript markdown-code-block))
(";" (delimiter embedded-javascript markdown-code-block))
("\n" (whitespace embedded-javascript markdown-code-block))
("```" (delimiter markdown-code-fence))
("\n" (whitespace))
("\n" (whitespace))
("Text " (literal markdown-text))
("<" (delimiter embedded-html))
("span" (identifier html-tag-name embedded-html))
(" " (whitespace embedded-html))
("class" (identifier html-attribute-name embedded-html))
("=" (operator embedded-html))
("\"x\"" (literal html-attribute-value embedded-html))
(">" (delimiter embedded-html))
("hi" (literal html-text embedded-html))
("</" (delimiter embedded-html))
("span" (identifier html-closing-tag-name embedded-html))
(">" (delimiter embedded-html))
("\n" (whitespace)))
value
markdown-profiles : immutable-hash?
12 Objective-C
| (require lexers/objc) | package: lexers-lib |
The projected Objective-C API has two entry points:
make-objc-lexer for streaming tokenization from an input port.
objc-string->tokens for eager tokenization of an entire string.
The first Objective-C implementation is a handwritten streaming lexer grounded in the language’s lexical grammar and prior Objective-C lexer implementations. It is preprocessor-aware and covers comments, identifiers, C and Objective-C keywords, at-sign Objective-C keywords, Objective-C strings, object-literal introducers, numbers, operators, and delimiters.
procedure
(make-objc-lexer [#:profile profile
                  #:trivia trivia
                  #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected Objective-C categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
> (define lexer (make-objc-lexer #:profile 'coloring))
> (define in (open-input-string "@interface Foo : NSObject\n@property NSString *name;\n@end\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))
'(keyword whitespace identifier whitespace)
procedure
(objc-string->tokens source
                     [#:profile profile
                      #:trivia trivia
                      #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived Objective-C API provides reusable language-specific structure:
procedure
(make-objc-derived-lexer)
 → (input-port? . -> . (or/c objc-derived-token? 'eof))
procedure
(objc-string->derived-tokens source)
 → (listof objc-derived-token?)
source : string?
procedure
(objc-derived-token? v) → boolean?
v : any/c
procedure
(objc-derived-token-tags token) → (listof symbol?)
token : objc-derived-token?
procedure
(objc-derived-token-has-tag? token tag) → boolean?
token : objc-derived-token?
tag : symbol?
procedure
(objc-derived-token-text token) → string?
token : objc-derived-token?
procedure
(objc-derived-token-start token) → position?
token : objc-derived-token?
procedure
(objc-derived-token-end token) → position?
token : objc-derived-token?
The first reusable Objective-C-specific derived tags include:
'objc-comment
'objc-whitespace
'objc-keyword
'objc-at-keyword
'objc-identifier
'objc-string-literal
'objc-char-literal
'objc-numeric-literal
'objc-operator
'objc-delimiter
'objc-preprocessor-directive
'objc-header-name
'objc-literal-introducer
'objc-line-splice
'objc-error
'malformed-token
Malformed Objective-C input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
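As a hedged sketch (assuming the tags behave as listed above), this lists every at-sign keyword, such as @interface and @end, in a snippet:

```racket
#lang racket
(require lexers/objc)

;; Keep only tokens tagged as Objective-C at-sign keywords.
(define tokens
  (objc-string->derived-tokens "@interface Foo : NSObject\n@end\n"))
(for ([token (in-list tokens)]
      #:when (objc-derived-token-has-tag? token 'objc-at-keyword))
  (displayln (objc-derived-token-text token)))
```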
Markdown fenced code blocks labeled objc, objective-c, objectivec, or obj-c delegate to lexers/objc. Wrapped delegated Markdown tokens preserve Objective-C-derived tags and gain 'embedded-objc.
value
objc-profiles : immutable-hash?
13 Pascal
| (require lexers/pascal) | package: lexers-lib |
The projected Pascal API has two entry points:
make-pascal-lexer for streaming tokenization from an input port.
pascal-string->tokens for eager tokenization of an entire string.
The first Pascal implementation is a handwritten streaming lexer grounded in the Free Pascal token reference. It covers whitespace, three comment forms, identifiers, escaped reserved-word identifiers, reserved words, numeric literals, strings, control-string fragments, operators, and delimiters.
procedure
(make-pascal-lexer [#:profile profile
                    #:trivia trivia
                    #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected Pascal categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
> (define lexer (make-pascal-lexer #:profile 'coloring))
> (define in (open-input-string "program Test;\nvar &do: Integer;\nbegin\nend.\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))
'(keyword whitespace identifier delimiter)
procedure
(pascal-string->tokens source
                       [#:profile profile
                        #:trivia trivia
                        #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived Pascal API provides reusable language-specific structure:
procedure
(make-pascal-derived-lexer)
 → (input-port? . -> . (or/c pascal-derived-token? 'eof))
procedure
(pascal-string->derived-tokens source)
 → (listof pascal-derived-token?)
source : string?
procedure
(pascal-derived-token? v) → boolean?
v : any/c
procedure
(pascal-derived-token-tags token) → (listof symbol?)
token : pascal-derived-token?
procedure
(pascal-derived-token-has-tag? token tag) → boolean?
token : pascal-derived-token?
tag : symbol?
procedure
(pascal-derived-token-text token) → string?
token : pascal-derived-token?
procedure
(pascal-derived-token-start token) → position?
token : pascal-derived-token?
procedure
(pascal-derived-token-end token) → position?
token : pascal-derived-token?
The first reusable Pascal-specific derived tags include:
'pascal-comment
'pascal-whitespace
'pascal-keyword
'pascal-identifier
'pascal-escaped-identifier
'pascal-string-literal
'pascal-control-string
'pascal-numeric-literal
'pascal-operator
'pascal-delimiter
'malformed-token
Malformed Pascal input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
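As a hedged sketch (assuming the tags behave as listed above), this finds escaped reserved-word identifiers such as &do:

```racket
#lang racket
(require lexers/pascal)

;; Keep only tokens tagged as escaped reserved-word identifiers.
(define tokens
  (pascal-string->derived-tokens "var &do: Integer;\n"))
(for ([token (in-list tokens)]
      #:when (pascal-derived-token-has-tag? token 'pascal-escaped-identifier))
  (displayln (pascal-derived-token-text token)))
```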
Markdown fenced code blocks labeled pascal, pas, delphi, or objectpascal delegate to lexers/pascal. Wrapped delegated Markdown tokens preserve Pascal-derived tags and gain 'embedded-pascal.
value
pascal-profiles : immutable-hash?
14 Python
| (require lexers/python) | package: lexers-lib |
The projected Python API has two entry points:
make-python-lexer for streaming tokenization from an input port.
python-string->tokens for eager tokenization of an entire string.
The first Python implementation is a handwritten streaming lexer grounded in Python’s lexical-analysis rules. It tracks indentation-sensitive line starts, physical and logical newlines, names, comments, strings, numbers, operators, and delimiters.
procedure
(make-python-lexer [#:profile profile
                    #:trivia trivia
                    #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected Python categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
Soft keywords currently project as 'keyword, while the derived layer keeps the more specific 'python-soft-keyword tag.
> (define lexer (make-python-lexer #:profile 'coloring))
> (define in (open-input-string "def answer(x):\n return x\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))
'(keyword whitespace identifier delimiter)
procedure
(python-string->tokens source
                       [#:profile profile
                        #:trivia trivia
                        #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived Python API provides reusable language-specific structure:
procedure
(make-python-derived-lexer)
 → (input-port? . -> . (or/c python-derived-token? 'eof))
procedure
(python-string->derived-tokens source)
 → (listof python-derived-token?)
source : string?
procedure
(python-derived-token? v) → boolean?
v : any/c
procedure
(python-derived-token-tags token) → (listof symbol?)
token : python-derived-token?
procedure
(python-derived-token-has-tag? token tag) → boolean?
token : python-derived-token?
tag : symbol?
procedure
(python-derived-token-text token) → string?
token : python-derived-token?
procedure
(python-derived-token-start token) → position?
token : python-derived-token?
procedure
(python-derived-token-end token) → position?
token : python-derived-token?
The first reusable Python-specific derived tags include:
'python-comment
'python-whitespace
'python-newline
'python-nl
'python-line-join
'python-keyword
'python-soft-keyword
'python-identifier
'python-string-literal
'python-bytes-literal
'python-numeric-literal
'python-operator
'python-delimiter
'python-indent
'python-dedent
'python-error
'malformed-token
Malformed Python input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
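Since soft keywords project as plain 'keyword but keep the more specific derived tag, the derived stream is the place to tell them apart. A hedged sketch (assuming the tags behave as listed above):

```racket
#lang racket
(require lexers/python)

;; Keep only tokens tagged as Python soft keywords (e.g. match, case).
(define tokens
  (python-string->derived-tokens "match point:\n    case 0:\n        pass\n"))
(for ([token (in-list tokens)]
      #:when (python-derived-token-has-tag? token 'python-soft-keyword))
  (displayln (python-derived-token-text token)))
```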
Markdown fenced code blocks labeled python or py delegate to lexers/python. Wrapped delegated Markdown tokens preserve Python-derived tags and gain 'embedded-python.
15 Shell
| (require lexers/shell) | package: lexers-lib |
The projected shell API has two entry points:
make-shell-lexer for streaming tokenization from an input port.
shell-string->tokens for eager tokenization of an entire string.
The first shell implementation is a handwritten lexer for reusable shell tokenization. It currently supports Bash, Zsh, and PowerShell. The public API defaults to Bash and accepts #:shell 'bash, #:shell 'zsh, and #:shell 'powershell (with 'pwsh accepted as an alias).
procedure
(make-shell-lexer [#:profile profile
                   #:trivia trivia
                   #:source-positions source-positions
                   #:shell shell])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
shell : (or/c 'bash 'zsh 'powershell 'pwsh) = 'bash
procedure
(shell-string->tokens source
                      [#:profile profile
                       #:trivia trivia
                       #:source-positions source-positions
                       #:shell shell])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
shell : (or/c 'bash 'zsh 'powershell 'pwsh) = 'bash
15.1 Shell Returned Tokens
Common projected shell categories include:
'whitespace
'comment
'keyword
'identifier
'literal
'delimiter
'unknown
'eof
For the current shell scaffold:
keywords and builtins project as 'keyword
remaining words project as 'identifier
strings, variables, command substitutions, options, and numeric literals project as 'literal
operators and punctuation project as 'delimiter
malformed string or substitution input projects as 'unknown in 'coloring mode and raises in 'compiler mode
Projected and derived shell token text preserve the exact consumed source slice, including comments, whitespace, and CRLF line endings.
> (define lexer (make-shell-lexer #:profile 'coloring #:shell 'bash))
> (define in (open-input-string "export PATH\necho $PATH\n"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'keyword "export") (position 1 1 0) (position 7 1 6))
(position-token (token 'whitespace " ") (position 7 1 6) (position 8 1 7))
(position-token
(token 'identifier "PATH")
(position 8 1 7)
(position 12 1 11)))
procedure
(make-shell-derived-lexer [#:shell shell])
 → (input-port? . -> . (or/c 'eof shell-derived-token?))
shell : (or/c 'bash 'zsh 'powershell 'pwsh) = 'bash
procedure
(shell-string->derived-tokens source [#:shell shell])
 → (listof shell-derived-token?)
source : string?
shell : (or/c 'bash 'zsh 'powershell 'pwsh) = 'bash
procedure
(shell-derived-token? v) → boolean?
v : any/c
procedure
(shell-derived-token-tags token) → (listof symbol?)
token : shell-derived-token?
procedure
(shell-derived-token-has-tag? token tag) → boolean?
token : shell-derived-token?
tag : symbol?
procedure
(shell-derived-token-text token) → string?
token : shell-derived-token?
procedure
(shell-derived-token-start token) → position?
token : shell-derived-token?
procedure
(shell-derived-token-end token) → position?
token : shell-derived-token?
15.2 Shell Derived Tokens
The current shell scaffold may attach tags such as:
'shell-keyword
'shell-builtin
'shell-word
'shell-string-literal
'shell-variable
'shell-command-substitution
'shell-comment
'shell-option
'shell-numeric-literal
'shell-punctuation
'malformed-token
Markdown fenced code blocks delegate to lexers/shell for bash, sh, shell, zsh, powershell, pwsh, and ps1 info strings. Delegated Markdown tokens keep the shell tags and gain 'embedded-shell.
> (define derived-tokens (shell-string->derived-tokens "printf \"%s\\n\" $(pwd)\n# done\n"))
> (map (lambda (token) (list (shell-derived-token-text token) (shell-derived-token-tags token))) derived-tokens)
'(("printf" (keyword shell-builtin))
(" " (whitespace))
("\"%s\\n\"" (literal shell-string-literal))
(" " (whitespace))
("$(pwd)" (literal shell-command-substitution))
("\n" (whitespace))
("# done" (comment shell-comment))
("\n" (whitespace)))
value
shell-profiles : immutable-hash?
16 Rust
| (require lexers/rust) | package: lexers-lib |
The projected Rust API has two entry points:
make-rust-lexer for streaming tokenization from an input port.
rust-string->tokens for eager tokenization of an entire string.
The first Rust implementation is a handwritten streaming lexer grounded in the Rust lexical structure reference. It covers whitespace, line and nested block comments, identifiers, raw identifiers, keywords, lifetimes, strings, raw strings, character and byte literals, numeric literals, punctuation, and delimiters.
procedure
(make-rust-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected Rust categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
> (define lexer (make-rust-lexer #:profile 'coloring))
> (define in (open-input-string "fn main() {\n let r#type = 42u32;\n}\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in)) (lexer-token-name (lexer in)) (lexer-token-name (lexer in)) (lexer-token-name (lexer in)))
'(keyword whitespace identifier delimiter)
procedure
(rust-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived Rust API provides reusable language-specific structure:
procedure
→ (input-port? . -> . (or/c rust-derived-token? 'eof))
procedure
(rust-string->derived-tokens source)
→ (listof rust-derived-token?)
source : string?
procedure
(rust-derived-token? v) → boolean?
v : any/c
procedure
(rust-derived-token-tags token) → (listof symbol?)
token : rust-derived-token?
procedure
(rust-derived-token-has-tag? token tag) → boolean?
token : rust-derived-token?
tag : symbol?
procedure
(rust-derived-token-text token) → string?
token : rust-derived-token?
procedure
(rust-derived-token-start token) → position?
token : rust-derived-token?
procedure
(rust-derived-token-end token) → position?
token : rust-derived-token?
The reusable Rust-specific derived tags currently include:
'rust-comment
'rust-doc-comment
'rust-whitespace
'rust-keyword
'rust-identifier
'rust-raw-identifier
'rust-lifetime
'rust-string-literal
'rust-raw-string-literal
'rust-char-literal
'rust-byte-literal
'rust-byte-string-literal
'rust-c-string-literal
'rust-numeric-literal
'rust-punctuation
'rust-delimiter
'malformed-token
Malformed Rust input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
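A hedged usage sketch, mirroring the shell and WAT derived-token examples in this manual; the exact tag lists attached to each token depend on the current scaffold, so no output is shown here:
> (define derived-tokens (rust-string->derived-tokens "fn main() {}\n"))
> (map (lambda (token) (list (rust-derived-token-text token) (rust-derived-token-tags token))) derived-tokens)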
Markdown fenced code blocks labeled rust or rs delegate to lexers/rust. Wrapped delegated Markdown tokens preserve Rust-derived tags and gain 'embedded-rust.
value
rust-profiles : immutable-hash?
17 Swift
| (require lexers/swift) | package: lexers-lib |
The projected Swift API has two entry points:
make-swift-lexer for streaming tokenization from an input port.
swift-string->tokens for eager tokenization of an entire string.
The first Swift implementation is a handwritten streaming lexer grounded in Swift lexical structure. It covers whitespace, line comments, nested block comments, identifiers, keywords, attributes, pound directives, strings, numbers, operators, and delimiters.
procedure
(make-swift-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected Swift categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.
> (define lexer (make-swift-lexer #:profile 'coloring))
> (define in (open-input-string "import UIKit\n@IBOutlet weak var label: UILabel!\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in)) (lexer-token-name (lexer in)) (lexer-token-name (lexer in)) (lexer-token-name (lexer in)))
'(keyword whitespace identifier whitespace)
procedure
(swift-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived Swift API provides reusable language-specific structure:
procedure
→ (input-port? . -> . (or/c swift-derived-token? 'eof))
procedure
(swift-string->derived-tokens source)
→ (listof swift-derived-token?)
source : string?
procedure
(swift-derived-token? v) → boolean?
v : any/c
procedure
(swift-derived-token-tags token) → (listof symbol?)
token : swift-derived-token?
procedure
(swift-derived-token-has-tag? token tag) → boolean?
token : swift-derived-token?
tag : symbol?
procedure
(swift-derived-token-text token) → string?
token : swift-derived-token?
procedure
(swift-derived-token-start token) → position?
token : swift-derived-token?
procedure
(swift-derived-token-end token) → position?
token : swift-derived-token?
The reusable Swift-specific derived tags currently include:
'swift-comment
'swift-whitespace
'swift-keyword
'swift-identifier
'swift-string-literal
'swift-numeric-literal
'swift-attribute
'swift-pound-directive
'swift-operator
'swift-delimiter
'swift-error
'malformed-token
Malformed Swift input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
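A hedged usage sketch, following the derived-token examples in the other language sections; the exact tags attached to each token depend on the current scaffold, so no output is shown here:
> (define derived-tokens (swift-string->derived-tokens "let x = 1\n"))
> (map (lambda (token) (list (swift-derived-token-text token) (swift-derived-token-tags token))) derived-tokens)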
Markdown fenced code blocks labeled swift delegate to lexers/swift. Wrapped delegated Markdown tokens preserve Swift-derived tags and gain 'embedded-swift.
18 TeX
| (require lexers/tex) | package: lexers-lib |
The projected TeX API has two entry points:
make-tex-lexer for streaming tokenization from an input port.
tex-string->tokens for eager tokenization of an entire string.
The first TeX implementation is a handwritten streaming lexer grounded in TeX’s tokenization model, but it intentionally stays within a practical static subset. It covers comments, whitespace, control words, control symbols, group and optional delimiters, math shifts, parameter markers, and plain text runs.
procedure
(make-tex-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected TeX categories include 'comment, 'whitespace, 'identifier, 'literal, 'delimiter, and 'unknown.
> (define lexer (make-tex-lexer #:profile 'coloring))
> (define in (open-input-string "\\section{Hi}\n$x+y$\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in)) (lexer-token-name (lexer in)) (lexer-token-name (lexer in)) (lexer-token-name (lexer in)))
'(identifier delimiter literal delimiter)
procedure
(tex-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived TeX API provides reusable language-specific structure:
procedure
→ (input-port? . -> . (or/c tex-derived-token? 'eof))
procedure
(tex-string->derived-tokens source)
→ (listof tex-derived-token?)
source : string?
procedure
(tex-derived-token? v) → boolean?
v : any/c
procedure
(tex-derived-token-tags token) → (listof symbol?)
token : tex-derived-token?
procedure
(tex-derived-token-has-tag? token tag) → boolean?
token : tex-derived-token?
tag : symbol?
procedure
(tex-derived-token-text token) → string?
token : tex-derived-token?
procedure
(tex-derived-token-start token) → position?
token : tex-derived-token?
procedure
(tex-derived-token-end token) → position?
token : tex-derived-token?
The reusable TeX-specific derived tags currently include:
'tex-comment
'tex-whitespace
'tex-control-word
'tex-control-symbol
'tex-parameter
'tex-text
'tex-math-shift
'tex-group-delimiter
'tex-optional-delimiter
'tex-special-character
'malformed-token
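A hedged usage sketch, following the derived-token examples in the other language sections; the exact tags attached to each token depend on the current scaffold, so no output is shown here:
> (define derived-tokens (tex-string->derived-tokens "\\section{Hi}\n"))
> (map (lambda (token) (list (tex-derived-token-text token) (tex-derived-token-tags token))) derived-tokens)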
Markdown fenced code blocks labeled tex delegate to lexers/tex. Wrapped delegated Markdown tokens preserve TeX-derived tags and gain 'embedded-tex.
value
tex-profiles : immutable-hash?
19 LaTeX
| (require lexers/latex) | package: lexers-lib |
The projected LaTeX API has two entry points:
make-latex-lexer for streaming tokenization from an input port.
latex-string->tokens for eager tokenization of an entire string.
The first LaTeX implementation builds on the TeX lexer and adds a lightweight classification layer for common LaTeX commands such as \section, \begin, and \end.
procedure
(make-latex-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected LaTeX categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'delimiter, and 'unknown.
procedure
(latex-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived LaTeX API reuses the TeX token representation and adds LaTeX tags where applicable:
procedure
→ (input-port? . -> . (or/c latex-derived-token? 'eof))
procedure
(latex-string->derived-tokens source)
→ (listof latex-derived-token?)
source : string?
procedure
(latex-derived-token? v) → boolean?
v : any/c
procedure
(latex-derived-token-tags token) → (listof symbol?)
token : latex-derived-token?
procedure
(latex-derived-token-has-tag? token tag) → boolean?
token : latex-derived-token?
tag : symbol?
procedure
(latex-derived-token-text token) → string?
token : latex-derived-token?
procedure
(latex-derived-token-start token) → position?
token : latex-derived-token?
procedure
(latex-derived-token-end token) → position?
token : latex-derived-token?
Common additional LaTeX-oriented derived tags include:
'latex-command
'latex-environment-command
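A hedged usage sketch, following the derived-token examples in the other language sections; whether \begin and \end receive 'latex-environment-command here depends on the current classification layer, so no output is shown:
> (define derived-tokens (latex-string->derived-tokens "\\begin{itemize}\n\\end{itemize}\n"))
> (map (lambda (token) (list (latex-derived-token-text token) (latex-derived-token-tags token))) derived-tokens)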
Markdown fenced code blocks labeled latex delegate to lexers/latex. Wrapped delegated Markdown tokens preserve LaTeX-derived tags and gain 'embedded-latex.
value
latex-profiles : immutable-hash?
20 TSV
| (require lexers/tsv) | package: lexers-lib |
The projected TSV API has two entry points:
make-tsv-lexer for streaming tokenization from an input port.
tsv-string->tokens for eager tokenization of an entire string.
The first TSV implementation is a handwritten streaming lexer for tab-separated text. It preserves exact source text, including literal tab separators, empty fields, and CRLF row separators.
procedure
(make-tsv-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
Projected TSV categories include 'literal, 'delimiter, and 'unknown.
Field contents project as 'literal. Field separators and row separators project as 'delimiter.
> (define lexer (make-tsv-lexer #:profile 'coloring))
> (define in (open-input-string "name\tage\nAda\t37\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in)) (lexer-token-name (lexer in)) (lexer-token-name (lexer in)) (lexer-token-name (lexer in)))
'(literal delimiter literal delimiter)
procedure
(tsv-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The derived TSV API provides reusable structure for delimited text:
procedure
→ (input-port? . -> . (or/c tsv-derived-token? 'eof))
procedure
(tsv-string->derived-tokens source)
→ (listof tsv-derived-token?)
source : string?
procedure
(tsv-derived-token? v) → boolean?
v : any/c
procedure
(tsv-derived-token-tags token) → (listof symbol?)
token : tsv-derived-token?
procedure
(tsv-derived-token-has-tag? token tag) → boolean?
token : tsv-derived-token?
tag : symbol?
procedure
(tsv-derived-token-text token) → string?
token : tsv-derived-token?
procedure
(tsv-derived-token-start token) → position?
token : tsv-derived-token?
procedure
(tsv-derived-token-end token) → position?
token : tsv-derived-token?
The reusable TSV-specific derived tags currently include:
'delimited-field
'delimited-quoted-field
'delimited-unquoted-field
'delimited-empty-field
'delimited-separator
'delimited-row-separator
'delimited-error
'tsv-field
'tsv-separator
'tsv-row-separator
'malformed-token
Malformed TSV input is handled using the shared profile rules:
In the 'coloring profile, malformed input projects as 'unknown.
In the 'compiler profile, malformed input raises a read exception.
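A hedged usage sketch, following the derived-token examples in the other language sections; the exact tags attached to each field and separator depend on the current scaffold, so no output is shown here:
> (define derived-tokens (tsv-string->derived-tokens "name\tage\nAda\t37\n"))
> (map (lambda (token) (list (tsv-derived-token-text token) (tsv-derived-token-tags token))) derived-tokens)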
Markdown fenced code blocks labeled tsv delegate to lexers/tsv. Wrapped delegated Markdown tokens preserve TSV-derived tags and gain 'embedded-tsv.
value
tsv-profiles : immutable-hash?
21 WAT
| (require lexers/wat) | package: lexers-lib |
The projected WAT API has two entry points:
make-wat-lexer for streaming tokenization from an input port.
wat-string->tokens for eager tokenization of an entire string.
The first WAT implementation is a handwritten lexer for the WebAssembly text format. It targets WAT source text only, not binary .wasm files.
procedure
(make-wat-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next projected WAT token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
The streaming port readers emit tokens incrementally. They do not buffer the entire remaining input before producing the first token.
> (define lexer (make-wat-lexer #:profile 'coloring))
> (define in (open-input-string "(module (func (result i32) (i32.const 42)))"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'delimiter "(") (position 1 1 0) (position 2 1 1))
(position-token (token 'keyword "module") (position 2 1 1) (position 8 1 7))
(position-token (token 'whitespace " ") (position 8 1 7) (position 9 1 8))
(position-token (token 'delimiter "(") (position 9 1 8) (position 10 1 9)))
procedure
(wat-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-wat-lexer.
21.1 WAT Returned Tokens
Common projected WAT categories include:
'whitespace
'comment
'identifier
'keyword
'literal
'delimiter
'unknown
'eof
For the current WAT scaffold:
form names, type names, and instruction names project as 'keyword
$-prefixed names and remaining word-like names project as 'identifier
strings and numeric literals project as 'literal
parentheses project as 'delimiter
comments project as 'comment
malformed input projects as 'unknown in 'coloring mode and raises in 'compiler mode
Projected and derived token text preserve the exact source slice, including whitespace and comments.
> (define inspect-lexer (make-wat-lexer #:profile 'coloring))
> (define inspect-in (open-input-string ";; line comment\n(module (func (result i32) (i32.const 42)))"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'comment
> (lexer-token-value first-token)
";; line comment"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
16
procedure
→ (input-port? . -> . (or/c 'eof wat-derived-token?))
procedure
(wat-string->derived-tokens source)
→ (listof wat-derived-token?)
source : string?
procedure
(wat-derived-token? v) → boolean?
v : any/c
procedure
(wat-derived-token-tags token) → (listof symbol?)
token : wat-derived-token?
procedure
(wat-derived-token-has-tag? token tag) → boolean?
token : wat-derived-token?
tag : symbol?
procedure
(wat-derived-token-text token) → string?
token : wat-derived-token?
procedure
(wat-derived-token-start token) → position?
token : wat-derived-token?
procedure
(wat-derived-token-end token) → position?
token : wat-derived-token?
21.2 WAT Derived Tokens
The current WAT scaffold may attach tags such as:
'wat-form
'wat-type
'wat-instruction
'wat-identifier
'wat-string-literal
'wat-numeric-literal
'comment
'whitespace
'malformed-token
> (define derived-tokens (wat-string->derived-tokens "(module (func $answer (result i32) i32.const 42))"))
> (map (lambda (token) (list (wat-derived-token-text token) (wat-derived-token-tags token))) derived-tokens)
'(("(" (delimiter))
("module" (keyword wat-form))
(" " (whitespace))
("(" (delimiter))
("func" (keyword wat-form))
(" " (whitespace))
("$answer" (identifier wat-identifier))
(" " (whitespace))
("(" (delimiter))
("result" (keyword wat-form))
(" " (whitespace))
("i32" (keyword wat-type))
(")" (delimiter))
(" " (whitespace))
("i32.const" (keyword wat-instruction))
(" " (whitespace))
("42" (literal wat-numeric-literal))
(")" (delimiter))
(")" (delimiter)))
value
wat-profiles : immutable-hash?
22 Racket
| (require lexers/racket) | package: lexers-lib |
The projected Racket API has two entry points:
make-racket-lexer for streaming tokenization from an input port.
racket-string->tokens for eager tokenization of an entire string.
This lexer is adapter-backed. It uses the lexer from syntax-color/racket-lexer as its raw engine and adapts that output into the public lexers projected and derived APIs.
When a source starts with "#lang at-exp", the adapter switches to the Scribble lexer family in Racket mode so that @ forms are tokenized as Scribble escapes instead of ordinary symbol text.
procedure
(make-racket-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
> (define lexer (make-racket-lexer #:profile 'coloring))
> (define in (open-input-string "#:x \"hi\""))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'literal "#:x") (position 1 1 0) (position 4 1 3))
(position-token (token 'whitespace " ") (position 4 1 3) (position 5 1 4))
(position-token (token 'literal "\"hi\"") (position 5 1 4) (position 9 1 8)))
procedure
(racket-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-racket-lexer.
22.1 Racket Returned Tokens
Common projected Racket categories include:
'whitespace
'comment
'identifier
'literal
'delimiter
'unknown
'eof
For the current adapter:
comments and sexp comments project as 'comment
whitespace projects as 'whitespace
strings, constants, and hash-colon keywords project as 'literal
symbols, other, and no-color tokens project as 'identifier
parentheses project as 'delimiter
lexical errors project as 'unknown in 'coloring mode and raise in 'compiler mode
Projected and derived Racket token text preserve the exact consumed source slice, including multi-semicolon comment headers such as ;;;.
> (define inspect-lexer (make-racket-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "#;(+ 1 2) #:x"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'comment
> (lexer-token-value first-token)
"#;"
procedure
→ (input-port? . -> . (or/c 'eof racket-derived-token?))
procedure
(racket-string->derived-tokens source)
→ (listof racket-derived-token?)
source : string?
procedure
(racket-derived-token? v) → boolean?
v : any/c
procedure
(racket-derived-token-tags token) → (listof symbol?)
token : racket-derived-token?
procedure
(racket-derived-token-has-tag? token tag) → boolean?
token : racket-derived-token?
tag : symbol?
procedure
(racket-derived-token-text token) → string?
token : racket-derived-token?
procedure
(racket-derived-token-start token) → position?
token : racket-derived-token?
procedure
(racket-derived-token-end token) → position?
token : racket-derived-token?
22.2 Racket Derived Tokens
The current Racket adapter may attach tags such as:
'racket-comment
'racket-sexp-comment
'racket-whitespace
'racket-constant
'racket-string
'racket-symbol
'racket-parenthesis
'racket-hash-colon-keyword
'racket-commented-out
'racket-datum
'racket-open
'racket-close
'racket-continue
'racket-usual-special-form
'racket-definition-form
'racket-binding-form
'racket-conditional-form
'racket-error
'scribble-text for "#lang at-exp" text regions
'scribble-command-char for @ in "#lang at-exp" sources
'scribble-command for command names such as @bold in "#lang at-exp" sources
'scribble-body-delimiter
'scribble-optional-delimiter
'scribble-racket-escape
The "usual special form" tags are heuristic. They are meant to help ordinary Racket tooling recognize common built-in forms such as define, define-values, if, and let, but they are not guarantees about expanded meaning. In particular, a token whose text is "define" may still receive 'racket-usual-special-form even in a program where define has been rebound, because the lexer does not perform expansion or binding resolution.
> (define derived-tokens (racket-string->derived-tokens "#;(+ 1 2) #:x \"hi\""))
> (map (lambda (token) (list (racket-derived-token-text token) (racket-derived-token-tags token))) derived-tokens)
'(("#;" (comment racket-sexp-comment racket-continue))
("(" (delimiter racket-parenthesis racket-open comment racket-commented-out))
("+" (identifier racket-symbol racket-datum comment racket-commented-out))
(" "
(whitespace racket-whitespace racket-continue comment racket-commented-out))
("1" (literal racket-constant racket-datum comment racket-commented-out))
(" "
(whitespace racket-whitespace racket-continue comment racket-commented-out))
("2" (literal racket-constant racket-datum comment racket-commented-out))
(")"
(delimiter racket-parenthesis racket-close comment racket-commented-out))
(" " (whitespace racket-whitespace racket-continue))
("#:x" (literal racket-hash-colon-keyword racket-datum))
(" " (whitespace racket-whitespace racket-continue))
("\"hi\"" (literal racket-string racket-datum)))
> (define at-exp-derived-tokens (racket-string->derived-tokens "#lang at-exp racket\n(define x @bold{hi})\n"))
> (map (lambda (token) (list (racket-derived-token-text token) (racket-derived-token-tags token))) at-exp-derived-tokens)
'(("#lang at-exp" (identifier racket-other racket-datum))
(" " (whitespace racket-whitespace racket-continue))
("racket" (identifier racket-symbol racket-datum))
("\n" (whitespace racket-whitespace racket-continue))
("(" (delimiter racket-parenthesis racket-open))
("define"
(identifier
racket-symbol
racket-datum
racket-usual-special-form
racket-definition-form))
(" " (whitespace racket-whitespace racket-continue))
("x" (identifier racket-symbol racket-datum))
(" " (whitespace racket-whitespace racket-continue))
("@" (delimiter racket-parenthesis racket-datum scribble-command-char))
("bold" (identifier racket-symbol racket-datum scribble-command))
("{" (delimiter racket-parenthesis racket-open scribble-body-delimiter))
("hi" (literal scribble-text racket-continue))
("}" (delimiter racket-parenthesis racket-close scribble-body-delimiter))
(")" (delimiter racket-parenthesis racket-close))
("\n" (whitespace racket-whitespace racket-continue)))
value
racket-profiles : immutable-hash?
23 Rhombus
| (require lexers/rhombus) | package: lexers-lib |
The projected Rhombus API has two entry points:
make-rhombus-lexer for streaming tokenization from an input port.
rhombus-string->tokens for eager tokenization of an entire string.
This lexer is adapter-backed. It uses the lexer from rhombus/private/syntax-color as its raw engine and adapts that output into the public lexers projected and derived APIs.
Rhombus support is optional. When rhombus/private/syntax-color is not available, the module still loads, but calling the Rhombus lexer raises an error explaining that Rhombus support requires rhombus-lib on base >= 8.14.
procedure
(make-rhombus-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
procedure
(rhombus-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
23.1 Rhombus Returned Tokens
Common projected Rhombus categories include:
'whitespace
'comment
'identifier
'keyword
'literal
'operator
'delimiter
'unknown
'eof
For the current adapter:
ordinary whitespace projects as 'whitespace
line comments project as 'comment
Rhombus keywords and builtins project as 'keyword
remaining identifiers project as 'identifier
literals project as 'literal
openers, closers, and separators project as 'delimiter
operators such as +, :, and the comma project as 'operator
recoverable malformed input projects as 'unknown in 'coloring mode and raises in 'compiler mode
Projected and derived Rhombus token text preserve the exact consumed source slice, including CRLF line endings when Rhombus support is available.
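When Rhombus support is installed, a hedged usage sketch mirrors the other adapter-backed sections; the exact projection of each token depends on rhombus/private/syntax-color, so no output is shown here:
> (define lexer (make-rhombus-lexer #:profile 'coloring))
> (define in (open-input-string "fun add(x, y): x + y"))
> (port-count-lines! in)
> (lexer in)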
procedure
→ (input-port? . -> . (or/c 'eof rhombus-derived-token?))
procedure
(rhombus-string->derived-tokens source)
→ (listof rhombus-derived-token?)
source : string?
procedure
(rhombus-derived-token? v) → boolean?
v : any/c
procedure
(rhombus-derived-token-tags token) → (listof symbol?)
token : rhombus-derived-token?
procedure
(rhombus-derived-token-has-tag? token tag) → boolean?
token : rhombus-derived-token?
tag : symbol?
procedure
(rhombus-derived-token-text token) → string?
token : rhombus-derived-token?
procedure
(rhombus-derived-token-start token) → position?
token : rhombus-derived-token?
procedure
(rhombus-derived-token-end token) → position?
token : rhombus-derived-token?
23.2 Rhombus Derived Tokens
The current Rhombus adapter may attach tags such as:
'rhombus-comment
'rhombus-whitespace
'rhombus-string
'rhombus-constant
'rhombus-literal
'rhombus-identifier
'rhombus-keyword
'rhombus-builtin
'rhombus-operator
'rhombus-block-operator
'rhombus-comma-operator
'rhombus-opener
'rhombus-closer
'rhombus-parenthesis
'rhombus-separator
'rhombus-at
'rhombus-fail
'rhombus-error
'malformed-token
The adapter preserves Rhombus-specific keyword and builtin guesses from rhombus/private/syntax-color. Since the shared projected stream does not have a separate builtin category, builtins currently project as 'keyword, while the derived-token layer keeps the more specific 'rhombus-builtin tag.
value
rhombus-profiles : immutable-hash?
24 Scribble
| (require lexers/scribble) | package: lexers-lib |
The projected Scribble API has two entry points:
make-scribble-lexer for streaming tokenization from an input port.
scribble-string->tokens for eager tokenization of an entire string.
This lexer is adapter-backed. It uses syntax-color/scribble-lexer as its raw engine and adapts that output into the public lexers projected and derived APIs.
The first implementation defaults to Scribble’s inside/text mode via make-scribble-inside-lexer. Command-character customization is intentionally deferred.
procedure
(make-scribble-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
> (define lexer (make-scribble-lexer #:profile 'coloring))
> (define in (open-input-string "@title{Hi}\nText"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'delimiter "@") (position 1 1 0) (position 2 1 1))
(position-token (token 'identifier "title") (position 2 1 1) (position 7 1 6))
(position-token (token 'delimiter "{") (position 7 1 6) (position 8 1 7))
(position-token (token 'literal "Hi") (position 8 1 7) (position 10 1 9)))
procedure
(scribble-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-scribble-lexer.
24.1 Scribble Returned Tokens
Common projected Scribble categories include:
'whitespace
'comment
'identifier
'literal
'delimiter
'unknown
'eof
For the current adapter:
text, strings, and constants project as 'literal
whitespace projects as 'whitespace
symbol and other tokens project as 'identifier
parentheses, the command character, and body or optional delimiters project as 'delimiter
lexical errors project as 'unknown in 'coloring mode and raise in 'compiler mode
For source fidelity, the Scribble adapter preserves the exact source slice for projected and derived token text, including whitespace spans that contain one or more newlines.
> (define inspect-lexer (make-scribble-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "@title{Hi}"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'delimiter
> (lexer-token-value first-token)
"@"
procedure
(make-scribble-derived-lexer)
 → (input-port? . -> . (or/c 'eof scribble-derived-token?))
procedure
(scribble-string->derived-tokens source)
→ (listof scribble-derived-token?)
source : string?
procedure
(scribble-derived-token? v) → boolean?
v : any/c
procedure
(scribble-derived-token-tags token) → (listof symbol?)
token : scribble-derived-token?
procedure
(scribble-derived-token-has-tag? token tag) → boolean?
token : scribble-derived-token?
tag : symbol?
procedure
(scribble-derived-token-text token) → string?
token : scribble-derived-token?
procedure
(scribble-derived-token-start token) → position?
token : scribble-derived-token?
procedure
(scribble-derived-token-end token) → position?
token : scribble-derived-token?
24.2 Scribble Derived Tokens
The current Scribble adapter may attach tags such as:
'scribble-comment
'scribble-whitespace
'scribble-text
'scribble-string
'scribble-constant
'scribble-symbol
'scribble-parenthesis
'scribble-other
'scribble-error
'scribble-command
'scribble-command-char
'scribble-body-delimiter
'scribble-optional-delimiter
'scribble-racket-escape
These tags describe reusable Scribble structure, not presentation. In particular, 'scribble-command only means that a symbol-like token is being used as a command name after "@". It does not mean the lexer has inferred higher-level document semantics for commands such as title or itemlist.
> (define derived-tokens (scribble-string->derived-tokens "@title{Hi}\n@racket[(define x 1)]"))
> (map (lambda (token)
         (list (scribble-derived-token-text token)
               (scribble-derived-token-tags token)))
       derived-tokens)
'(("@" (delimiter scribble-parenthesis scribble-command-char))
("title" (identifier scribble-symbol scribble-command))
("{" (delimiter scribble-parenthesis scribble-body-delimiter))
("Hi" (literal scribble-text))
("}" (delimiter scribble-parenthesis scribble-body-delimiter))
("\n" (whitespace scribble-whitespace))
("@" (delimiter scribble-parenthesis scribble-command-char))
("racket" (identifier scribble-symbol scribble-command))
("[" (delimiter scribble-parenthesis scribble-optional-delimiter))
("(" (delimiter scribble-parenthesis scribble-racket-escape))
("define" (scribble-racket-escape))
(" " (whitespace scribble-whitespace scribble-racket-escape))
("x" (identifier scribble-symbol scribble-racket-escape))
(" " (whitespace scribble-whitespace scribble-racket-escape))
("1" (literal scribble-constant scribble-racket-escape))
(")" (delimiter scribble-parenthesis scribble-racket-escape))
("]"
(delimiter
scribble-parenthesis
scribble-optional-delimiter
scribble-racket-escape)))
value
scribble-profiles : immutable-hash?
25 JavaScript
| (require lexers/javascript) | package: lexers-lib |
The projected JavaScript API has two entry points:
make-javascript-lexer for streaming tokenization from an input port.
javascript-string->tokens for eager tokenization of an entire string.
procedure
(make-javascript-lexer [#:profile profile
                        #:trivia trivia
                        #:source-positions source-positions
                        #:jsx? jsx?])
 → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
jsx? : boolean? = #f
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token? whose payload is either a bare symbol such as 'eof or a token? carrying a projected category such as 'keyword, 'identifier, 'literal, 'operator, 'comment, or 'unknown.
When #:source-positions is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
When #:jsx? is true, the lexer accepts a small JSX extension inside JavaScript expressions. The projected token categories remain the same, while the derived-token API exposes JSX-specific structure.
> (define lexer (make-javascript-lexer #:profile 'coloring))
> (define in (open-input-string "const x = 1;"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'keyword "const") (position 1 1 0) (position 6 1 5))
(position-token (token 'whitespace " ") (position 6 1 5) (position 7 1 6))
(position-token (token 'identifier "x") (position 7 1 6) (position 8 1 7))
(position-token (token 'whitespace " ") (position 8 1 7) (position 9 1 8)))
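When JSX support is needed, construction is the same; a sketch (the input string here is illustrative, not taken from the examples above):

```racket
;; Enabling the JSX extension changes what the lexer accepts, not the
;; projected categories, so downstream consumers need no special casing.
(define jsx-lexer (make-javascript-lexer #:profile 'coloring #:jsx? #t))
(define jsx-in (open-input-string "const el = <div>hi</div>;"))
(port-count-lines! jsx-in)
(jsx-lexer jsx-in)  ; first token: the 'keyword "const", as in the plain case
```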
procedure
(javascript-string->tokens source
                           [#:profile profile
                            #:trivia trivia
                            #:source-positions source-positions
                            #:jsx? jsx?])
 → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
jsx? : boolean? = #f
This is a convenience wrapper over make-javascript-lexer. It opens a string port, enables line counting, repeatedly calls the port-based lexer until end-of-file, and returns the resulting token list.
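A short sketch of the wrapper in use (require paths follow this manual's chapters; categories follow the streaming example above):

```racket
#lang racket
(require lexers/javascript
         lexers/token)

;; Eagerly tokenize a string; equivalent to driving make-javascript-lexer
;; over a freshly opened string port with line counting enabled.
(define tokens (javascript-string->tokens "const x = 1;"))

;; Projected category names only, regardless of position wrapping.
(map lexer-token-name tokens)
```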
25.1 JavaScript Returned Tokens
The projected JavaScript API uses the same output shape:
The end of input is reported as 'eof, either directly or inside a position-token?.
Ordinary results are usually token? values whose token-name is a projected category and whose token-value contains language-specific text or metadata.
When #:source-positions is true, each result is wrapped in a position-token?.
When #:source-positions is false, results are returned without that outer wrapper.
Common projected JavaScript categories include:
'whitespace
'comment
'keyword
'identifier
'literal
'operator
'delimiter
'unknown
'eof
In 'coloring mode, whitespace and comments are kept, and recoverable malformed input is returned as 'unknown. In 'compiler mode, whitespace and comments are skipped by default, and malformed input raises an exception instead of producing an 'unknown token.
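The two profiles can be compared directly; a sketch, assuming well-formed input so that 'compiler mode does not raise:

```racket
;; 'coloring keeps trivia; 'compiler skips it by default, so the same
;; source yields at least as many tokens under the tolerant profile.
(define loose  (javascript-string->tokens "const x = 1;" #:profile 'coloring))
(define strict (javascript-string->tokens "const x = 1;" #:profile 'compiler))
(>= (length loose) (length strict))  ; expect #t given the documented defaults
```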
For the current JavaScript scaffold, token-value also preserves the original source text of the emitted token. In particular:
For 'keyword and 'identifier, the value is the matched identifier text, such as "const" or "name".
For 'literal, the value is the matched literal text, such as "1" or "\"hello\"".
For 'comment and 'whitespace, the value is the original comment or whitespace text when those categories are kept.
For 'operator and 'delimiter, the value is the matched character text, such as "=", ";", or "(".
For 'unknown in tolerant mode, the value is the malformed input text that could not be accepted.
> (define inspect-lexer (make-javascript-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "const x = 1;"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'keyword
> (lexer-token-value first-token)
"const"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
6
procedure
(make-javascript-derived-lexer [#:jsx? jsx?])
→ (input-port? . -> . (or/c 'eof javascript-derived-token?))
jsx? : boolean? = #f
The result is a procedure of one argument, an input port. Each call reads the next raw JavaScript token from the port, computes its JavaScript-specific derived classifications, and returns one derived token value. At end of input, it returns 'eof.
The intended use is the same as for make-javascript-lexer: create the lexer once, then call it repeatedly on the same port until it returns 'eof.
> (define derived-lexer (make-javascript-derived-lexer))
> (define derived-in (open-input-string "const x = 1;"))
> (port-count-lines! derived-in)
> (list (derived-lexer derived-in)
        (derived-lexer derived-in)
        (derived-lexer derived-in)
        (derived-lexer derived-in))
(list
(javascript-derived-token
(javascript-raw-token
'identifier-token
"const"
(position 1 1 0)
(position 6 1 5))
'(keyword))
(javascript-derived-token
(javascript-raw-token
'whitespace-token
" "
(position 6 1 5)
(position 7 1 6))
'())
(javascript-derived-token
(javascript-raw-token
'identifier-token
"x"
(position 7 1 6)
(position 8 1 7))
'(identifier declaration-name))
(javascript-derived-token
(javascript-raw-token
'whitespace-token
" "
(position 8 1 7)
(position 9 1 8))
'()))
procedure
(javascript-string->derived-tokens source [#:jsx? jsx?])
 → (listof javascript-derived-token?)
source : string?
jsx? : boolean? = #f
This is a convenience wrapper over make-javascript-derived-lexer. It opens a string port, enables line counting, repeatedly calls the derived lexer until it returns 'eof, and returns the resulting list of derived tokens.
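The wrapper's behavior can be pictured as the following loop; my-string->derived-tokens is a hypothetical helper shown only to illustrate the documented driving pattern:

```racket
;; Hypothetical re-implementation of javascript-string->derived-tokens,
;; following the description above: open a string port, enable line
;; counting, and call the derived lexer until it returns 'eof.
(define (my-string->derived-tokens source #:jsx? [jsx? #f])
  (define in (open-input-string source))
  (port-count-lines! in)
  (define lex (make-javascript-derived-lexer #:jsx? jsx?))
  (let loop ([acc '()])
    (define tok (lex in))
    (if (eq? tok 'eof)
        (reverse acc)
        (loop (cons tok acc)))))
```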
procedure
(javascript-derived-token? v) → boolean?
v : any/c
procedure
(javascript-derived-token-tags token) → (listof symbol?)
token : javascript-derived-token?
procedure
(javascript-derived-token-has-tag? token tag) → boolean?
token : javascript-derived-token?
tag : symbol?
procedure
(javascript-derived-token-text token) → string?
token : javascript-derived-token?
procedure
(javascript-derived-token-start token) → position?
token : javascript-derived-token?
procedure
(javascript-derived-token-end token) → position?
token : javascript-derived-token?
25.2 JavaScript Derived Tokens
A derived JavaScript token pairs one raw JavaScript token with a small list of JavaScript-specific classification tags. This layer is more precise than the projected consumer-facing categories and is meant for inspection, testing, and language-aware tools.
The current JavaScript scaffold may attach tags such as:
'keyword
'identifier
'declaration-name
'parameter-name
'object-key
'property-name
'method-name
'private-name
'static-keyword-usage
'string-literal
'numeric-literal
'regex-literal
'template-literal
'template-chunk
'template-interpolation-boundary
'jsx-tag-name
'jsx-closing-tag-name
'jsx-attribute-name
'jsx-text
'jsx-interpolation-boundary
'jsx-fragment-boundary
'comment
'malformed-token
> (define derived-tokens
    (javascript-string->derived-tokens
     "class Box { static create() { return this.value; } #secret = 1; }\nfunction wrap(name) { return name; }\nconst item = obj.run();\nconst data = { answer: 42 };\nconst greeting = `a ${name} b`;\nreturn /ab+c/i;"))
> (map (lambda (token)
         (list (javascript-derived-token-text token)
               (javascript-derived-token-tags token)
               (javascript-derived-token-has-tag? token 'keyword)
               (javascript-derived-token-has-tag? token 'identifier)
               (javascript-derived-token-has-tag? token 'declaration-name)
               (javascript-derived-token-has-tag? token 'parameter-name)
               (javascript-derived-token-has-tag? token 'object-key)
               (javascript-derived-token-has-tag? token 'property-name)
               (javascript-derived-token-has-tag? token 'method-name)
               (javascript-derived-token-has-tag? token 'private-name)
               (javascript-derived-token-has-tag? token 'static-keyword-usage)
               (javascript-derived-token-has-tag? token 'numeric-literal)
               (javascript-derived-token-has-tag? token 'regex-literal)
               (javascript-derived-token-has-tag? token 'template-literal)
               (javascript-derived-token-has-tag? token 'template-chunk)
               (javascript-derived-token-has-tag? token 'template-interpolation-boundary)))
       derived-tokens)
'(("class" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("Box" (identifier declaration-name) #f #t #t #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("static" (keyword static-keyword-usage) #t #f #f #f #f #f #f #f #t #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("create" (identifier) #f #t #f #f #f #f #f #f #f #f #f #f #f #f)
  ("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("return" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("this" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("." () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("value" (identifier property-name) #f #t #f #f #f #t #f #f #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("#secret" (private-name) #f #f #f #f #f #f #f #t #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("1" (numeric-literal) #f #f #f #f #f #f #f #f #f #t #f #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("function" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("wrap" (identifier declaration-name) #f #t #t #f #f #f #f #f #f #f #f #f #f #f)
  ("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("name" (identifier parameter-name) #f #t #f #t #f #f #f #f #f #f #f #f #f #f)
  (")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("return" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("name" (identifier) #f #t #f #f #f #f #f #f #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("const" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("item" (identifier declaration-name) #f #t #t #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("obj" (identifier) #f #t #f #f #f #f #f #f #f #f #f #f #f #f)
  ("." () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("run" (identifier method-name property-name) #f #t #f #f #f #t #t #f #f #f #f #f #f #f)
  ("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("const" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("data" (identifier declaration-name) #f #t #t #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("answer" (identifier object-key) #f #t #f #f #t #f #f #f #f #f #f #f #f #f)
  (":" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("42" (numeric-literal) #f #f #f #f #f #f #f #f #f #t #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("const" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("greeting" (identifier declaration-name) #f #t #t #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("`" (template-literal) #f #f #f #f #f #f #f #f #f #f #f #t #f #f)
  ("a " (template-literal template-chunk) #f #f #f #f #f #f #f #f #f #f #f #t #t #f)
  ("${" (template-literal template-interpolation-boundary) #f #f #f #f #f #f #f #f #f #f #f #t #f #t)
  ("name" (identifier) #f #t #f #f #f #f #f #f #f #f #f #f #f #f)
  ("}" (template-literal template-interpolation-boundary) #f #f #f #f #f #f #f #f #f #f #f #t #f #t)
  (" b" (template-literal template-chunk) #f #f #f #f #f #f #f #f #f #f #f #t #t #f)
  ("`" (template-literal) #f #f #f #f #f #f #f #f #f #f #f #t #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("return" (keyword) #t #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("/ab+c/i" (regex-literal) #f #f #f #f #f #f #f #f #f #f #t #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f))
> (define jsx-derived-tokens
    (javascript-string->derived-tokens
     "const el = <Button kind=\"primary\">Hello {name}</Button>;\nconst frag = <>ok</>;"
     #:jsx? #t))
> (map (lambda (token)
         (list (javascript-derived-token-text token)
               (javascript-derived-token-tags token)
               (javascript-derived-token-has-tag? token 'jsx-tag-name)
               (javascript-derived-token-has-tag? token 'jsx-closing-tag-name)
               (javascript-derived-token-has-tag? token 'jsx-attribute-name)
               (javascript-derived-token-has-tag? token 'jsx-text)
               (javascript-derived-token-has-tag? token 'jsx-interpolation-boundary)
               (javascript-derived-token-has-tag? token 'jsx-fragment-boundary)))
       jsx-derived-tokens)
'(("const" (keyword) #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("el" (identifier declaration-name) #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("<" () #f #f #f #f #f #f)
  ("Button" (identifier jsx-tag-name) #t #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("kind" (identifier jsx-attribute-name) #f #f #t #f #f #f)
  ("=" () #f #f #f #f #f #f)
  ("\"primary\"" (string-literal) #f #f #f #f #f #f)
  (">" () #f #f #f #f #f #f)
  ("Hello " (jsx-text) #f #f #f #t #f #f)
  ("{" (jsx-interpolation-boundary) #f #f #f #f #t #f)
  ("name" (identifier) #f #f #f #f #f #f)
  ("}" (jsx-interpolation-boundary) #f #f #f #f #t #f)
  ("</" () #f #f #f #f #f #f)
  ("Button" (identifier jsx-closing-tag-name) #f #t #f #f #f #f)
  (">" () #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f)
  ("const" (keyword) #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("frag" (identifier declaration-name) #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("<>" (jsx-fragment-boundary) #f #f #f #f #f #t)
  ("ok" (jsx-text) #f #f #f #t #f #f)
  ("</>" (jsx-fragment-boundary) #f #f #f #f #f #t)
  (";" () #f #f #f #f #f #f))
value
javascript-profiles : immutable-hash?