Lexers

Jens Axel Søgaard <jensaxel@soegaard.net>

This manual documents the public APIs in the lexers packages.

The library currently provides reusable lexers for a range of languages. Syntax coloring is the first intended application, but the lexer APIs are also designed to support other consumers.

    1 Overview

      1.1 Token Helpers

      1.2 Profiles

    2 CSS

      2.1 CSS Returned Tokens

      2.2 CSS Derived Tokens

    3 HTML

      3.1 HTML Returned Tokens

      3.2 HTML Derived Tokens

    4 C

    5 C++

    6 CSV

    7 JSON

    8 Makefile

    9 Plist

    10 YAML

    11 Markdown

      11.1 Markdown Returned Tokens

      11.2 Markdown Derived Tokens

    12 Go

    13 Java

    14 Haskell

    15 Objective-C

    16 Pascal

    17 Python

    18 Shell

      18.1 Shell Returned Tokens

      18.2 Shell Derived Tokens

    19 Rust

    20 Swift

    21 TeX

    22 LaTeX

    23 TSV

    24 WAT

      24.1 WAT Returned Tokens

      24.2 WAT Derived Tokens

    25 Racket

      25.1 Racket Returned Tokens

      25.2 Racket Derived Tokens

    26 Rhombus

      26.1 Rhombus Returned Tokens

      26.2 Rhombus Derived Tokens

    27 Scribble

      27.1 Scribble Returned Tokens

      27.2 Scribble Derived Tokens

    28 JavaScript

      28.1 JavaScript Returned Tokens

      28.2 JavaScript Derived Tokens

1 Overview

The public language modules currently available correspond to the language sections listed in the table of contents above.

Each language module currently exposes two related kinds of API:

  • A projected token API intended for general consumers such as syntax coloring.

  • A derived-token API intended for richer language-specific inspection and testing.

The projected APIs are intentionally close to parser-tools/lex. They return bare symbols, token? values, and optional position-token? wrappers built from the actual parser-tools/lex structures, so existing parser-oriented tools can consume them more easily.

The current profile split is:

  • 'coloring keeps trivia, emits 'unknown for recoverable malformed input, and includes source positions by default.

  • 'compiler skips trivia by default, raises on malformed input, and includes source positions by default.

Across languages, the projected lexer constructors return one-argument port readers. Create the lexer once, call it repeatedly on the same input port, and stop when the result is an end-of-file token. The projected category symbols themselves, such as 'identifier, 'literal, and 'keyword, are intended to be the stable public API.

1.1 Token Helpers

The helper module lexers/token provides a small public API for inspecting wrapped or unwrapped projected token values without reaching directly into parser-tools/lex.

 (require lexers/token) package: lexers-lib

procedure

(lexer-token-name token)  symbol?

  token : (or/c symbol? token? position-token?)
Extracts the effective token category from a wrapped or unwrapped projected token value.

procedure

(lexer-token-value token)  any/c

  token : (or/c symbol? token? position-token?)
Extracts the effective token payload from a wrapped or unwrapped projected token value. For the bare end-of-file symbol, the result is #f.

procedure

(lexer-token-has-positions? token)  boolean?

  token : (or/c symbol? token? position-token?)
Determines whether a wrapped or unwrapped projected token value carries source positions.

procedure

(lexer-token-start token)  (or/c position? #f)

  token : (or/c symbol? token? position-token?)
Extracts the starting position from a wrapped projected token value. For unwrapped values, the result is #f.

procedure

(lexer-token-end token)  (or/c position? #f)

  token : (or/c symbol? token? position-token?)
Extracts the ending position from a wrapped projected token value. For unwrapped values, the result is #f.

procedure

(lexer-token-eof? token)  boolean?

  token : (or/c symbol? token? position-token?)
Determines whether a wrapped or unwrapped projected token value represents end of input.
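Taken together, the helpers support a small inspection loop over projected tokens. The following sketch (assuming the lexers-lib package is installed) prints the category and payload of each non-eof token from a CSS snippet:

```racket
#lang racket
(require lexers/css
         lexers/token)

;; Tokenize a snippet, then inspect the results through the helper API
;; rather than destructuring parser-tools/lex values directly.
(define tokens (css-string->tokens "color: #fff;"))

(for ([tok tokens]
      #:unless (lexer-token-eof? tok))
  (printf "~a: ~s\n"
          (lexer-token-name tok)
          (lexer-token-value tok)))
```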

1.2 Profiles

The public projected APIs currently support the same profile names:

  • 'coloring

  • 'compiler

The current defaults are:

  Profile     Trivia   Source Positions   Malformed Input
  'coloring   'keep    #t                 emit unknown tokens
  'compiler   'skip    #t                 raise an exception

For the keyword arguments accepted by make-cpp-lexer, cpp-string->tokens, make-css-lexer, css-string->tokens, make-html-lexer, html-string->tokens, make-javascript-lexer, javascript-string->tokens, make-json-lexer, json-string->tokens, make-markdown-lexer, markdown-string->tokens, make-objc-lexer, objc-string->tokens, make-python-lexer, python-string->tokens, make-racket-lexer, racket-string->tokens, make-rhombus-lexer, rhombus-string->tokens, make-scribble-lexer, scribble-string->tokens, make-shell-lexer, shell-string->tokens, make-swift-lexer, swift-string->tokens, make-wat-lexer, and wat-string->tokens:

  • #:profile selects the named default bundle.

  • #:trivia 'profile-default means “use the trivia policy from the selected profile”.

  • #:source-positions 'profile-default means “use the source-position setting from the selected profile”.

  • An explicit #:trivia or #:source-positions value overrides the selected profile default.
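These overrides combine freely. The following sketch (assuming the lexers-lib package is installed) starts from the 'compiler profile but keeps trivia and drops position wrappers:

```racket
#lang racket
(require lexers/css)

;; The 'compiler profile normally skips trivia; the explicit #:trivia
;; value overrides that, while #:source-positions #f drops the
;; position-token? wrappers from the results.
(css-string->tokens "color: #fff; /* brand */"
                    #:profile 'compiler
                    #:trivia 'keep
                    #:source-positions #f)
```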

2 CSS

 (require lexers/css) package: lexers-lib

The projected CSS API has two entry points: make-css-lexer and css-string->tokens.

The CSS module also exposes a raw-token API (make-css-raw-lexer and css-string->raw-tokens) for parser-oriented consumers:

procedure

(make-css-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming CSS lexer.

The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.

When #:source-positions is true, each result is a position-token? whose payload is either a bare symbol such as 'eof or a token? carrying a projected category such as 'identifier, 'literal, 'comment, or 'unknown.

When #:source-positions is false, the result is either a bare symbol or a token? directly.

The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.

Examples:
> (define lexer
    (make-css-lexer #:profile 'coloring))
> (define in
    (open-input-string "color: #fff;"))
> (port-count-lines! in)
> (list (lexer in)
        (lexer in)
        (lexer in)
        (lexer in))

(list

 (position-token (token 'identifier "color") (position 1 1 0) (position 6 1 5))

 (position-token (token 'delimiter ":") (position 6 1 5) (position 7 1 6))

 (position-token (token 'whitespace " ") (position 7 1 6) (position 8 1 7))

 (position-token (token 'literal "#fff") (position 8 1 7) (position 12 1 11)))

procedure

(css-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes an entire CSS string using the projected token API.

This is a convenience wrapper over make-css-lexer. It opens a string port, enables line counting, repeatedly calls the port-based lexer until end-of-file, and returns the resulting token list.

2.1 CSS Returned Tokens

The projected CSS API returns values in the same general shape as parser-tools/lex:

  • The end of input is reported as 'eof, either directly or inside a position-token?.

  • Most ordinary results are token? values whose token-name is a projected category and whose token-value contains language-specific text or metadata.

  • When #:source-positions is true, each result is wrapped in a position-token?.

  • When #:source-positions is false, results are returned without that outer wrapper.

Common projected CSS categories include:

  • 'whitespace

  • 'comment

  • 'identifier

  • 'literal

  • 'delimiter

  • 'unknown

  • 'eof

In 'coloring mode, whitespace and comments are kept, and recoverable malformed input is returned as 'unknown. In 'compiler mode, whitespace and comments are skipped by default, and malformed input raises an exception instead of producing an 'unknown token.

For the current CSS scaffold, token-value normally preserves the original source text of the emitted token. In particular:

  • For 'identifier, the value is the matched identifier text, such as "color" or "--brand-color".

  • For 'literal, the value is the matched literal text, such as "#fff", "12px", "url(foo.png)", or "rgb(".

  • For 'comment and 'whitespace, the value is the original comment or whitespace text when those categories are kept.

  • For 'delimiter, the value is the matched delimiter text, such as ":", ";", or "{".

  • For 'unknown in tolerant mode, the value is the malformed input text that could not be accepted.

Examples:
> (define inspect-lexer
    (make-css-lexer #:profile 'coloring))
> (define inspect-in
    (open-input-string "color: #fff;"))
> (port-count-lines! inspect-in)
> (define first-token
    (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)

#t

> (lexer-token-name first-token)

'identifier

> (lexer-token-value first-token)

"color"

> (position-offset (lexer-token-start first-token))

1

> (position-offset (lexer-token-end first-token))

6

procedure

(make-css-raw-lexer)

  (input-port? . -> . (or/c 'eof css-raw-token?))
Constructs a streaming CSS lexer for the raw-token layer.

The result is a procedure of one argument, an input port. Each call reads the next raw CSS token from the port and returns either one css-raw-token? value or 'eof.

This layer stays close to CSS Syntax Level 3 token categories and is intended for parser consumers that need stable raw token kinds instead of the broader projected categories.

Examples:
> (define raw-lexer
    (make-css-raw-lexer))
> (define raw-in
    (open-input-string "@media color 12px"))
> (port-count-lines! raw-in)
> (list (css-raw-token-kind (raw-lexer raw-in))
        (css-raw-token-kind (raw-lexer raw-in))
        (css-raw-token-kind (raw-lexer raw-in)))

'(at-keyword-token whitespace-token ident-token)

procedure

(css-string->raw-tokens source)  (listof css-raw-token?)

  source : string?
Tokenizes an entire CSS string into raw CSS token values.

This is a convenience wrapper over make-css-raw-lexer. It opens a string port, enables line counting, repeatedly calls the raw lexer until it returns 'eof, and returns the resulting raw token list.
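As a sketch (assuming the lexers-lib package is installed), the string wrapper pairs naturally with the raw accessors documented below:

```racket
#lang racket
(require lexers/css)

;; List each raw token's CSS Syntax-oriented kind next to its exact text.
(for ([tok (css-string->raw-tokens "@media color 12px")])
  (printf "~a ~s\n"
          (css-raw-token-kind tok)
          (css-raw-token-text tok)))
```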

procedure

(css-raw-token? v)  boolean?

  v : any/c
Recognizes raw CSS token values returned by make-css-raw-lexer and css-string->raw-tokens.

procedure

(css-raw-token-kind token)  symbol?

  token : css-raw-token?
Returns the CSS Syntax-oriented raw token kind for token, such as 'ident-token, 'function-token, 'at-keyword-token, 'number-token, or 'dimension-token.

procedure

(css-raw-token-text token)  string?

  token : css-raw-token?
Returns the exact source text corresponding to a raw CSS token.

procedure

(css-raw-token-start token)  position?

  token : css-raw-token?
Returns the starting source position for a raw CSS token.

procedure

(css-raw-token-end token)  position?

  token : css-raw-token?
Returns the ending source position for a raw CSS token.

procedure

(make-css-derived-lexer)

  (input-port? . -> . (or/c 'eof css-derived-token?))
Constructs a streaming CSS lexer for the derived-token layer.

The result is a procedure of one argument, an input port. Each call reads the next raw CSS token from the port, computes its CSS-specific derived classifications, and returns one derived token value. At end of input, it returns 'eof.

The intended use is the same as for make-css-lexer: create the lexer once, then call it repeatedly on the same port until it returns 'eof.

Examples:
> (define derived-lexer
    (make-css-derived-lexer))
> (define derived-in
    (open-input-string "color: #fff;"))
> (port-count-lines! derived-in)
> (list (derived-lexer derived-in)
        (derived-lexer derived-in)
        (derived-lexer derived-in)
        (derived-lexer derived-in))

(list

 (css-derived-token

  (css-raw-token 'ident-token "color" (position 1 1 0) (position 6 1 5))

  '(property-name-candidate selector-token))

 (css-derived-token

  (css-raw-token 'colon-token ":" (position 6 1 5) (position 7 1 6))

  '())

 (css-derived-token

  (css-raw-token 'whitespace-token " " (position 7 1 6) (position 8 1 7))

  '())

 (css-derived-token

  (css-raw-token 'hash-token "#fff" (position 8 1 7) (position 12 1 11))

  '(color-literal selector-token)))

procedure

(css-string->derived-tokens source)

  (listof css-derived-token?)
  source : string?
Tokenizes an entire CSS string into derived CSS token values.

This is a convenience wrapper over make-css-derived-lexer. It opens a string port, enables line counting, repeatedly calls the derived lexer until it returns 'eof, and returns the resulting list of derived tokens.

procedure

(css-derived-token? v)  boolean?

  v : any/c
Recognizes derived CSS token values returned by make-css-derived-lexer and css-string->derived-tokens.

procedure

(css-derived-token-raw token)  css-raw-token?

  token : css-derived-token?
Returns the raw CSS token wrapped by a derived CSS token.

procedure

(css-derived-token-raw-kind token)  symbol?

  token : css-derived-token?
Returns the CSS Syntax-oriented raw token kind wrapped by a derived CSS token.

procedure

(css-derived-token-tags token)  (listof symbol?)

  token : css-derived-token?
Returns the CSS-specific classification tags attached to a derived CSS token.

procedure

(css-derived-token-has-tag? token tag)  boolean?

  token : css-derived-token?
  tag : symbol?
Determines whether a derived CSS token carries a given classification tag.
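For instance, the tag predicate makes it easy to filter a derived token stream. This sketch (assuming the lexers-lib package is installed) collects the text of every token tagged as a property name:

```racket
#lang racket
(require lexers/css)

;; Keep only tokens classified as property names, e.g. "color" in the
;; declaration below.
(for/list ([tok (css-string->derived-tokens "a { color: red; }")]
           #:when (css-derived-token-has-tag? tok 'property-name))
  (css-derived-token-text tok))
```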

procedure

(css-derived-token-text token)  string?

  token : css-derived-token?
Returns the exact source text corresponding to a derived CSS token.

procedure

(css-derived-token-start token)  position?

  token : css-derived-token?
Returns the starting source position for a derived CSS token.

procedure

(css-derived-token-end token)  position?

  token : css-derived-token?
Returns the ending source position for a derived CSS token.

2.2 CSS Derived Tokens

A derived CSS token pairs one raw CSS token with a small list of CSS-specific classification tags. This layer is more precise than the projected consumer-facing categories and is meant for inspection, testing, and language-aware tools.

The current CSS scaffold may attach tags such as:

  • 'at-rule-name

  • 'color-literal

  • 'color-function

  • 'selector-token

  • 'property-name

  • 'declaration-value-token

  • 'function-name

  • 'gradient-function

  • 'custom-property-name

  • 'property-name-candidate

  • 'string-literal

  • 'numeric-literal

  • 'length-dimension

  • 'malformed-token

Examples:
> (define derived-tokens
    (css-string->derived-tokens ".foo { color: red; background: rgb(1 2 3); }"))
> (map (lambda (token)
         (list (css-derived-token-text token)
               (css-derived-token-tags token)
               (css-derived-token-has-tag? token 'selector-token)
               (css-derived-token-has-tag? token 'property-name)
               (css-derived-token-has-tag? token 'declaration-value-token)
               (css-derived-token-has-tag? token 'color-literal)
               (css-derived-token-has-tag? token 'function-name)
               (css-derived-token-has-tag? token 'color-function)
               (css-derived-token-has-tag? token 'custom-property-name)
               (css-derived-token-has-tag? token 'string-literal)
               (css-derived-token-has-tag? token 'numeric-literal)
               (css-derived-token-has-tag? token 'length-dimension)))
       derived-tokens)

'(("." () #f #f #f #f #f #f #f #f #f #f)

  ("foo"

   (property-name-candidate selector-token)

   (selector-token)

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f)

  (" " () #f #f #f #f #f #f #f #f #f #f)

  ("{" () #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f)

  ("color"

   (property-name-candidate property-name)

   #f

   (property-name)

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f)

  (":" () #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f)

  ("red"

   (property-name-candidate declaration-value-token)

   #f

   #f

   (declaration-value-token)

   #f

   #f

   #f

   #f

   #f

   #f

   #f)

  (";" () #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f)

  ("background"

   (property-name-candidate property-name)

   #f

   (property-name)

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f)

  (":" () #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f)

  ("rgb"

   (function-name color-function declaration-value-token)

   #f

   #f

   (declaration-value-token)

   #f

   (function-name color-function declaration-value-token)

   (color-function declaration-value-token)

   #f

   #f

   #f

   #f)

  ("(" () #f #f #f #f #f #f #f #f #f #f)

  ("1"

   (numeric-literal declaration-value-token)

   #f

   #f

   (declaration-value-token)

   #f

   #f

   #f

   #f

   #f

   (numeric-literal declaration-value-token)

   #f)

  (" " () #f #f #f #f #f #f #f #f #f #f)

  ("2"

   (numeric-literal declaration-value-token)

   #f

   #f

   (declaration-value-token)

   #f

   #f

   #f

   #f

   #f

   (numeric-literal declaration-value-token)

   #f)

  (" " () #f #f #f #f #f #f #f #f #f #f)

  ("3"

   (numeric-literal declaration-value-token)

   #f

   #f

   (declaration-value-token)

   #f

   #f

   #f

   #f

   #f

   (numeric-literal declaration-value-token)

   #f)

  (")" () #f #f #f #f #f #f #f #f #f #f)

  (";" () #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f)

  ("}" () #f #f #f #f #f #f #f #f #f #f))

value

css-profiles : immutable-hash?

The profile defaults used by the CSS lexer.

3 HTML

 (require lexers/html) package: lexers-lib

The projected HTML API has two entry points: make-html-lexer and html-string->tokens.

procedure

(make-html-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming HTML lexer.

The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.

The projected HTML token stream includes ordinary markup tokens and inline delegated tokens from embedded <style> and <script> bodies.

When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.

Examples:
> (define lexer
    (make-html-lexer #:profile 'coloring))
> (define in
    (open-input-string "<section id=main>Hi</section>"))
> (port-count-lines! in)
> (list (lexer in)
        (lexer in)
        (lexer in)
        (lexer in))

(list

 (position-token (token 'delimiter "<") (position 1 1 0) (position 2 1 1))

 (position-token

  (token 'identifier "section")

  (position 2 1 1)

  (position 9 1 8))

 (position-token (token 'whitespace " ") (position 9 1 8) (position 10 1 9))

 (position-token

  (token 'identifier "id")

  (position 10 1 9)

  (position 12 1 11)))

procedure

(html-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes an entire HTML string using the projected token API.

This is a convenience wrapper over make-html-lexer.
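A minimal sketch (assuming the lexers-lib package is installed) that tokenizes a small fragment without position wrappers:

```racket
#lang racket
(require lexers/html
         lexers/token)

;; With #:source-positions #f the results are bare symbols or token?
;; values, so the category of each token can be read off directly.
(map lexer-token-name
     (html-string->tokens "<p>Hi</p>" #:source-positions #f))
```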

3.1 HTML Returned Tokens

Common projected HTML categories include:

  • 'comment

  • 'keyword

  • 'identifier

  • 'literal

  • 'operator

  • 'delimiter

  • 'unknown

  • 'eof

For the current HTML scaffold:

  • tag names and attribute names project as 'identifier

  • attribute values, text nodes, entities, and delegated CSS/JS literals project as 'literal

  • punctuation such as <, </, >, />, and embedded interpolation boundaries project as 'delimiter or 'operator

  • comments project as 'comment

  • doctype/declaration markup projects as 'keyword

Examples:
> (define inspect-lexer
    (make-html-lexer #:profile 'coloring))
> (define inspect-in
    (open-input-string "<!doctype html><main id=\"app\">Hi &amp; bye</main>"))
> (port-count-lines! inspect-in)
> (define first-token
    (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)

#t

> (lexer-token-name first-token)

'keyword

> (lexer-token-value first-token)

"<!doctype html>"

> (position-offset (lexer-token-start first-token))

1

> (position-offset (lexer-token-end first-token))

16

procedure

(make-html-derived-lexer)

  (input-port? . -> . (or/c 'eof html-derived-token?))
Constructs a streaming HTML lexer for the derived-token layer.

procedure

(html-string->derived-tokens source)

  (listof html-derived-token?)
  source : string?
Tokenizes an entire HTML string into derived HTML token values.

procedure

(html-derived-token? v)  boolean?

  v : any/c
Recognizes derived HTML token values returned by make-html-derived-lexer and html-string->derived-tokens.

procedure

(html-derived-token-tags token)  (listof symbol?)

  token : html-derived-token?
Returns the HTML-specific classification tags attached to a derived HTML token.

procedure

(html-derived-token-has-tag? token tag)  boolean?

  token : html-derived-token?
  tag : symbol?
Determines whether a derived HTML token carries a given classification tag.

procedure

(html-derived-token-text token)  string?

  token : html-derived-token?
Returns the exact source text corresponding to a derived HTML token.

procedure

(html-derived-token-start token)  position?

  token : html-derived-token?
Returns the starting source position for a derived HTML token.

procedure

(html-derived-token-end token)  position?

  token : html-derived-token?
Returns the ending source position for a derived HTML token.

3.2 HTML Derived Tokens

The current HTML scaffold may attach tags such as:

  • 'html-tag-name

  • 'html-closing-tag-name

  • 'html-attribute-name

  • 'html-attribute-value

  • 'html-text

  • 'html-entity

  • 'html-doctype

  • 'comment

  • 'embedded-css

  • 'embedded-javascript

  • 'malformed-token

Delegated CSS and JavaScript body tokens keep their reusable semantic tags and gain an additional language marker such as 'embedded-css or 'embedded-javascript.

Examples:
> (define derived-tokens
    (html-string->derived-tokens
     "<!doctype html><section id=main class=\"card\">Hi &amp; bye<style>.hero { color: #c33; }</style><script>const root = document.querySelector(\"#app\");</script></section>"))
> (map (lambda (token)
         (list (html-derived-token-text token)
               (html-derived-token-tags token)
               (html-derived-token-has-tag? token 'html-tag-name)
               (html-derived-token-has-tag? token 'html-attribute-name)
               (html-derived-token-has-tag? token 'html-attribute-value)
               (html-derived-token-has-tag? token 'html-text)
               (html-derived-token-has-tag? token 'html-entity)
               (html-derived-token-has-tag? token 'embedded-css)
               (html-derived-token-has-tag? token 'embedded-javascript)))
       derived-tokens)

'(("<!doctype html>" (keyword html-doctype) #f #f #f #f #f #f #f)

  ("<" (delimiter) #f #f #f #f #f #f #f)

  ("section" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)

  (" " (whitespace) #f #f #f #f #f #f #f)

  ("id"

   (identifier html-attribute-name)

   #f

   (html-attribute-name)

   #f

   #f

   #f

   #f

   #f)

  ("=" (operator) #f #f #f #f #f #f #f)

  ("main"

   (literal html-attribute-value)

   #f

   #f

   (html-attribute-value)

   #f

   #f

   #f

   #f)

  (" " (whitespace) #f #f #f #f #f #f #f)

  ("class"

   (identifier html-attribute-name)

   #f

   (html-attribute-name)

   #f

   #f

   #f

   #f

   #f)

  ("=" (operator) #f #f #f #f #f #f #f)

  ("\"card\""

   (html-attribute-value literal)

   #f

   #f

   (html-attribute-value literal)

   #f

   #f

   #f

   #f)

  (">" (delimiter) #f #f #f #f #f #f #f)

  ("Hi " (literal html-text) #f #f #f (html-text) #f #f #f)

  ("&amp;" (literal html-entity) #f #f #f #f (html-entity) #f #f)

  (" bye" (literal html-text) #f #f #f (html-text) #f #f #f)

  ("<" (delimiter) #f #f #f #f #f #f #f)

  ("style" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)

  (">" (delimiter) #f #f #f #f #f #f #f)

  ("." (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)

  ("hero"

   (embedded-css identifier property-name-candidate selector-token)

   #f

   #f

   #f

   #f

   #f

   (embedded-css identifier property-name-candidate selector-token)

   #f)

  (" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)

  ("{" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)

  (" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)

  ("color"

   (embedded-css identifier property-name-candidate property-name)

   #f

   #f

   #f

   #f

   #f

   (embedded-css identifier property-name-candidate property-name)

   #f)

  (":" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)

  (" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)

  ("#c33"

   (embedded-css literal color-literal declaration-value-token)

   #f

   #f

   #f

   #f

   #f

   (embedded-css literal color-literal declaration-value-token)

   #f)

  (";" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)

  (" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)

  ("}" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)

  ("</" (delimiter) #f #f #f #f #f #f #f)

  ("style" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)

  (">" (delimiter) #f #f #f #f #f #f #f)

  ("<" (delimiter) #f #f #f #f #f #f #f)

  ("script" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)

  (">" (delimiter) #f #f #f #f #f #f #f)

  ("const"

   (embedded-javascript keyword)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript keyword))

  (" "

   (embedded-javascript whitespace)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript whitespace))

  ("root"

   (embedded-javascript identifier declaration-name)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript identifier declaration-name))

  (" "

   (embedded-javascript whitespace)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript whitespace))

  ("="

   (embedded-javascript operator)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript operator))

  (" "

   (embedded-javascript whitespace)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript whitespace))

  ("document"

   (embedded-javascript identifier)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript identifier))

  ("."

   (embedded-javascript delimiter)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript delimiter))

  ("querySelector"

   (embedded-javascript identifier method-name property-name)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript identifier method-name property-name))

  ("("

   (embedded-javascript delimiter)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript delimiter))

  ("\"#app\""

   (embedded-javascript literal string-literal)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript literal string-literal))

  (")"

   (embedded-javascript delimiter)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript delimiter))

  (";"

   (embedded-javascript delimiter)

   #f

   #f

   #f

   #f

   #f

   #f

   (embedded-javascript delimiter))

  ("</" (delimiter) #f #f #f #f #f #f #f)

  ("script" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)

  (">" (delimiter) #f #f #f #f #f #f #f)

  ("</" (delimiter) #f #f #f #f #f #f #f)

  ("section" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)

  (">" (delimiter) #f #f #f #f #f #f #f))


value

html-profiles : immutable-hash?

The profile defaults used by the HTML lexer.

4 C

 (require lexers/c) package: lexers-lib

The projected C API has two entry points:

The first C implementation is a handwritten streaming lexer grounded primarily in C lexical and preprocessing-token rules. It is preprocessor-aware from the first slice, so directive lines like #include and #define are tokenized directly instead of being flattened into ordinary punctuation and identifiers. It also recognizes C digraph punctuators and validates string and character escape sequences so malformed escapes stay inside one malformed literal token.

procedure

(make-c-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming C lexer.

Projected C categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Keywords and preprocessor directive names project as 'keyword. Header names such as <stdio.h> and "local.h" project as 'literal.

Examples:
> (define lexer
    (make-c-lexer #:profile 'coloring))
> (define in
    (open-input-string "#include <stdio.h>\nint main(void) { return 0; }\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(delimiter keyword whitespace literal)

procedure

(c-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected C tokens.

The derived C API provides reusable language-specific structure:

procedure

(make-c-derived-lexer)

  (input-port? . -> . (or/c c-derived-token? 'eof))
Constructs a streaming C lexer that returns derived C tokens.

procedure

(c-string->derived-tokens source)  (listof c-derived-token?)

  source : string?
Tokenizes all of source eagerly and returns derived C tokens.

procedure

(c-derived-token? v)  boolean?

  v : any/c
Recognizes derived C tokens.

procedure

(c-derived-token-tags token)  (listof symbol?)

  token : c-derived-token?
Returns the derived-token tags for token.

procedure

(c-derived-token-has-tag? token tag)  boolean?

  token : c-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(c-derived-token-text token)  string?

  token : c-derived-token?
Returns the exact source text covered by token.

procedure

(c-derived-token-start token)  position?

  token : c-derived-token?
Returns the starting source position of token.

procedure

(c-derived-token-end token)  position?

  token : c-derived-token?
Returns the ending source position of token.

The first reusable C-specific derived tags include:

  • 'c-comment

  • 'c-whitespace

  • 'c-keyword

  • 'c-identifier

  • 'c-string-literal

  • 'c-char-literal

  • 'c-numeric-literal

  • 'c-operator

  • 'c-delimiter

  • 'c-preprocessor-directive

  • 'c-header-name

  • 'c-line-splice

  • 'c-error

  • 'malformed-token

Malformed C input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.
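
The two behaviors can be contrasted directly. A hedged sketch (the sample input and the exn:fail:read predicate are assumptions, not verified against this implementation):

> (require lexers/c)
> ; 'coloring keeps lexing and projects the bad literal as 'unknown
> (map lexer-token-name (c-string->tokens "\"unterminated"))
> ; 'compiler rejects the same input with a read exception
> (with-handlers ([exn:fail:read? (lambda (e) 'rejected)])
    (c-string->tokens "\"unterminated" #:profile 'compiler))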

Markdown fenced code blocks labeled c or h delegate to lexers/c. Wrapped delegated Markdown tokens preserve C-derived tags and gain 'embedded-c.

value

c-profiles : immutable-hash?

The profile defaults used by the C lexer.

5 C++

 (require lexers/cpp) package: lexers-lib

The projected C++ API has two entry points:

The first C++ implementation is a handwritten streaming lexer grounded in C++ lexical structure. It is preprocessor-aware and covers comments, identifiers, keywords, operator words, character and string literals, raw string literals, numeric literals, and punctuators such as :: and ->.

procedure

(make-cpp-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming C++ lexer.

Projected C++ categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-cpp-lexer #:profile 'coloring))
> (define in
    (open-input-string "#include <vector>\nstd::string s;\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(delimiter keyword whitespace literal)

procedure

(cpp-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected C++ tokens.

The derived C++ API provides reusable language-specific structure:

procedure

(make-cpp-derived-lexer)

  (input-port? . -> . (or/c cpp-derived-token? 'eof))
Constructs a streaming C++ lexer that returns derived C++ tokens.

procedure

(cpp-string->derived-tokens source)

  (listof cpp-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived C++ tokens.

procedure

(cpp-derived-token? v)  boolean?

  v : any/c
Recognizes derived C++ tokens.

procedure

(cpp-derived-token-tags token)  (listof symbol?)

  token : cpp-derived-token?
Returns the derived-token tags for token.

procedure

(cpp-derived-token-has-tag? token tag)  boolean?

  token : cpp-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(cpp-derived-token-text token)  string?

  token : cpp-derived-token?
Returns the exact source text covered by token.

procedure

(cpp-derived-token-start token)  position?

  token : cpp-derived-token?
Returns the starting source position of token.

procedure

(cpp-derived-token-end token)  position?

  token : cpp-derived-token?
Returns the ending source position of token.

The first reusable C++-specific derived tags include:

  • 'cpp-comment

  • 'cpp-whitespace

  • 'cpp-keyword

  • 'cpp-identifier

  • 'cpp-string-literal

  • 'cpp-char-literal

  • 'cpp-numeric-literal

  • 'cpp-operator

  • 'cpp-delimiter

  • 'cpp-preprocessor-directive

  • 'cpp-header-name

  • 'cpp-line-splice

  • 'cpp-error

  • 'malformed-token

Ordinary C++ strings and character literals validate common escape structures in the derived layer, including simple, octal, hexadecimal, and universal-character escapes. Invalid escapes and malformed multi-character character literals remain source-faithful but are tagged with 'malformed-token.

Malformed C++ input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled cpp, c++, cc, cxx, hpp, hh, or hxx delegate to lexers/cpp. Wrapped delegated Markdown tokens preserve C++-derived tags and gain 'embedded-cpp.

value

cpp-profiles : immutable-hash?

The profile defaults used by the C++ lexer.

6 CSV

 (require lexers/csv) package: lexers-lib

The projected CSV API has two entry points:

The first CSV implementation is a handwritten streaming lexer for comma-separated text. It preserves exact source text, including empty fields and CRLF row separators.

procedure

(make-csv-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming CSV lexer.

Projected CSV categories include 'literal, 'delimiter, and 'unknown.

Field contents project as 'literal. Field separators and row separators project as 'delimiter.

Examples:
> (define lexer
    (make-csv-lexer #:profile 'coloring))
> (define in
    (open-input-string "name,age\nAda,37\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(literal delimiter literal delimiter)

procedure

(csv-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected CSV tokens.

The derived CSV API provides reusable structure for delimited text:

procedure

(make-csv-derived-lexer)

  (input-port? . -> . (or/c csv-derived-token? 'eof))
Constructs a streaming CSV lexer that returns derived CSV tokens.

procedure

(csv-string->derived-tokens source)

  (listof csv-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived CSV tokens.

procedure

(csv-derived-token? v)  boolean?

  v : any/c
Recognizes derived CSV tokens.

procedure

(csv-derived-token-tags token)  (listof symbol?)

  token : csv-derived-token?
Returns the derived-token tags for token.

procedure

(csv-derived-token-has-tag? token tag)  boolean?

  token : csv-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(csv-derived-token-text token)  string?

  token : csv-derived-token?
Returns the exact source text covered by token.

procedure

(csv-derived-token-start token)  position?

  token : csv-derived-token?
Returns the starting source position of token.

procedure

(csv-derived-token-end token)  position?

  token : csv-derived-token?
Returns the ending source position of token.

The first reusable CSV-specific derived tags include:

  • 'delimited-field

  • 'delimited-quoted-field

  • 'delimited-unquoted-field

  • 'delimited-empty-field

  • 'delimited-separator

  • 'delimited-row-separator

  • 'delimited-error

  • 'csv-field

  • 'csv-separator

  • 'csv-row-separator

  • 'malformed-token
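
These field-oriented tags support simple queries over delimited text. A hedged sketch that collects field texts (exactly which tags a given field token carries is an assumption based on the list above):

> (require lexers/csv)
> (for/list ([token (csv-string->derived-tokens "name,age\nAda,37\n")]
             #:when (csv-derived-token-has-tag? token 'csv-field))
    (csv-derived-token-text token))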

Malformed CSV input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled csv delegate to lexers/csv. Wrapped delegated Markdown tokens preserve CSV-derived tags and gain 'embedded-csv.

value

csv-profiles : immutable-hash?

The profile defaults used by the CSV lexer.

7 JSON

 (require lexers/json) package: lexers-lib

The projected JSON API has two entry points:

procedure

(make-json-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming JSON lexer.

The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.

Projected JSON categories include 'delimiter, 'operator, 'identifier, 'literal, 'whitespace, and 'unknown.

Object keys project as 'identifier, while numbers, ordinary strings, and the JSON keywords true, false, and null project as 'literal.

Examples:
> (define lexer
    (make-json-lexer #:profile 'coloring))
> (define in
    (open-input-string "{\"x\": [1, true]}"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(delimiter identifier operator whitespace)

procedure

(json-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected JSON tokens.

The derived JSON API provides reusable language-specific structure:

procedure

(make-json-derived-lexer)

  (input-port? . -> . (or/c json-derived-token? 'eof))
Constructs a streaming JSON lexer that returns derived JSON tokens.

procedure

(json-string->derived-tokens source)

  (listof json-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived JSON tokens.

procedure

(json-derived-token? v)  boolean?

  v : any/c
Recognizes derived JSON tokens.

procedure

(json-derived-token-tags token)  (listof symbol?)

  token : json-derived-token?
Returns the derived-token tags for token.

procedure

(json-derived-token-has-tag? token tag)  boolean?

  token : json-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(json-derived-token-text token)  string?

  token : json-derived-token?
Returns the exact source text covered by token.

procedure

(json-derived-token-start token)  position?

  token : json-derived-token?
Returns the starting source position of token.

procedure

(json-derived-token-end token)  position?

  token : json-derived-token?
Returns the ending source position of token.

The first reusable JSON-specific derived tags include:

  • 'json-object-key

  • 'json-string

  • 'json-number

  • 'json-true

  • 'json-false

  • 'json-null

  • 'json-object-start

  • 'json-object-end

  • 'json-array-start

  • 'json-array-end

  • 'json-comma

  • 'json-colon

  • 'json-error

  • 'malformed-token
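
The tag vocabulary above supports structural queries without a full parse. A hedged sketch that collects object keys (it assumes key tokens carry 'json-object-key and that token text keeps the surrounding quotes):

> (require lexers/json)
> (for/list ([token (json-string->derived-tokens "{\"x\": 1, \"y\": 2}")]
             #:when (json-derived-token-has-tag? token 'json-object-key))
    (json-derived-token-text token))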

Malformed JSON input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled json delegate to lexers/json. Wrapped delegated Markdown tokens preserve JSON-derived tags and gain 'embedded-json.

8 Makefile

 (require lexers/makefile) package: lexers-lib

The projected Makefile API has two entry points:

The first Makefile implementation is a handwritten streaming lexer aimed at ordinary Makefile, GNUmakefile, and .mk inputs. It covers comments, directive lines, variable assignments, rule targets, recipe lines, variable references, delimiters, and CRLF-preserving source fidelity.

procedure

(make-makefile-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Makefile lexer.

Projected Makefile categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Directive words such as include project as 'keyword. Assignment operators such as := and += project as 'operator. Rule separators such as : project as 'delimiter.
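
A hedged usage sketch in the style of the other sections (no token names are shown, because the exact slicing of rule and recipe lines has not been verified here):

> (define lexer
    (make-makefile-lexer #:profile 'coloring))
> (define in
    (open-input-string "include config.mk\nall: main.o\n\tcc -c main.c\n"))
> (port-count-lines! in)
> (lexer-token-name (lexer in))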

procedure

(makefile-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Makefile tokens.

The derived Makefile API provides reusable language-specific structure:

procedure

(make-makefile-derived-lexer)

  (input-port? . -> . (or/c makefile-derived-token? 'eof))
Constructs a streaming Makefile lexer that returns derived Makefile tokens.

procedure

(makefile-string->derived-tokens source)

  (listof makefile-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived Makefile tokens.

procedure

(makefile-derived-token? v)  boolean?

  v : any/c
Recognizes derived Makefile tokens.

procedure

(makefile-derived-token-tags token)  (listof symbol?)

  token : makefile-derived-token?
Returns the derived-token tags for token.

procedure

(makefile-derived-token-has-tag? token tag)  boolean?

  token : makefile-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(makefile-derived-token-text token)  string?

  token : makefile-derived-token?
Returns the exact source text covered by token.

procedure

(makefile-derived-token-start token)  position?

  token : makefile-derived-token?
Returns the starting source position of token.

procedure

(makefile-derived-token-end token)  position?

  token : makefile-derived-token?
Returns the ending source position of token.

The first reusable Makefile-specific derived tags include:

  • 'makefile-directive

  • 'makefile-variable

  • 'makefile-assignment-operator

  • 'makefile-rule-target

  • 'makefile-rule-delimiter

  • 'makefile-variable-reference

  • 'makefile-paren-variable-reference

  • 'makefile-brace-variable-reference

  • 'makefile-recipe-separator

  • 'makefile-order-only-delimiter

  • 'malformed-token

Markdown fenced code blocks labeled make, makefile, or mk delegate to lexers/makefile. Wrapped delegated Markdown tokens preserve Makefile-derived tags and gain 'embedded-makefile.

9 Plist

 (require lexers/plist) package: lexers-lib

The projected plist API has two entry points:

The first plist implementation is a handwritten streaming lexer for XML property-list files such as Info.plist. The first slice deliberately targets XML plists only; it does not attempt to cover binary bplist files. Because this scope is XML-only, quoted attribute values are treated as ordinary plist attribute values, while unquoted attribute values are treated as malformed input.

procedure

(make-plist-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming plist lexer.

Projected plist categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

XML declarations and plist doctypes project as 'keyword. Element content such as CFBundleName and Lexers projects as 'literal.
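
A hedged usage sketch (no token names are shown, because the exact slicing has not been verified here):

> (define lexer
    (make-plist-lexer #:profile 'coloring))
> (define in
    (open-input-string "<plist version=\"1.0\"><key>CFBundleName</key></plist>"))
> (port-count-lines! in)
> (lexer-token-name (lexer in))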

procedure

(plist-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected plist tokens.

The derived plist API provides reusable language-specific structure:

procedure

(make-plist-derived-lexer)

  (input-port? . -> . (or/c plist-derived-token? 'eof))
Constructs a streaming plist lexer that returns derived plist tokens.

procedure

(plist-string->derived-tokens source)

  (listof plist-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived plist tokens.

procedure

(plist-derived-token? v)  boolean?

  v : any/c
Recognizes derived plist tokens.

procedure

(plist-derived-token-tags token)  (listof symbol?)

  token : plist-derived-token?
Returns the derived-token tags for token.

procedure

(plist-derived-token-has-tag? token tag)  boolean?

  token : plist-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(plist-derived-token-text token)  string?

  token : plist-derived-token?
Returns the exact source text covered by token.

procedure

(plist-derived-token-start token)  position?

  token : plist-derived-token?
Returns the starting source position of token.

procedure

(plist-derived-token-end token)  position?

  token : plist-derived-token?
Returns the ending source position of token.

The first reusable plist-specific derived tags include:

  • 'plist-processing-instruction

  • 'plist-doctype

  • 'plist-tag-name

  • 'plist-closing-tag-name

  • 'plist-attribute-name

  • 'plist-attribute-value

  • 'plist-entity

  • 'plist-key-text

  • 'plist-string-text

  • 'plist-data-text

  • 'plist-date-text

  • 'plist-integer-text

  • 'plist-real-text

  • 'plist-text

  • 'plist-comment

  • 'malformed-token

Markdown fenced code blocks labeled plist delegate to lexers/plist. Wrapped delegated Markdown tokens preserve plist-derived tags and gain 'embedded-plist.

10 YAML

 (require lexers/yaml) package: lexers-lib

The projected YAML API has two entry points:

The first YAML implementation is a handwritten streaming lexer grounded primarily in the YAML 1.2.2 lexical and structural rules. The first slice is deliberately parser-lite, but it covers practical block mappings, block sequences, flow delimiters, directives, document markers, quoted scalars, plain scalars, comments, and block scalar bodies. Block-scalar headers validate the compact YAML indicator forms, so malformed headers remain source-faithful but are tagged with 'malformed-token instead of enabling block-scalar mode.

procedure

(make-yaml-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming YAML lexer.

Projected YAML categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'delimiter, and 'unknown.

Directive lines such as %YAML 1.2 project as 'keyword. Plain and quoted scalars project as 'literal. Structural markers such as :, -, [, ], {, }, and document markers project as 'delimiter.

Examples:
> (define lexer
    (make-yaml-lexer #:profile 'coloring))
> (define in
    (open-input-string "name: Deploy\non:\n  push:\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(literal delimiter whitespace literal)

procedure

(yaml-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected YAML tokens.

The derived YAML API provides reusable language-specific structure:

procedure

(make-yaml-derived-lexer)

  (input-port? . -> . (or/c yaml-derived-token? 'eof))
Constructs a streaming YAML lexer that returns derived YAML tokens.

procedure

(yaml-string->derived-tokens source)

  (listof yaml-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived YAML tokens.

procedure

(yaml-derived-token? v)  boolean?

  v : any/c
Recognizes derived YAML tokens.

procedure

(yaml-derived-token-tags token)  (listof symbol?)

  token : yaml-derived-token?
Returns the derived-token tags for token.

procedure

(yaml-derived-token-has-tag? token tag)  boolean?

  token : yaml-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(yaml-derived-token-text token)  string?

  token : yaml-derived-token?
Returns the exact source text covered by token.

procedure

(yaml-derived-token-start token)  position?

  token : yaml-derived-token?
Returns the starting source position of token.

procedure

(yaml-derived-token-end token)  position?

  token : yaml-derived-token?
Returns the ending source position of token.

The first reusable YAML-specific derived tags include:

  • 'yaml-comment

  • 'yaml-whitespace

  • 'yaml-directive

  • 'yaml-document-marker

  • 'yaml-sequence-indicator

  • 'yaml-key-indicator

  • 'yaml-value-indicator

  • 'yaml-flow-delimiter

  • 'yaml-anchor

  • 'yaml-alias

  • 'yaml-tag

  • 'yaml-string-literal

  • 'yaml-plain-scalar

  • 'yaml-key-scalar

  • 'yaml-boolean

  • 'yaml-null

  • 'yaml-number

  • 'yaml-block-scalar-header

  • 'yaml-block-scalar-content

  • 'yaml-error

  • 'malformed-token

Double-quoted YAML scalars validate common escape forms in the derived layer, including simple escapes plus \xXX, \uXXXX, and \UXXXXXXXX Unicode escapes. Invalid escape sequences remain source-faithful but are tagged with 'malformed-token.

Malformed YAML input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled yaml or yml delegate to lexers/yaml. Wrapped delegated Markdown tokens preserve YAML-derived tags and gain 'embedded-yaml.

value

yaml-profiles : immutable-hash?

The profile defaults used by the YAML lexer.

11 Markdown

 (require lexers/markdown) package: lexers-lib

The projected Markdown API has two entry points:

The first Markdown implementation is a handwritten, parser-lite, GitHub-flavored Markdown lexer. It is line-oriented and can delegate raw HTML and known fenced-code languages to the existing C, C++, CSV, HTML, CSS, JavaScript, JSON, Makefile, Objective-C, plist, Python, Racket, Scribble, shell, Swift, TSV, WAT, and YAML lexers.

procedure

(make-markdown-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Markdown lexer.

The result is a procedure of one argument, an input port. Each call reads the next projected Markdown token from the port and returns one projected token value.

When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.

The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.

Examples:
> (define lexer
    (make-markdown-lexer #:profile 'coloring))
> (define in
    (open-input-string "# Title\n\n```js\nconst x = 1;\n```\n"))
> (port-count-lines! in)
> (list (lexer in)
        (lexer in)
        (lexer in)
        (lexer in))

(list

 (position-token (token 'delimiter "#") (position 1 1 0) (position 2 1 1))

 (position-token (token 'whitespace " ") (position 2 1 1) (position 3 1 2))

 (position-token (token 'literal "Title") (position 3 1 2) (position 8 1 7))

 (position-token (token 'whitespace "\n") (position 8 1 7) (position 9 2 0)))

procedure

(markdown-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes an entire Markdown string using the projected token API.

This is a convenience wrapper over make-markdown-lexer.
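For quick experiments, the string form needs no port setup; with #:source-positions set to #f, the results are bare symbols or token? values rather than position-token?s (a sketch; the comment describes documented behavior, not verified output):

Examples:
> (markdown-string->tokens "# Title\n"
                           #:source-positions #f
                           #:trivia 'skip)  ; same stream as make-markdown-lexer, minus trivia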

11.1 Markdown Returned Tokens

Common projected Markdown categories include:

  • 'whitespace

  • 'identifier

  • 'literal

  • 'keyword

  • 'operator

  • 'delimiter

  • 'comment

  • 'unknown

  • 'eof

For the current Markdown scaffold:

  • ordinary prose, inline code text, code-block text, and link or image payload text project mostly as 'literal

  • language names and delegated name-like tokens project as 'identifier or 'keyword, depending on the delegated lexer

  • structural markers such as heading markers, list markers, brackets, pipes, backticks, and fence delimiters project as 'delimiter

  • comments only appear through delegated embedded HTML

  • recoverable malformed constructs project as 'unknown in 'coloring mode and raise in 'compiler mode

For source continuity, the derived Markdown stream preserves the newline after a fenced-code info string as an explicit whitespace token before the code body. Incomplete fenced-code blocks are tokenized best-effort instead of raising an internal error.

Examples:
> (define inspect-lexer
    (make-markdown-lexer #:profile 'coloring))
> (define inspect-in
    (open-input-string "# Title\n\nText with <span class=\"x\">hi</span>\n"))
> (port-count-lines! inspect-in)
> (define first-token
    (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)

#t

> (lexer-token-name first-token)

'delimiter

> (lexer-token-value first-token)

"#"

> (position-offset (lexer-token-start first-token))

1

> (position-offset (lexer-token-end first-token))

2

procedure

(make-markdown-derived-lexer)

  (input-port? . -> . (or/c markdown-derived-token? 'eof))
Constructs a streaming Markdown lexer for the derived-token layer.

procedure

(markdown-string->derived-tokens source)

  (listof markdown-derived-token?)
  source : string?
Tokenizes an entire Markdown string into derived Markdown token values.

procedure

(markdown-derived-token? v)  boolean?

  v : any/c
Recognizes derived Markdown token values returned by make-markdown-derived-lexer and markdown-string->derived-tokens.

procedure

(markdown-derived-token-tags token)  (listof symbol?)

  token : markdown-derived-token?
Returns the Markdown-specific classification tags attached to a derived Markdown token.

procedure

(markdown-derived-token-has-tag? token tag)  boolean?

  token : markdown-derived-token?
  tag : symbol?
Determines whether a derived Markdown token carries a given classification tag.

procedure

(markdown-derived-token-text token)  string?

  token : markdown-derived-token?
Returns the exact source text corresponding to a derived Markdown token.
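Together with the tag predicate, this supports simple queries such as collecting heading text (a sketch built on the tags listed in the next section; the comment describes the expected selection, not verified output):

Examples:
> (for/list ([token (in-list (markdown-string->derived-tokens
                              "# One\n\n## Two\n"))]
             #:when (markdown-derived-token-has-tag? token
                                                     'markdown-heading-text))
    (markdown-derived-token-text token))  ; keeps only the heading-text tokens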

procedure

(markdown-derived-token-start token)  position?

  token : markdown-derived-token?
Returns the starting source position for a derived Markdown token.

procedure

(markdown-derived-token-end token)  position?

  token : markdown-derived-token?
Returns the ending source position for a derived Markdown token.

11.2 Markdown Derived Tokens

The current Markdown scaffold may attach tags such as:

  • 'markdown-text

  • 'markdown-heading-marker

  • 'markdown-heading-text

  • 'markdown-blockquote-marker

  • 'markdown-list-marker

  • 'markdown-task-marker

  • 'markdown-thematic-break

  • 'markdown-code-span

  • 'markdown-code-fence

  • 'markdown-code-block

  • 'markdown-code-info-string

  • 'markdown-emphasis-delimiter

  • 'markdown-strong-delimiter

  • 'markdown-strikethrough-delimiter

  • 'markdown-link-text

  • 'markdown-link-destination

  • 'markdown-link-title

  • 'markdown-image-marker

  • 'markdown-autolink

  • 'markdown-table-pipe

  • 'markdown-table-alignment

  • 'markdown-table-cell

  • 'markdown-escape

  • 'markdown-hard-line-break

  • 'embedded-html

  • 'embedded-css

  • 'embedded-cpp

  • 'embedded-csv

  • 'embedded-go

  • 'embedded-haskell

  • 'embedded-java

  • 'embedded-javascript

  • 'embedded-json

  • 'embedded-makefile

  • 'embedded-latex

  • 'embedded-objc

  • 'embedded-pascal

  • 'embedded-plist

  • 'embedded-python

  • 'embedded-racket

  • 'embedded-rust

  • 'embedded-shell

  • 'embedded-scribble

  • 'embedded-swift

  • 'embedded-tex

  • 'embedded-tsv

  • 'embedded-wat

  • 'embedded-yaml

  • 'malformed-token

Delegated raw HTML and recognized fenced-code languages keep their reusable derived tags and gain Markdown embedding markers such as 'embedded-html, 'embedded-cpp, 'embedded-csv, 'embedded-go, 'embedded-haskell, 'embedded-java, 'embedded-javascript, 'embedded-json, 'embedded-latex, 'embedded-makefile, 'embedded-objc, 'embedded-pascal, 'embedded-plist, 'embedded-python, 'embedded-racket, 'embedded-rust, 'embedded-shell, 'embedded-swift, 'embedded-tex, 'embedded-tsv, 'embedded-wat, or 'embedded-yaml.

Examples:
> (define derived-tokens
    (markdown-string->derived-tokens
     "# Title\n\n- [x] done\n\n```js\nconst x = 1;\n```\n\nText <span class=\"x\">hi</span>\n"))
> (map (lambda (token)
         (list (markdown-derived-token-text token)
               (markdown-derived-token-tags token)))
       derived-tokens)

'(("#" (delimiter markdown-heading-marker))

  (" " (whitespace))

  ("Title" (literal markdown-heading-text))

  ("\n" (whitespace))

  ("\n" (whitespace))

  ("-" (delimiter markdown-list-marker))

  (" " (whitespace))

  ("[x]" (delimiter markdown-task-marker))

  (" " (whitespace))

  ("done" (literal markdown-text))

  ("\n" (whitespace))

  ("\n" (whitespace))

  ("```" (delimiter markdown-code-fence))

  ("js" (identifier markdown-code-info-string))

  ("\n" (whitespace))

  ("const" (keyword embedded-javascript markdown-code-block))

  (" " (whitespace embedded-javascript markdown-code-block))

  ("x" (identifier declaration-name embedded-javascript markdown-code-block))

  (" " (whitespace embedded-javascript markdown-code-block))

  ("=" (operator embedded-javascript markdown-code-block))

  (" " (whitespace embedded-javascript markdown-code-block))

  ("1" (literal numeric-literal embedded-javascript markdown-code-block))

  (";" (delimiter embedded-javascript markdown-code-block))

  ("\n" (whitespace embedded-javascript markdown-code-block))

  ("```" (delimiter markdown-code-fence))

  ("\n" (whitespace))

  ("\n" (whitespace))

  ("Text " (literal markdown-text))

  ("<" (delimiter embedded-html))

  ("span" (identifier html-tag-name embedded-html))

  (" " (whitespace embedded-html))

  ("class" (identifier html-attribute-name embedded-html))

  ("=" (operator embedded-html))

  ("\"x\"" (literal html-attribute-value embedded-html))

  (">" (delimiter embedded-html))

  ("hi" (literal html-text embedded-html))

  ("</" (delimiter embedded-html))

  ("span" (identifier html-closing-tag-name embedded-html))

  (">" (delimiter embedded-html))

  ("\n" (whitespace)))

value

markdown-profiles : immutable-hash?

The profile defaults used by the Markdown lexer.

12 Go

 (require lexers/go) package: lexers-lib

The projected Go API has two entry points:

The first Go implementation is a handwritten streaming lexer grounded in the official Go lexical specification. It covers whitespace, line and general comments, identifiers, keywords, string and rune literals, numeric and imaginary literals, operators, and delimiters. In the 'compiler profile, the projected token stream also performs Go semicolon insertion at the newline and EOF boundaries required by the specification, while the 'coloring profile remains source-faithful.

procedure

(make-go-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Go lexer.

Projected Go categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-go-lexer #:profile 'coloring))
> (define in
    (open-input-string "package main\nfunc main() {}\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(keyword whitespace identifier whitespace)

procedure

(go-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Go tokens.

The derived Go API provides reusable language-specific structure:

procedure

(make-go-derived-lexer)

  (input-port? . -> . (or/c go-derived-token? 'eof))
Constructs a streaming Go lexer that returns derived Go tokens.

procedure

(go-string->derived-tokens source)  (listof go-derived-token?)

  source : string?
Tokenizes all of source eagerly and returns derived Go tokens.
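The derived stream makes tag-aware tooling straightforward (a sketch using the accessors and tags documented below; the comment describes the expected selection, not verified output):

Examples:
> (for/list ([token (in-list (go-string->derived-tokens
                              "x := 1 // note\n"))]
             #:when (go-derived-token-has-tag? token 'go-line-comment))
    (go-derived-token-text token))  ; collects only the line-comment slices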

procedure

(go-derived-token? v)  boolean?

  v : any/c
Recognizes derived Go tokens.

procedure

(go-derived-token-tags token)  (listof symbol?)

  token : go-derived-token?
Returns the derived-token tags for token.

procedure

(go-derived-token-has-tag? token tag)  boolean?

  token : go-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(go-derived-token-text token)  string?

  token : go-derived-token?
Returns the exact source text covered by token.

procedure

(go-derived-token-start token)  position?

  token : go-derived-token?
Returns the starting source position of token.

procedure

(go-derived-token-end token)  position?

  token : go-derived-token?
Returns the ending source position of token.

The first reusable Go-specific derived tags include:

  • 'go-comment

  • 'go-line-comment

  • 'go-general-comment

  • 'go-whitespace

  • 'go-keyword

  • 'go-identifier

  • 'go-string-literal

  • 'go-raw-string-literal

  • 'go-rune-literal

  • 'go-numeric-literal

  • 'go-imaginary-literal

  • 'go-operator

  • 'go-delimiter

  • 'malformed-token

Malformed Go input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled go or golang delegate to lexers/go. Wrapped delegated Markdown tokens preserve Go-derived tags and gain 'embedded-go.

value

go-profiles : immutable-hash?

The profile defaults used by the Go lexer.

13 Java

 (require lexers/java) package: lexers-lib

The projected Java API has two entry points:

The first Java implementation is a handwritten streaming lexer grounded in the Java lexical grammar. It covers whitespace, line and block comments, identifiers, keywords, the contextual keyword non-sealed, string literals, text blocks, char literals, numeric literals, operators, and delimiters. It also recognizes Java Unicode escapes for lexical classification while preserving exact source slices in the emitted tokens, validates Java escape sequences including octal escapes, and recognizes text blocks only when the opening delimiter has the JLS-required trailing line terminator. Numeric literals validate their required digit-bearing parts, so malformed forms such as 1e, 0x, and 0b remain source-faithful but are tagged with 'malformed-token.

procedure

(make-java-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Java lexer.

Projected Java categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-java-lexer #:profile 'coloring))
> (define in
    (open-input-string "class Example {\n    String s = \"hi\";\n}\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(keyword whitespace identifier whitespace)

procedure

(java-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Java tokens.

The derived Java API provides reusable language-specific structure:

procedure

(make-java-derived-lexer)

  (input-port? . -> . (or/c java-derived-token? 'eof))
Constructs a streaming Java lexer that returns derived Java tokens.

procedure

(java-string->derived-tokens source)

  (listof java-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived Java tokens.
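Because malformed numeric forms such as 0x stay source-faithful in the derived stream, they can be located by tag (a sketch using the accessors documented below; the comment describes the expected selection, not verified output):

Examples:
> (for/list ([token (in-list (java-string->derived-tokens "int x = 0x;"))]
             #:when (java-derived-token-has-tag? token 'malformed-token))
    (java-derived-token-text token))  ; expected to surface the malformed 0x literal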

procedure

(java-derived-token? v)  boolean?

  v : any/c
Recognizes derived Java tokens.

procedure

(java-derived-token-tags token)  (listof symbol?)

  token : java-derived-token?
Returns the derived-token tags for token.

procedure

(java-derived-token-has-tag? token tag)  boolean?

  token : java-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(java-derived-token-text token)  string?

  token : java-derived-token?
Returns the exact source text covered by token.

procedure

(java-derived-token-start token)  position?

  token : java-derived-token?
Returns the starting source position of token.

procedure

(java-derived-token-end token)  position?

  token : java-derived-token?
Returns the ending source position of token.

The first reusable Java-specific derived tags include:

  • 'java-comment

  • 'java-line-comment

  • 'java-block-comment

  • 'java-doc-comment

  • 'java-whitespace

  • 'java-keyword

  • 'java-identifier

  • 'java-annotation-marker

  • 'java-annotation-name

  • 'java-string-literal

  • 'java-text-block

  • 'java-char-literal

  • 'java-numeric-literal

  • 'java-boolean-literal

  • 'java-true-literal

  • 'java-false-literal

  • 'java-null-literal

  • 'java-operator

  • 'java-delimiter

  • 'malformed-token

Malformed Java input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled java delegate to lexers/java. Wrapped delegated Markdown tokens preserve Java-derived tags and gain 'embedded-java.

value

java-profiles : immutable-hash?

The profile defaults used by the Java lexer.

14 Haskell

 (require lexers/haskell) package: lexers-lib

The projected Haskell API has two entry points:

The first Haskell implementation is a handwritten streaming lexer grounded in the Haskell lexical-structure specification, with a small set of practical GHC-era additions such as pragmas. It covers whitespace, line comments, nested comments, pragmas, identifiers, operators, strings, characters, numeric literals, and delimiters. In the 'compiler profile, the projected token stream also inserts ordinary Haskell layout tokens for let, where, do, and of, while the 'coloring profile remains source-faithful.

procedure

(make-haskell-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Haskell lexer.

Projected Haskell categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-haskell-lexer #:profile 'coloring))
> (define in
    (open-input-string "{-# LANGUAGE OverloadedStrings #-}\nmodule Main where\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(comment whitespace keyword whitespace)

procedure

(haskell-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Haskell tokens.

The derived Haskell API provides reusable language-specific structure:

procedure

(make-haskell-derived-lexer)

  (input-port? . -> . (or/c haskell-derived-token? 'eof))
Constructs a streaming Haskell lexer that returns derived Haskell tokens.

procedure

(haskell-string->derived-tokens source)

  (listof haskell-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived Haskell tokens.

procedure

(haskell-derived-token? v)  boolean?

  v : any/c
Recognizes derived Haskell tokens.

procedure

(haskell-derived-token-tags token)  (listof symbol?)

  token : haskell-derived-token?
Returns the derived-token tags for token.

procedure

(haskell-derived-token-has-tag? token tag)  boolean?

  token : haskell-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(haskell-derived-token-text token)  string?

  token : haskell-derived-token?
Returns the exact source text covered by token.

procedure

(haskell-derived-token-start token)  position?

  token : haskell-derived-token?
Returns the starting source position of token.

procedure

(haskell-derived-token-end token)  position?

  token : haskell-derived-token?
Returns the ending source position of token.

The first reusable Haskell-specific derived tags include:

  • 'haskell-comment

  • 'haskell-line-comment

  • 'haskell-nested-comment

  • 'haskell-pragma

  • 'haskell-whitespace

  • 'haskell-keyword

  • 'haskell-variable-identifier

  • 'haskell-constructor-identifier

  • 'haskell-variable-operator

  • 'haskell-constructor-operator

  • 'haskell-string-literal

  • 'haskell-char-literal

  • 'haskell-numeric-literal

  • 'haskell-delimiter

  • 'malformed-token

Malformed Haskell input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled haskell, hs, or lhs delegate to lexers/haskell. Wrapped delegated Markdown tokens preserve Haskell-derived tags and gain 'embedded-haskell.

value

haskell-profiles : immutable-hash?

The profile defaults used by the Haskell lexer.

15 Objective-C

 (require lexers/objc) package: lexers-lib

The projected Objective-C API has two entry points:

The first Objective-C implementation is a handwritten streaming lexer grounded in the language’s lexical surface and existing Objective-C lexer prior art. It is preprocessor-aware and covers comments, identifiers, C / Objective-C keywords, at-sign Objective-C keywords, Objective-C strings, object-literal introducers, numbers, operators, and delimiters.

procedure

(make-objc-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Objective-C lexer.

Projected Objective-C categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-objc-lexer #:profile 'coloring))
> (define in
    (open-input-string "@interface Foo : NSObject\n@property NSString *name;\n@end\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(keyword whitespace identifier whitespace)

procedure

(objc-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Objective-C tokens.

The derived Objective-C API provides reusable language-specific structure:

procedure

(make-objc-derived-lexer)

  (input-port? . -> . (or/c objc-derived-token? 'eof))
Constructs a streaming Objective-C lexer that returns derived Objective-C tokens.

procedure

(objc-string->derived-tokens source)

  (listof objc-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived Objective-C tokens.

procedure

(objc-derived-token? v)  boolean?

  v : any/c
Recognizes derived Objective-C tokens.

procedure

(objc-derived-token-tags token)  (listof symbol?)

  token : objc-derived-token?
Returns the derived-token tags for token.

procedure

(objc-derived-token-has-tag? token tag)  boolean?

  token : objc-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(objc-derived-token-text token)  string?

  token : objc-derived-token?
Returns the exact source text covered by token.

procedure

(objc-derived-token-start token)  position?

  token : objc-derived-token?
Returns the starting source position of token.

procedure

(objc-derived-token-end token)  position?

  token : objc-derived-token?
Returns the ending source position of token.

The first reusable Objective-C-specific derived tags include:

  • 'objc-comment

  • 'objc-whitespace

  • 'objc-keyword

  • 'objc-at-keyword

  • 'objc-identifier

  • 'objc-string-literal

  • 'objc-char-literal

  • 'objc-numeric-literal

  • 'objc-operator

  • 'objc-delimiter

  • 'objc-preprocessor-directive

  • 'objc-header-name

  • 'objc-literal-introducer

  • 'objc-line-splice

  • 'objc-error

  • 'malformed-token

Ordinary Objective-C strings, ... strings, and character literals validate common escape structures in the derived layer, including simple, octal, hexadecimal, and universal-character escapes. Invalid escapes and malformed multi-character character literals remain source-faithful but are tagged with 'malformed-token.
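A sketch of locating such a tagged escape with the accessors above (the comment describes documented behavior, not verified output):

Examples:
> (for/list ([token (in-list (objc-string->derived-tokens
                              "char c = '\\q';"))]
             #:when (objc-derived-token-has-tag? token 'malformed-token))
    (objc-derived-token-text token))  ; the invalid '\q' literal stays source-faithful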

Malformed Objective-C input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled objc, objective-c, objectivec, or obj-c delegate to lexers/objc. Wrapped delegated Markdown tokens preserve Objective-C-derived tags and gain 'embedded-objc.

value

objc-profiles : immutable-hash?

The profile defaults used by the Objective-C lexer.

16 Pascal

 (require lexers/pascal) package: lexers-lib

The projected Pascal API has two entry points:

The first Pascal implementation is a handwritten streaming lexer grounded in the Free Pascal token reference. It covers whitespace, three comment forms, identifiers, escaped reserved-word identifiers, reserved words, numeric literals, strings, control-string fragments, operators, and delimiters.

procedure

(make-pascal-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Pascal lexer.

Projected Pascal categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-pascal-lexer #:profile 'coloring))
> (define in
    (open-input-string "program Test;\nvar &do: Integer;\nbegin\nend.\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(keyword whitespace identifier delimiter)

procedure

(pascal-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Pascal tokens.

The derived Pascal API provides reusable language-specific structure:

procedure

(make-pascal-derived-lexer)

  (input-port? . -> . (or/c pascal-derived-token? 'eof))
Constructs a streaming Pascal lexer that returns derived Pascal tokens.

procedure

(pascal-string->derived-tokens source)

  (listof pascal-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived Pascal tokens.

procedure

(pascal-derived-token? v)  boolean?

  v : any/c
Recognizes derived Pascal tokens.

procedure

(pascal-derived-token-tags token)  (listof symbol?)

  token : pascal-derived-token?
Returns the derived-token tags for token.

procedure

(pascal-derived-token-has-tag? token tag)  boolean?

  token : pascal-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(pascal-derived-token-text token)  string?

  token : pascal-derived-token?
Returns the exact source text covered by token.

procedure

(pascal-derived-token-start token)  position?

  token : pascal-derived-token?
Returns the starting source position of token.

procedure

(pascal-derived-token-end token)  position?

  token : pascal-derived-token?
Returns the ending source position of token.

The first reusable Pascal-specific derived tags include:

  • 'pascal-comment

  • 'pascal-compiler-directive

  • 'pascal-whitespace

  • 'pascal-keyword

  • 'pascal-identifier

  • 'pascal-escaped-identifier

  • 'pascal-string-literal

  • 'pascal-control-string

  • 'pascal-numeric-literal

  • 'pascal-operator

  • 'pascal-delimiter

  • 'malformed-token

Compiler directives inside brace or star comments, such as {$mode objfpc} and (*$ifdef DEBUG*), remain comments in the projected stream while the derived layer preserves 'pascal-compiler-directive.
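A sketch of recovering the directive through the derived layer with the accessors above (the comment describes documented behavior, not verified output):

Examples:
> (for/list ([token (in-list (pascal-string->derived-tokens
                              "{$mode objfpc}\nbegin\nend.\n"))]
             #:when (pascal-derived-token-has-tag? token
                                                   'pascal-compiler-directive))
    (pascal-derived-token-text token))  ; the directive survives as a tagged comment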

Malformed Pascal input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled pascal, pas, delphi, or objectpascal delegate to lexers/pascal. Wrapped delegated Markdown tokens preserve Pascal-derived tags and gain 'embedded-pascal.

value

pascal-profiles : immutable-hash?

The profile defaults used by the Pascal lexer.

17 Python

 (require lexers/python) package: lexers-lib

The projected Python API has two entry points:

The first Python implementation is a handwritten streaming lexer grounded in Python’s lexical-analysis rules. It tracks indentation-sensitive line starts, physical and logical newlines, names, comments, strings, numbers, operators, and delimiters.

procedure

(make-python-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Python lexer.

Projected Python categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Soft keywords currently project as 'keyword, while the derived layer keeps the more specific 'python-soft-keyword tag.

Examples:
> (define lexer
    (make-python-lexer #:profile 'coloring))
> (define in
    (open-input-string "def answer(x):\n    return x\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(keyword whitespace identifier delimiter)

procedure

(python-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Python tokens.

The derived Python API provides reusable language-specific structure:

procedure

(make-python-derived-lexer)

  (input-port? . -> . (or/c python-derived-token? 'eof))
Constructs a streaming Python lexer that returns derived Python tokens.

procedure

(python-string->derived-tokens source)

  (listof python-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived Python tokens.

procedure

(python-derived-token? v)  boolean?

  v : any/c
Recognizes derived Python tokens.

procedure

(python-derived-token-tags token)  (listof symbol?)

  token : python-derived-token?
Returns the derived-token tags for token.

procedure

(python-derived-token-has-tag? token tag)  boolean?

  token : python-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(python-derived-token-text token)  string?

  token : python-derived-token?
Returns the exact source text covered by token.

procedure

(python-derived-token-start token)  position?

  token : python-derived-token?
Returns the starting source position of token.

procedure

(python-derived-token-end token)  position?

  token : python-derived-token?
Returns the ending source position of token.

The first reusable Python-specific derived tags include:

  • 'python-comment

  • 'python-whitespace

  • 'python-newline

  • 'python-nl

  • 'python-line-join

  • 'python-keyword

  • 'python-soft-keyword

  • 'python-identifier

  • 'python-string-literal

  • 'python-bytes-literal

  • 'python-f-string-literal

  • 'python-t-string-literal

  • 'python-raw-string-literal

  • 'python-numeric-literal

  • 'python-operator

  • 'python-delimiter

  • 'python-indent

  • 'python-dedent

  • 'python-error

  • 'malformed-token

Ordinary, bytes, formatted, template, and raw-prefixed Python strings all project as 'literal, while the derived layer preserves the more specific string-literal tags above.
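A sketch of separating prefixed strings by tag with the accessors above (the comment describes the expected selection, not verified output):

Examples:
> (for/list ([token (in-list (python-string->derived-tokens
                              "a = f\"x\"\nb = \"y\"\n"))]
             #:when (python-derived-token-has-tag? token
                                                   'python-f-string-literal))
    (python-derived-token-text token))  ; keeps only the f-string literal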

Malformed Python input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled python or py delegate to lexers/python. Wrapped delegated Markdown tokens preserve Python-derived tags and gain 'embedded-python.

18 Shell

 (require lexers/shell) package: lexers-lib

The projected shell API has two entry points:

The first shell implementation is a handwritten lexer for reusable shell tokenization. It currently supports Bash, Zsh, and PowerShell. The public API defaults to Bash and accepts #:shell 'bash, #:shell 'zsh, and #:shell 'powershell (with 'pwsh accepted as an alias). The derived layer also distinguishes pipelines, logical operators, redirections, and heredoc introducers instead of exposing them only through generic shell punctuation tags.

procedure

(make-shell-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions 
  #:shell shell]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
  shell : (or/c 'bash 'zsh 'powershell 'pwsh) = 'bash
Constructs a streaming shell lexer.

procedure

(shell-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions 
  #:shell shell]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
  shell : (or/c 'bash 'zsh 'powershell 'pwsh) = 'bash
Tokenizes an entire shell source string using the projected token API.

18.1 Shell Returned Tokens

Common projected shell categories include:

  • 'whitespace

  • 'comment

  • 'keyword

  • 'identifier

  • 'literal

  • 'delimiter

  • 'unknown

  • 'eof

For the current shell scaffold:

  • keywords and builtins project as 'keyword

  • remaining words project as 'identifier

  • strings, variables, command substitutions, options, and numeric literals project as 'literal

  • operators and punctuation project as 'delimiter

  • malformed string or substitution input projects as 'unknown in 'coloring mode and raises in 'compiler mode

Projected and derived shell token text preserves the exact consumed source slice, including comments, whitespace, and CRLF line endings.

Examples:
> (define lexer
    (make-shell-lexer #:profile 'coloring #:shell 'bash))
> (define in
    (open-input-string "export PATH\necho $PATH\n"))
> (port-count-lines! in)
> (list (lexer in)
        (lexer in)
        (lexer in))

(list

 (position-token (token 'keyword "export") (position 1 1 0) (position 7 1 6))

 (position-token (token 'whitespace " ") (position 7 1 6) (position 8 1 7))

 (position-token

  (token 'identifier "PATH")

  (position 8 1 7)

  (position 12 1 11)))


procedure

(make-shell-derived-lexer [#:shell shell])

  (input-port? . -> . (or/c 'eof shell-derived-token?))
  shell : (or/c 'bash 'zsh 'powershell 'pwsh) = 'bash
Constructs a streaming shell lexer for the derived-token layer.

procedure

(shell-string->derived-tokens source 
  [#:shell shell]) 
  (listof shell-derived-token?)
  source : string?
  shell : (or/c 'bash 'zsh 'powershell 'pwsh) = 'bash
Tokenizes an entire shell source string into derived shell token values.

procedure

(shell-derived-token? v)  boolean?

  v : any/c
Recognizes derived shell token values returned by make-shell-derived-lexer and shell-string->derived-tokens.

procedure

(shell-derived-token-tags token)  (listof symbol?)

  token : shell-derived-token?
Returns the shell-specific classification tags attached to a derived shell token.

procedure

(shell-derived-token-has-tag? token tag)  boolean?

  token : shell-derived-token?
  tag : symbol?
Determines whether a derived shell token carries a given classification tag.

procedure

(shell-derived-token-text token)  string?

  token : shell-derived-token?
Returns the exact source text corresponding to a derived shell token.

procedure

(shell-derived-token-start token)  position?

  token : shell-derived-token?
Returns the starting source position for a derived shell token.

procedure

(shell-derived-token-end token)  position?

  token : shell-derived-token?
Returns the ending source position for a derived shell token.

18.2 Shell Derived Tokens

The current shell scaffold may attach tags such as:

  • 'shell-keyword

  • 'shell-builtin

  • 'shell-word

  • 'shell-string-literal

  • 'shell-ansi-string-literal

  • 'shell-variable

  • 'shell-command-substitution

  • 'shell-comment

  • 'shell-option

  • 'shell-numeric-literal

  • 'shell-punctuation

  • 'shell-pipeline-operator

  • 'shell-logical-operator

  • 'shell-redirection-operator

  • 'shell-heredoc-operator

  • 'malformed-token

In Bash and Zsh, ANSI-C quoted strings such as $'line\n' project as 'literal, while the derived layer preserves 'shell-ansi-string-literal.

Markdown fenced code blocks delegate to lexers/shell for bash, sh, shell, zsh, powershell, pwsh, and ps1 info strings. Delegated Markdown tokens keep the shell tags and gain 'embedded-shell.

Examples:
> (define derived-tokens
    (shell-string->derived-tokens "printf \"%s\\n\" $(pwd)\n# done\n"))
> (map (lambda (token)
         (list (shell-derived-token-text token)
               (shell-derived-token-tags token)))
       derived-tokens)

'(("printf" (keyword shell-builtin))

  (" " (whitespace))

  ("\"%s\\n\"" (literal shell-string-literal))

  (" " (whitespace))

  ("$(pwd)" (literal shell-command-substitution))

  ("\n" (whitespace))

  ("# done" (comment shell-comment))

  ("\n" (whitespace)))


value

shell-profiles : immutable-hash?

The profile defaults used by the shell lexer.

19 Rust

 (require lexers/rust) package: lexers-lib

The projected Rust API has two entry points:

The first Rust implementation is a handwritten streaming lexer grounded in the Rust lexical structure reference. It covers whitespace, line and nested block comments, identifiers, raw identifiers, keywords, lifetimes, strings, raw strings, character and byte literals, numeric literals, punctuation, and delimiters.

procedure

(make-rust-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Rust lexer.

Projected Rust categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-rust-lexer #:profile 'coloring))
> (define in
    (open-input-string "fn main() {\n    let r#type = 42u32;\n}\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(keyword whitespace identifier delimiter)


procedure

(rust-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Rust tokens.

The derived Rust API provides reusable language-specific structure:

procedure

(make-rust-derived-lexer)

  (input-port? . -> . (or/c rust-derived-token? 'eof))
Constructs a streaming Rust lexer that returns derived Rust tokens.

procedure

(rust-string->derived-tokens source)

  (listof rust-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived Rust tokens.

procedure

(rust-derived-token? v)  boolean?

  v : any/c
Recognizes derived Rust tokens.

procedure

(rust-derived-token-tags token)  (listof symbol?)

  token : rust-derived-token?
Returns the derived-token tags for token.

procedure

(rust-derived-token-has-tag? token tag)  boolean?

  token : rust-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(rust-derived-token-text token)  string?

  token : rust-derived-token?
Returns the exact source text covered by token.

procedure

(rust-derived-token-start token)  position?

  token : rust-derived-token?
Returns the starting source position of token.

procedure

(rust-derived-token-end token)  position?

  token : rust-derived-token?
Returns the ending source position of token.

The initial set of reusable Rust-specific derived tags includes:

  • 'rust-comment

  • 'rust-doc-comment

  • 'rust-whitespace

  • 'rust-keyword

  • 'rust-identifier

  • 'rust-raw-identifier

  • 'rust-lifetime

  • 'rust-string-literal

  • 'rust-raw-string-literal

  • 'rust-char-literal

  • 'rust-byte-literal

  • 'rust-byte-string-literal

  • 'rust-c-string-literal

  • 'rust-numeric-literal

  • 'rust-punctuation

  • 'rust-delimiter

  • 'malformed-token

Ordinary Rust strings, byte strings, chars, bytes, and C strings validate their escape structure in the derived layer. Invalid escapes and malformed multi-character char literals remain source-faithful but are tagged with 'malformed-token.

Malformed Rust input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled rust or rs delegate to lexers/rust. Wrapped delegated Markdown tokens preserve Rust-derived tags and gain 'embedded-rust.
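A short sketch of walking derived Rust tokens with the accessors above. The input string is illustrative, and no output is shown because the exact tags attached to each token are determined by the implementation:

Examples:
> (for ([token (in-list (rust-string->derived-tokens "let answer = 42u32; // ok\n"))])
    (printf "~s ~s\n"
            (rust-derived-token-text token)
            (rust-derived-token-tags token)))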

value

rust-profiles : immutable-hash?

The profile defaults used by the Rust lexer.

20 Swift

 (require lexers/swift) package: lexers-lib

The projected Swift API has two entry points:

The first Swift implementation is a handwritten streaming lexer grounded in Swift lexical structure. It covers whitespace, line comments, nested block comments, identifiers, keywords, attributes, pound directives, strings, numbers, operators, and delimiters.

procedure

(make-swift-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Swift lexer.

Projected Swift categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'operator, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-swift-lexer #:profile 'coloring))
> (define in
    (open-input-string "import UIKit\n@IBOutlet weak var label: UILabel!\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(keyword whitespace identifier whitespace)

procedure

(swift-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected Swift tokens.

The derived Swift API provides reusable language-specific structure:

procedure

(make-swift-derived-lexer)

  (input-port? . -> . (or/c swift-derived-token? 'eof))
Constructs a streaming Swift lexer that returns derived Swift tokens.

procedure

(swift-string->derived-tokens source)

  (listof swift-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived Swift tokens.

procedure

(swift-derived-token? v)  boolean?

  v : any/c
Recognizes derived Swift tokens.

procedure

(swift-derived-token-tags token)  (listof symbol?)

  token : swift-derived-token?
Returns the derived-token tags for token.

procedure

(swift-derived-token-has-tag? token tag)  boolean?

  token : swift-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(swift-derived-token-text token)  string?

  token : swift-derived-token?
Returns the exact source text covered by token.

procedure

(swift-derived-token-start token)  position?

  token : swift-derived-token?
Returns the starting source position of token.

procedure

(swift-derived-token-end token)  position?

  token : swift-derived-token?
Returns the ending source position of token.

The initial set of reusable Swift-specific derived tags includes:

  • 'swift-comment

  • 'swift-whitespace

  • 'swift-keyword

  • 'swift-identifier

  • 'swift-string-literal

  • 'swift-raw-string-literal

  • 'swift-numeric-literal

  • 'swift-attribute

  • 'swift-pound-directive

  • 'swift-operator

  • 'swift-delimiter

  • 'swift-error

  • 'malformed-token

Both ordinary Swift strings and raw strings with # delimiters project as 'literal, while the derived layer preserves 'swift-raw-string-literal for the raw forms.

Malformed Swift input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled swift delegate to lexers/swift. Wrapped delegated Markdown tokens preserve Swift-derived tags and gain 'embedded-swift.
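The derived Swift accessors compose the same way as the other derived APIs. A hedged sketch using a raw string with # delimiters as input; the tags each token carries are left to the implementation rather than shown here:

Examples:
> (for ([token (in-list (swift-string->derived-tokens "let s = #\"raw\"#\n"))])
    (printf "~s ~s\n"
            (swift-derived-token-text token)
            (swift-derived-token-tags token)))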

21 TeX

 (require lexers/tex) package: lexers-lib

The projected TeX API has two entry points:

The first TeX implementation is a handwritten streaming lexer grounded in TeX’s tokenization model, but it intentionally stays within a practical static subset. It covers comments, whitespace, control words, control symbols, group and optional delimiters, math shifts, parameter markers, and plain text runs. The derived layer distinguishes inline-vs-display math shifts and gives reusable tags to the common special characters &, _, ^, and ~. It also gives reusable tags to common control-symbol spacing commands such as \ , \,, \;, \!, and \/. Common accent control symbols such as \' and \" also receive their own reusable tag, and group/optional delimiters distinguish opening-vs-closing roles. The plain-TeX paragraph command \par also receives its own reusable tag.

procedure

(make-tex-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming TeX lexer.

Projected TeX categories include 'comment, 'whitespace, 'identifier, 'literal, 'delimiter, and 'unknown.

Examples:
> (define lexer
    (make-tex-lexer #:profile 'coloring))
> (define in
    (open-input-string "\\section{Hi}\n$x+y$\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(identifier delimiter literal delimiter)

procedure

(tex-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected TeX tokens.

The derived TeX API provides reusable language-specific structure:

procedure

(make-tex-derived-lexer)

  (input-port? . -> . (or/c tex-derived-token? 'eof))
Constructs a streaming TeX lexer that returns derived TeX tokens.

procedure

(tex-string->derived-tokens source)

  (listof tex-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived TeX tokens.

procedure

(tex-derived-token? v)  boolean?

  v : any/c
Recognizes derived TeX tokens.

procedure

(tex-derived-token-tags token)  (listof symbol?)

  token : tex-derived-token?
Returns the derived-token tags for token.

procedure

(tex-derived-token-has-tag? token tag)  boolean?

  token : tex-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(tex-derived-token-text token)  string?

  token : tex-derived-token?
Returns the exact source text covered by token.

procedure

(tex-derived-token-start token)  position?

  token : tex-derived-token?
Returns the starting source position of token.

procedure

(tex-derived-token-end token)  position?

  token : tex-derived-token?
Returns the ending source position of token.

The initial set of reusable TeX-specific derived tags includes:

  • 'tex-comment

  • 'tex-whitespace

  • 'tex-control-word

  • 'tex-control-symbol

  • 'tex-paragraph-command

  • 'tex-parameter

  • 'tex-parameter-reference

  • 'tex-parameter-escape

  • 'tex-parameter-marker

  • 'tex-text

  • 'tex-math-shift

  • 'tex-inline-math-shift

  • 'tex-display-math-shift

  • 'tex-group-delimiter

  • 'tex-open-group-delimiter

  • 'tex-close-group-delimiter

  • 'tex-optional-delimiter

  • 'tex-open-optional-delimiter

  • 'tex-close-optional-delimiter

  • 'tex-special-character

  • 'tex-alignment-tab

  • 'tex-subscript-mark

  • 'tex-superscript-mark

  • 'tex-unbreakable-space

  • 'tex-control-space

  • 'tex-accent-command

  • 'tex-spacing-command

  • 'tex-italic-correction

  • 'malformed-token

Markdown fenced code blocks labeled tex delegate to lexers/tex. Wrapped delegated Markdown tokens preserve TeX-derived tags and gain 'embedded-tex.
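A sketch of filtering derived TeX tokens by tag with tex-derived-token-has-tag?. The input mixes a comment and inline math and is illustrative only; the output is not shown because it depends on the implementation:

Examples:
> (for ([token (in-list (tex-string->derived-tokens "% note\n$x$\n"))]
        #:when (tex-derived-token-has-tag? token 'tex-math-shift))
    (displayln (tex-derived-token-text token)))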

value

tex-profiles : immutable-hash?

The profile defaults used by the TeX lexer.

22 LaTeX

 (require lexers/latex) package: lexers-lib

The projected LaTeX API has two entry points:

The first LaTeX implementation builds on the TeX lexer and adds a lightweight classification layer for common LaTeX commands such as \section, \begin, and \end. Environment names in forms such as \begin{itemize} receive their own derived tag, and \verb|...| spans receive a dedicated verbatim-literal tag. The common LaTeX line-break command \\ also receives its own derived tag.

procedure

(make-latex-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming LaTeX lexer.

Projected LaTeX categories include 'comment, 'whitespace, 'keyword, 'identifier, 'literal, 'delimiter, and 'unknown.
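By analogy with the TeX example above, a minimal sketch of driving the projected LaTeX lexer on an environment opener; how each token is classified is left to the implementation rather than shown here:

Examples:
> (define lexer
    (make-latex-lexer #:profile 'coloring))
> (define in
    (open-input-string "\\begin{itemize}\n"))
> (port-count-lines! in)
> (lexer-token-name (lexer in))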

procedure

(latex-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected LaTeX tokens.

The derived LaTeX API reuses the TeX token representation and adds LaTeX tags where applicable:

procedure

(make-latex-derived-lexer)

  (input-port? . -> . (or/c latex-derived-token? 'eof))
Constructs a streaming LaTeX lexer that returns derived LaTeX tokens.

procedure

(latex-string->derived-tokens source)

  (listof latex-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived LaTeX tokens.

procedure

(latex-derived-token? v)  boolean?

  v : any/c
Recognizes derived LaTeX tokens.

procedure

(latex-derived-token-tags token)  (listof symbol?)

  token : latex-derived-token?
Returns the derived-token tags for token.

procedure

(latex-derived-token-has-tag? token tag)  boolean?

  token : latex-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(latex-derived-token-text token)  string?

  token : latex-derived-token?
Returns the exact source text covered by token.

procedure

(latex-derived-token-start token)  position?

  token : latex-derived-token?
Returns the starting source position of token.

procedure

(latex-derived-token-end token)  position?

  token : latex-derived-token?
Returns the ending source position of token.

Common additional LaTeX-oriented derived tags include:

  • 'latex-command

  • 'latex-environment-command

  • 'latex-environment-name

  • 'latex-verbatim-literal

  • 'latex-line-break-command

Markdown fenced code blocks labeled latex delegate to lexers/latex. Wrapped delegated Markdown tokens preserve LaTeX-derived tags and gain 'embedded-latex.

value

latex-profiles : immutable-hash?

The profile defaults used by the LaTeX lexer.

23 TSV

 (require lexers/tsv) package: lexers-lib

The projected TSV API has two entry points:

The first TSV implementation is a handwritten streaming lexer for tab-separated text. It preserves exact source text, including literal tab separators, empty fields, and CRLF row separators.

procedure

(make-tsv-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming TSV lexer.

Projected TSV categories include 'literal, 'delimiter, and 'unknown.

Field contents project as 'literal. Field separators and row separators project as 'delimiter.

Examples:
> (define lexer
    (make-tsv-lexer #:profile 'coloring))
> (define in
    (open-input-string "name\tage\nAda\t37\n"))
> (port-count-lines! in)
> (list (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in))
        (lexer-token-name (lexer in)))

'(literal delimiter literal delimiter)

procedure

(tsv-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes all of source eagerly and returns projected TSV tokens.

The derived TSV API provides reusable structure for delimited text:

procedure

(make-tsv-derived-lexer)

  (input-port? . -> . (or/c tsv-derived-token? 'eof))
Constructs a streaming TSV lexer that returns derived TSV tokens.

procedure

(tsv-string->derived-tokens source)

  (listof tsv-derived-token?)
  source : string?
Tokenizes all of source eagerly and returns derived TSV tokens.

procedure

(tsv-derived-token? v)  boolean?

  v : any/c
Recognizes derived TSV tokens.

procedure

(tsv-derived-token-tags token)  (listof symbol?)

  token : tsv-derived-token?
Returns the derived-token tags for token.

procedure

(tsv-derived-token-has-tag? token tag)  boolean?

  token : tsv-derived-token?
  tag : symbol?
Determines whether token carries tag.

procedure

(tsv-derived-token-text token)  string?

  token : tsv-derived-token?
Returns the exact source text covered by token.

procedure

(tsv-derived-token-start token)  position?

  token : tsv-derived-token?
Returns the starting source position of token.

procedure

(tsv-derived-token-end token)  position?

  token : tsv-derived-token?
Returns the ending source position of token.

The initial set of reusable TSV-specific derived tags includes:

  • 'delimited-field

  • 'delimited-quoted-field

  • 'delimited-unquoted-field

  • 'delimited-empty-field

  • 'delimited-separator

  • 'delimited-row-separator

  • 'delimited-error

  • 'tsv-field

  • 'tsv-separator

  • 'tsv-row-separator

  • 'malformed-token

Malformed TSV input is handled using the shared profile rules:

  • In the 'coloring profile, malformed input projects as 'unknown.

  • In the 'compiler profile, malformed input raises a read exception.

Markdown fenced code blocks labeled tsv delegate to lexers/tsv. Wrapped delegated Markdown tokens preserve TSV-derived tags and gain 'embedded-tsv.
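A sketch pairing each derived TSV token's text with its tags. Given the tag list above, fields should carry 'tsv-field and tab separators 'tsv-separator, but that pairing is an assumption here, so no output is shown:

Examples:
> (for ([token (in-list (tsv-string->derived-tokens "a\tb\nc\td\n"))])
    (printf "~s ~s\n"
            (tsv-derived-token-text token)
            (tsv-derived-token-tags token)))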

value

tsv-profiles : immutable-hash?

The profile defaults used by the TSV lexer.

24 WAT

 (require lexers/wat) package: lexers-lib

The projected WAT API has two entry points:

The first WAT implementation is a handwritten lexer for WebAssembly text format. It targets WAT only, not binary .wasm files.

procedure

(make-wat-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming WAT lexer.

The result is a procedure of one argument, an input port. Each call reads the next projected WAT token from the port and returns one projected token value.

When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.

The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.

The streaming port readers emit tokens incrementally. They do not buffer the entire remaining input before producing the first token.

Examples:
> (define lexer
    (make-wat-lexer #:profile 'coloring))
> (define in
    (open-input-string "(module (func (result i32) (i32.const 42)))"))
> (port-count-lines! in)
> (list (lexer in)
        (lexer in)
        (lexer in)
        (lexer in))

(list

 (position-token (token 'delimiter "(") (position 1 1 0) (position 2 1 1))

 (position-token (token 'keyword "module") (position 2 1 1) (position 8 1 7))

 (position-token (token 'whitespace " ") (position 8 1 7) (position 9 1 8))

 (position-token (token 'delimiter "(") (position 9 1 8) (position 10 1 9)))

procedure

(wat-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes an entire WAT string using the projected token API.

This is a convenience wrapper over make-wat-lexer.

24.1 WAT Returned Tokens

Common projected WAT categories include:

  • 'whitespace

  • 'comment

  • 'identifier

  • 'keyword

  • 'literal

  • 'delimiter

  • 'unknown

  • 'eof

For the current WAT scaffold:

  • form names, type names, and instruction names project as 'keyword

  • $-prefixed names and remaining word-like names project as 'identifier

  • strings and numeric literals project as 'literal

  • parentheses project as 'delimiter

  • comments project as 'comment

  • malformed input projects as 'unknown in 'coloring mode and raises in 'compiler mode

Projected and derived token text preserves the exact source slice, including whitespace and comments.

Examples:
> (define inspect-lexer
    (make-wat-lexer #:profile 'coloring))
> (define inspect-in
    (open-input-string ";; line comment\n(module (func (result i32) (i32.const 42)))"))
> (port-count-lines! inspect-in)
> (define first-token
    (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)

#t

> (lexer-token-name first-token)

'comment

> (lexer-token-value first-token)

";; line comment"

> (position-offset (lexer-token-start first-token))

1

> (position-offset (lexer-token-end first-token))

16


procedure

(make-wat-derived-lexer)

  (input-port? . -> . (or/c 'eof wat-derived-token?))
Constructs a streaming WAT lexer for the derived-token layer.

procedure

(wat-string->derived-tokens source)

  (listof wat-derived-token?)
  source : string?
Tokenizes an entire WAT string into derived WAT token values.

procedure

(wat-derived-token? v)  boolean?

  v : any/c
Recognizes derived WAT token values returned by make-wat-derived-lexer and wat-string->derived-tokens.

procedure

(wat-derived-token-tags token)  (listof symbol?)

  token : wat-derived-token?
Returns the WAT-specific classification tags attached to a derived WAT token.

procedure

(wat-derived-token-has-tag? token tag)  boolean?

  token : wat-derived-token?
  tag : symbol?
Determines whether a derived WAT token carries a given classification tag.

procedure

(wat-derived-token-text token)  string?

  token : wat-derived-token?
Returns the exact source text corresponding to a derived WAT token.

procedure

(wat-derived-token-start token)  position?

  token : wat-derived-token?
Returns the starting source position for a derived WAT token.

procedure

(wat-derived-token-end token)  position?

  token : wat-derived-token?
Returns the ending source position for a derived WAT token.

24.2 WAT Derived Tokens

The current WAT scaffold may attach tags such as:

  • 'wat-form

  • 'wat-type

  • 'wat-instruction

  • 'wat-identifier

  • 'wat-string-literal

  • 'wat-numeric-literal

  • 'comment

  • 'whitespace

  • 'malformed-token

Examples:
> (define derived-tokens
    (wat-string->derived-tokens
     "(module (func $answer (result i32) i32.const 42))"))
> (map (lambda (token)
         (list (wat-derived-token-text token)
               (wat-derived-token-tags token)))
       derived-tokens)

'(("(" (delimiter))

  ("module" (keyword wat-form))

  (" " (whitespace))

  ("(" (delimiter))

  ("func" (keyword wat-form))

  (" " (whitespace))

  ("$answer" (identifier wat-identifier))

  (" " (whitespace))

  ("(" (delimiter))

  ("result" (keyword wat-form))

  (" " (whitespace))

  ("i32" (keyword wat-type))

  (")" (delimiter))

  (" " (whitespace))

  ("i32.const" (keyword wat-instruction))

  (" " (whitespace))

  ("42" (literal wat-numeric-literal))

  (")" (delimiter))

  (")" (delimiter)))


value

wat-profiles : immutable-hash?

The profile defaults used by the WAT lexer.

25 Racket

 (require lexers/racket) package: lexers-lib

The projected Racket API has two entry points:

This lexer is adapter-backed. It uses the lexer from syntax-color/racket-lexer as its raw engine and adapts that output into the public lexers projected and derived APIs.

When a source starts with "#lang at-exp", the adapter switches to the Scribble lexer family in Racket mode so that @ forms are tokenized as Scribble escapes instead of ordinary symbol text.

procedure

(make-racket-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Racket lexer.

The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.

When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.

The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.

Examples:
> (define lexer
    (make-racket-lexer #:profile 'coloring))
> (define in
    (open-input-string "#:x \"hi\""))
> (port-count-lines! in)
> (list (lexer in)
        (lexer in)
        (lexer in))

(list

 (position-token (token 'literal "#:x") (position 1 1 0) (position 4 1 3))

 (position-token (token 'whitespace " ") (position 4 1 3) (position 5 1 4))

 (position-token (token 'literal "\"hi\"") (position 5 1 4) (position 9 1 8)))

procedure

(racket-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes an entire Racket string using the projected token API.

This is a convenience wrapper over make-racket-lexer.
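As a sketch, the wrapper reduces the streaming example above to a single call. The shared token helpers are assumed to be in scope along with lexers/racket (see Token Helpers); whether the returned list contains a trailing end-of-file entry is not shown here.

```racket
(require lexers/racket)

;; One-shot tokenization with the default tolerant 'coloring profile.
;; Each ordinary element is a position-token wrapping a projected
;; token, exactly as in the streaming example above.
(define tokens
  (racket-string->tokens "#:x \"hi\"" #:profile 'coloring))

;; The shared token helpers read through the position wrapper.
(lexer-token-name (first tokens))   ; 'literal
(lexer-token-value (first tokens))  ; "#:x"
```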

25.1 Racket Returned Tokens

Common projected Racket categories include:

  • 'whitespace

  • 'comment

  • 'identifier

  • 'literal

  • 'delimiter

  • 'unknown

  • 'eof

For the current adapter:

  • comments and sexp comments project as 'comment

  • whitespace projects as 'whitespace

  • strings, constants, and hash-colon keywords project as 'literal

  • symbols, other, and no-color tokens project as 'identifier

  • parentheses project as 'delimiter

  • lexical errors project as 'unknown in 'coloring mode and raise in 'compiler mode

Projected and derived Racket token text preserves the exact consumed source slice, including multi-semicolon comment headers such as ;;;.

Examples:
> (define inspect-lexer
    (make-racket-lexer #:profile 'coloring))
> (define inspect-in
    (open-input-string "#;(+ 1 2) #:x"))
> (port-count-lines! inspect-in)
> (define first-token
    (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)

#t

> (lexer-token-name first-token)

'comment

> (lexer-token-value first-token)

"#;"


procedure

(make-racket-derived-lexer)

  (input-port? . -> . (or/c 'eof racket-derived-token?))
Constructs a streaming Racket lexer for the derived-token layer.

procedure

(racket-string->derived-tokens source)

  (listof racket-derived-token?)
  source : string?
Tokenizes an entire Racket string into derived Racket token values.

procedure

(racket-derived-token? v)  boolean?

  v : any/c
Recognizes derived Racket token values returned by make-racket-derived-lexer and racket-string->derived-tokens.

procedure

(racket-derived-token-tags token)  (listof symbol?)

  token : racket-derived-token?
Returns the Racket-specific classification tags attached to a derived Racket token.

procedure

(racket-derived-token-has-tag? token tag)  boolean?

  token : racket-derived-token?
  tag : symbol?
Determines whether a derived Racket token carries a given classification tag.

procedure

(racket-derived-token-text token)  string?

  token : racket-derived-token?
Returns the exact source text corresponding to a derived Racket token.

procedure

(racket-derived-token-start token)  position?

  token : racket-derived-token?
Returns the starting source position for a derived Racket token.

procedure

(racket-derived-token-end token)  position?

  token : racket-derived-token?
Returns the ending source position for a derived Racket token.

25.2 Racket Derived Tokens

The current Racket adapter may attach tags such as:

  • 'racket-comment

  • 'racket-sexp-comment

  • 'racket-whitespace

  • 'racket-constant

  • 'racket-string

  • 'racket-symbol

  • 'racket-parenthesis

  • 'racket-hash-colon-keyword

  • 'racket-commented-out

  • 'racket-datum

  • 'racket-open

  • 'racket-close

  • 'racket-continue

  • 'racket-usual-special-form

  • 'racket-definition-form

  • 'racket-binding-form

  • 'racket-conditional-form

  • 'racket-error

  • 'scribble-text for "#lang at-exp" text regions

  • 'scribble-command-char for "@" in "#lang at-exp" sources

  • 'scribble-command for command names such as @bold in "#lang at-exp" sources

  • 'scribble-body-delimiter

  • 'scribble-optional-delimiter

  • 'scribble-racket-escape

The ‘usual special form’ tags are heuristic. They are meant to help ordinary Racket tooling recognize common built-in forms such as define, define-values, if, and let, but they are not guarantees about expanded meaning. In particular, a token whose text is "define" may still receive 'racket-usual-special-form even in a program where define has been rebound, because the lexer does not perform expansion or binding resolution.

Examples:
> (define derived-tokens
    (racket-string->derived-tokens "#;(+ 1 2) #:x \"hi\""))
> (map (lambda (token)
         (list (racket-derived-token-text token)
               (racket-derived-token-tags token)))
       derived-tokens)

'(("#;" (comment racket-sexp-comment racket-continue))

  ("(" (delimiter racket-parenthesis racket-open comment racket-commented-out))

  ("+" (identifier racket-symbol racket-datum comment racket-commented-out))

  (" "

   (whitespace racket-whitespace racket-continue comment racket-commented-out))

  ("1" (literal racket-constant racket-datum comment racket-commented-out))

  (" "

   (whitespace racket-whitespace racket-continue comment racket-commented-out))

  ("2" (literal racket-constant racket-datum comment racket-commented-out))

  (")"

   (delimiter racket-parenthesis racket-close comment racket-commented-out))

  (" " (whitespace racket-whitespace racket-continue))

  ("#:x" (literal racket-hash-colon-keyword racket-datum))

  (" " (whitespace racket-whitespace racket-continue))

  ("\"hi\"" (literal racket-string racket-datum)))

> (define at-exp-derived-tokens
    (racket-string->derived-tokens "#lang at-exp racket\n(define x @bold{hi})\n"))
> (map (lambda (token)
         (list (racket-derived-token-text token)
               (racket-derived-token-tags token)))
       at-exp-derived-tokens)

'(("#lang at-exp" (identifier racket-other racket-datum))

  (" " (whitespace racket-whitespace racket-continue))

  ("racket" (identifier racket-symbol racket-datum))

  ("\n" (whitespace racket-whitespace racket-continue))

  ("(" (delimiter racket-parenthesis racket-open))

  ("define"

   (identifier

    racket-symbol

    racket-datum

    racket-usual-special-form

    racket-definition-form))

  (" " (whitespace racket-whitespace racket-continue))

  ("x" (identifier racket-symbol racket-datum))

  (" " (whitespace racket-whitespace racket-continue))

  ("@" (delimiter racket-parenthesis racket-datum scribble-command-char))

  ("bold" (identifier racket-symbol racket-datum scribble-command))

  ("{" (delimiter racket-parenthesis racket-open scribble-body-delimiter))

  ("hi" (literal scribble-text racket-continue))

  ("}" (delimiter racket-parenthesis racket-close scribble-body-delimiter))

  (")" (delimiter racket-parenthesis racket-close))

  ("\n" (whitespace racket-whitespace racket-continue)))


value

racket-profiles : immutable-hash?

The profile defaults used by the Racket lexer.

26 Rhombus

 (require lexers/rhombus) package: lexers-lib

The projected Rhombus API has two entry points: make-rhombus-lexer and rhombus-string->tokens.

This lexer is adapter-backed. It uses the lexer from rhombus/private/syntax-color as its raw engine and adapts that output into the public projected and derived APIs of the lexers package.

Rhombus support is optional. When rhombus/private/syntax-color is not available, the module still loads, but calling the Rhombus lexer raises an error explaining that Rhombus support requires rhombus-lib on base >= 8.14.

procedure

(make-rhombus-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Rhombus lexer.

procedure

(rhombus-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes an entire Rhombus string using the projected token API.
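Because Rhombus support is optional, callers that must run without rhombus-lib can guard the call. This is a sketch; the helper name and the empty-list fallback are illustrative.

```racket
(require lexers/rhombus)

;; Without rhombus-lib on a sufficiently new base, calling the
;; Rhombus lexer raises; fall back to an empty token list here.
(define (try-rhombus-tokens source)
  (with-handlers ([exn:fail? (lambda (e) '())])
    (rhombus-string->tokens source #:profile 'coloring)))
```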

26.1 Rhombus Returned Tokens

Common projected Rhombus categories include:

  • 'whitespace

  • 'comment

  • 'identifier

  • 'keyword

  • 'literal

  • 'operator

  • 'delimiter

  • 'unknown

  • 'eof

For the current adapter:

  • ordinary whitespace projects as 'whitespace

  • line comments project as 'comment

  • Rhombus keywords and builtins project as 'keyword

  • remaining identifiers project as 'identifier

  • literals project as 'literal

  • openers, closers, and separators project as 'delimiter

  • operators such as +, :, and the comma project as 'operator

  • recoverable malformed input projects as 'unknown in 'coloring mode and raises in 'compiler mode

Projected and derived Rhombus token text preserves the exact consumed source slice, including CRLF line endings when Rhombus support is available.

procedure

(make-rhombus-derived-lexer)

  (input-port? . -> . (or/c 'eof rhombus-derived-token?))
Constructs a streaming Rhombus lexer for the derived-token layer.

procedure

(rhombus-string->derived-tokens source)

  (listof rhombus-derived-token?)
  source : string?
Tokenizes an entire Rhombus string into derived Rhombus token values.

procedure

(rhombus-derived-token? v)  boolean?

  v : any/c
Recognizes derived Rhombus token values returned by make-rhombus-derived-lexer and rhombus-string->derived-tokens.

procedure

(rhombus-derived-token-tags token)  (listof symbol?)

  token : rhombus-derived-token?
Returns the Rhombus-specific classification tags attached to a derived Rhombus token.

procedure

(rhombus-derived-token-has-tag? token tag)  boolean?

  token : rhombus-derived-token?
  tag : symbol?
Determines whether a derived Rhombus token carries a given classification tag.

procedure

(rhombus-derived-token-text token)  string?

  token : rhombus-derived-token?
Returns the exact source text corresponding to a derived Rhombus token.

procedure

(rhombus-derived-token-start token)  position?

  token : rhombus-derived-token?
Returns the starting source position for a derived Rhombus token.

procedure

(rhombus-derived-token-end token)  position?

  token : rhombus-derived-token?
Returns the ending source position for a derived Rhombus token.

26.2 Rhombus Derived Tokens

The current Rhombus adapter may attach tags such as:

  • 'rhombus-comment

  • 'rhombus-whitespace

  • 'rhombus-string

  • 'rhombus-constant

  • 'rhombus-literal

  • 'rhombus-identifier

  • 'rhombus-keyword

  • 'rhombus-builtin

  • 'rhombus-operator

  • 'rhombus-block-operator

  • 'rhombus-comma-operator

  • 'rhombus-opener

  • 'rhombus-closer

  • 'rhombus-parenthesis

  • 'rhombus-separator

  • 'rhombus-at

  • 'rhombus-fail

  • 'rhombus-error

  • 'malformed-token

The adapter preserves Rhombus-specific keyword and builtin guesses from rhombus/private/syntax-color. Since the shared projected stream does not have a separate builtin category, builtins currently project as 'keyword, while the derived-token layer keeps the more specific 'rhombus-builtin tag.
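The extra precision can be recovered from the derived layer. As a sketch (assuming Rhombus support is installed; the helper name is illustrative), the following separates builtins out of what the projected stream reports as plain keywords:

```racket
(require lexers/rhombus)

;; Collect the text of every derived token tagged 'rhombus-builtin.
;; In the projected stream these tokens are ordinary 'keyword tokens,
;; so the distinction is only visible at the derived layer.
(define (builtin-texts source)
  (for/list ([tok (in-list (rhombus-string->derived-tokens source))]
             #:when (rhombus-derived-token-has-tag? tok 'rhombus-builtin))
    (rhombus-derived-token-text tok)))
```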

value

rhombus-profiles : immutable-hash?

The profile defaults used by the Rhombus lexer.

27 Scribble

 (require lexers/scribble) package: lexers-lib

The projected Scribble API has two entry points: make-scribble-lexer and scribble-string->tokens.

This lexer is adapter-backed. It uses syntax-color/scribble-lexer as its raw engine and adapts that output into the public projected and derived APIs of the lexers package.

The first implementation defaults to Scribble’s inside/text mode via make-scribble-inside-lexer. Command-character customization is intentionally deferred.

procedure

(make-scribble-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Constructs a streaming Scribble lexer.

The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.

When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.

The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.

Examples:
> (define lexer
    (make-scribble-lexer #:profile 'coloring))
> (define in
    (open-input-string "@title{Hi}\nText"))
> (port-count-lines! in)
> (list (lexer in)
        (lexer in)
        (lexer in)
        (lexer in))

(list

 (position-token (token 'delimiter "@") (position 1 1 0) (position 2 1 1))

 (position-token (token 'identifier "title") (position 2 1 1) (position 7 1 6))

 (position-token (token 'delimiter "{") (position 7 1 6) (position 8 1 7))

 (position-token (token 'literal "Hi") (position 8 1 7) (position 10 1 9)))

procedure

(scribble-string->tokens source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
Tokenizes an entire Scribble string using the projected token API.

This is a convenience wrapper over make-scribble-lexer.
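Continuing the streaming example, the wrapper collapses the port setup into one call (a sketch; the resulting list shape follows the projected token API above):

```racket
(require lexers/scribble)

;; One-shot tokenization of an inside/text-mode Scribble fragment.
;; The command character, command name, and body braces arrive as
;; separate projected tokens, as in the streaming example above.
(define tokens
  (scribble-string->tokens "@title{Hi}\nText"))
```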

27.1 Scribble Returned Tokens

Common projected Scribble categories include:

  • 'whitespace

  • 'comment

  • 'identifier

  • 'literal

  • 'delimiter

  • 'unknown

  • 'eof

For the current adapter:

  • text, strings, and constants project as 'literal

  • whitespace projects as 'whitespace

  • symbol and other tokens project as 'identifier

  • parentheses, the command character, and body or optional delimiters project as 'delimiter

  • lexical errors project as 'unknown in 'coloring mode and raise in 'compiler mode

For source fidelity, the Scribble adapter preserves the exact source slice for projected and derived token text, including whitespace spans that contain one or more newlines.

Examples:
> (define inspect-lexer
    (make-scribble-lexer #:profile 'coloring))
> (define inspect-in
    (open-input-string "@title{Hi}"))
> (port-count-lines! inspect-in)
> (define first-token
    (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)

#t

> (lexer-token-name first-token)

'delimiter

> (lexer-token-value first-token)

"@"


procedure

(make-scribble-derived-lexer)

  (input-port? . -> . (or/c 'eof scribble-derived-token?))
Constructs a streaming Scribble lexer for the derived-token layer.

procedure

(scribble-string->derived-tokens source)

  (listof scribble-derived-token?)
  source : string?
Tokenizes an entire Scribble string into derived Scribble token values.

procedure

(scribble-derived-token? v)  boolean?

  v : any/c
Recognizes derived Scribble token values returned by make-scribble-derived-lexer and scribble-string->derived-tokens.

procedure

(scribble-derived-token-tags token)  (listof symbol?)

  token : scribble-derived-token?
Returns the Scribble-specific classification tags attached to a derived Scribble token.

procedure

(scribble-derived-token-has-tag? token tag)  boolean?

  token : scribble-derived-token?
  tag : symbol?
Determines whether a derived Scribble token carries a given classification tag.

procedure

(scribble-derived-token-text token)  string?

  token : scribble-derived-token?
Returns the exact source text corresponding to a derived Scribble token.

procedure

(scribble-derived-token-start token)  position?

  token : scribble-derived-token?
Returns the starting source position for a derived Scribble token.

procedure

(scribble-derived-token-end token)  position?

  token : scribble-derived-token?
Returns the ending source position for a derived Scribble token.

27.2 Scribble Derived Tokens

The current Scribble adapter may attach tags such as:

  • 'scribble-comment

  • 'scribble-whitespace

  • 'scribble-text

  • 'scribble-string

  • 'scribble-constant

  • 'scribble-symbol

  • 'scribble-parenthesis

  • 'scribble-other

  • 'scribble-error

  • 'scribble-command

  • 'scribble-command-char

  • 'scribble-body-delimiter

  • 'scribble-optional-delimiter

  • 'scribble-racket-escape

These tags describe reusable Scribble structure, not presentation. In particular, 'scribble-command only means that a symbol-like token is being used as a command name after "@". It does not mean the lexer has inferred higher-level document semantics for commands such as title or itemlist.

Examples:
> (define derived-tokens
    (scribble-string->derived-tokens
     "@title{Hi}\n@racket[(define x 1)]"))
> (map (lambda (token)
         (list (scribble-derived-token-text token)
               (scribble-derived-token-tags token)))
       derived-tokens)

'(("@" (delimiter scribble-parenthesis scribble-command-char))

  ("title" (identifier scribble-symbol scribble-command))

  ("{" (delimiter scribble-parenthesis scribble-body-delimiter))

  ("Hi" (literal scribble-text))

  ("}" (delimiter scribble-parenthesis scribble-body-delimiter))

  ("\n" (whitespace scribble-whitespace))

  ("@" (delimiter scribble-parenthesis scribble-command-char))

  ("racket" (identifier scribble-symbol scribble-command))

  ("[" (delimiter scribble-parenthesis scribble-optional-delimiter))

  ("(" (delimiter scribble-parenthesis scribble-racket-escape))

  ("define" (scribble-racket-escape))

  (" " (whitespace scribble-whitespace scribble-racket-escape))

  ("x" (identifier scribble-symbol scribble-racket-escape))

  (" " (whitespace scribble-whitespace scribble-racket-escape))

  ("1" (literal scribble-constant scribble-racket-escape))

  (")" (delimiter scribble-parenthesis scribble-racket-escape))

  ("]"

   (delimiter

    scribble-parenthesis

    scribble-optional-delimiter

    scribble-racket-escape)))


value

scribble-profiles : immutable-hash?

The profile defaults used by the Scribble lexer.

28 JavaScript

 (require lexers/javascript) package: lexers-lib

The projected JavaScript API has two entry points: make-javascript-lexer and javascript-string->tokens.

procedure

(make-javascript-lexer [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions 
  #:jsx? jsx?]) 
  (input-port? . -> . (or/c symbol? token? position-token?))
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
  jsx? : boolean? = #f
Constructs a streaming JavaScript lexer.

The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.

When #:source-positions is true, each result is a position-token? whose payload is either a bare symbol such as 'eof or a token? carrying a projected category such as 'keyword, 'identifier, 'literal, 'operator, 'comment, or 'unknown.

When #:source-positions is false, the result is either a bare symbol or a token? directly.

The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.

When #:jsx? is true, the lexer accepts a small JSX extension inside JavaScript expressions. The projected token categories remain the same, while the derived-token API exposes JSX-specific structure.

The current JavaScript slice includes broader modern numeric literals, such as hexadecimal, binary, and octal prefixed integers, decimal exponents, numeric separators, and integer BigInt suffixes.

Examples:
> (define lexer
    (make-javascript-lexer #:profile 'coloring))
> (define in
    (open-input-string "const x = 1;"))
> (port-count-lines! in)
> (list (lexer in)
        (lexer in)
        (lexer in)
        (lexer in))

(list

 (position-token (token 'keyword "const") (position 1 1 0) (position 6 1 5))

 (position-token (token 'whitespace " ") (position 6 1 5) (position 7 1 6))

 (position-token (token 'identifier "x") (position 7 1 6) (position 8 1 7))

 (position-token (token 'whitespace " ") (position 8 1 7) (position 9 1 8)))

procedure

(javascript-string->tokens 
  source 
  [#:profile profile 
  #:trivia trivia 
  #:source-positions source-positions 
  #:jsx? jsx?]) 
  (listof (or/c symbol? token? position-token?))
  source : string?
  profile : (or/c 'coloring 'compiler) = 'coloring
  trivia : (or/c 'profile-default 'keep 'skip)
   = 'profile-default
  source-positions : (or/c 'profile-default boolean?)
   = 'profile-default
  jsx? : boolean? = #f
Tokenizes an entire JavaScript string using the projected token API.

This is a convenience wrapper over make-javascript-lexer. It opens a string port, enables line counting, repeatedly calls the port-based lexer until end-of-file, and returns the resulting token list.

28.1 JavaScript Returned Tokens

The projected JavaScript API uses the same output shape as the other projected lexer APIs:

  • The end of input is reported as 'eof, either directly or inside a position-token?.

  • Ordinary results are usually token? values whose token-name is a projected category and whose token-value contains language-specific text or metadata.

  • When #:source-positions is true, each result is wrapped in a position-token?.

  • When #:source-positions is false, results are returned without that outer wrapper.

Common projected JavaScript categories include:

  • 'whitespace

  • 'comment

  • 'keyword

  • 'identifier

  • 'literal

  • 'operator

  • 'delimiter

  • 'unknown

  • 'eof

In 'coloring mode, whitespace and comments are kept, and recoverable malformed input is returned as 'unknown. In 'compiler mode, whitespace and comments are skipped by default, and malformed input raises an exception instead of producing an 'unknown token.
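Trivia handling can also be overridden independently of the profile. A sketch that keeps the tolerant profile while dropping trivia:

```racket
(require lexers/javascript)

;; Keep the tolerant 'coloring profile but skip whitespace and
;; comment tokens in the projected output.
(define code-only
  (javascript-string->tokens "const x = 1; // note"
                             #:profile 'coloring
                             #:trivia 'skip))
```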

For the current JavaScript scaffold, token-value also preserves the original source text of the emitted token. In particular:

  • For 'keyword and 'identifier, the value is the matched identifier text, such as "const" or "name".

  • For 'literal, the value is the matched literal text, such as "1", "1_000", "123n", or "\"hello\"".

  • For 'comment and 'whitespace, the value is the original comment or whitespace text when those categories are kept.

  • For 'operator and 'delimiter, the value is the matched character text, such as "=", ";", or "(".

  • For 'unknown in tolerant mode, the value is the malformed input text that could not be accepted.

Examples:
> (define inspect-lexer
    (make-javascript-lexer #:profile 'coloring))
> (define inspect-in
    (open-input-string "const x = 1;"))
> (port-count-lines! inspect-in)
> (define first-token
    (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)

#t

> (lexer-token-name first-token)

'keyword

> (lexer-token-value first-token)

"const"

> (position-offset (lexer-token-start first-token))

1

> (position-offset (lexer-token-end first-token))

6


procedure

(make-javascript-derived-lexer [#:jsx? jsx?])

  (input-port? . -> . (or/c 'eof javascript-derived-token?))
  jsx? : boolean? = #f
Constructs a streaming JavaScript lexer for the derived-token layer.

The result is a procedure of one argument, an input port. Each call reads the next raw JavaScript token from the port, computes its JavaScript-specific derived classifications, and returns one derived token value. At end of input, it returns 'eof.

The intended use is the same as for make-javascript-lexer: create the lexer once, then call it repeatedly on the same port until it returns 'eof.

Examples:
> (define derived-lexer
    (make-javascript-derived-lexer))
> (define derived-in
    (open-input-string "const x = 1;"))
> (port-count-lines! derived-in)
> (list (derived-lexer derived-in)
        (derived-lexer derived-in)
        (derived-lexer derived-in)
        (derived-lexer derived-in))

(list

 (javascript-derived-token

  (javascript-raw-token

   'identifier-token

   "const"

   (position 1 1 0)

   (position 6 1 5))

  '(keyword))

 (javascript-derived-token

  (javascript-raw-token

   'whitespace-token

   " "

   (position 6 1 5)

   (position 7 1 6))

  '())

 (javascript-derived-token

  (javascript-raw-token

   'identifier-token

   "x"

   (position 7 1 6)

   (position 8 1 7))

  '(identifier declaration-name))

 (javascript-derived-token

  (javascript-raw-token

   'whitespace-token

   " "

   (position 8 1 7)

   (position 9 1 8))

  '()))

procedure

(javascript-string->derived-tokens source 
  [#:jsx? jsx?]) 
  (listof javascript-derived-token?)
  source : string?
  jsx? : boolean? = #f
Tokenizes an entire JavaScript string into derived JavaScript token values.

This is a convenience wrapper over make-javascript-derived-lexer. It opens a string port, enables line counting, repeatedly calls the derived lexer until it returns 'eof, and returns the resulting list of derived tokens.

procedure

(javascript-derived-token? v)  boolean?

  v : any/c
Recognizes derived JavaScript token values returned by make-javascript-derived-lexer and javascript-string->derived-tokens.

procedure

(javascript-derived-token-tags token)  (listof symbol?)

  token : javascript-derived-token?
Returns the JavaScript-specific classification tags attached to a derived JavaScript token.

procedure

(javascript-derived-token-has-tag? token tag)  boolean?
  token : javascript-derived-token?
  tag : symbol?
Determines whether a derived JavaScript token carries a given classification tag.

procedure

(javascript-derived-token-text token)  string?

  token : javascript-derived-token?
Returns the exact source text corresponding to a derived JavaScript token.

procedure

(javascript-derived-token-start token)  position?

  token : javascript-derived-token?
Returns the starting source position for a derived JavaScript token.

procedure

(javascript-derived-token-end token)  position?

  token : javascript-derived-token?
Returns the ending source position for a derived JavaScript token.

28.2 JavaScript Derived Tokens

A derived JavaScript token pairs one raw JavaScript token with a small list of JavaScript-specific classification tags. This layer is more precise than the projected consumer-facing categories and is meant for inspection, testing, and language-aware tools.

The current JavaScript scaffold may attach tags such as:

  • 'keyword

  • 'identifier

  • 'declaration-name

  • 'parameter-name

  • 'object-key

  • 'property-name

  • 'method-name

  • 'private-name

  • 'static-keyword-usage

  • 'string-literal

  • 'numeric-literal

  • 'bigint-literal

  • 'numeric-separator-literal

  • 'regex-literal

  • 'template-literal

  • 'template-chunk

  • 'template-interpolation-boundary

  • 'jsx-tag-name

  • 'jsx-closing-tag-name

  • 'jsx-attribute-name

  • 'jsx-text

  • 'jsx-interpolation-boundary

  • 'jsx-fragment-boundary

  • 'comment

  • 'malformed-token

Examples:
> (define derived-tokens
    (javascript-string->derived-tokens
     "class Box { static create() { return this.value; } #secret = 1; }\nfunction wrap(name) { return name; }\nconst item = obj.run();\nconst data = { answer: 42 };\nconst greeting = `a ${name} b`;\nreturn /ab+c/i;"))
> (map (lambda (token)
         (list (javascript-derived-token-text token)
               (javascript-derived-token-tags token)
               (javascript-derived-token-has-tag? token 'keyword)
               (javascript-derived-token-has-tag? token 'identifier)
               (javascript-derived-token-has-tag? token 'declaration-name)
               (javascript-derived-token-has-tag? token 'parameter-name)
               (javascript-derived-token-has-tag? token 'object-key)
               (javascript-derived-token-has-tag? token 'property-name)
               (javascript-derived-token-has-tag? token 'method-name)
               (javascript-derived-token-has-tag? token 'private-name)
               (javascript-derived-token-has-tag? token 'static-keyword-usage)
               (javascript-derived-token-has-tag? token 'numeric-literal)
               (javascript-derived-token-has-tag? token 'regex-literal)
               (javascript-derived-token-has-tag? token 'template-literal)
               (javascript-derived-token-has-tag? token 'template-chunk)
               (javascript-derived-token-has-tag? token 'template-interpolation-boundary)))
       derived-tokens)

'(("class" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("Box"

   (identifier declaration-name)

   #f

   (identifier declaration-name)

   (declaration-name)

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("static"

   (keyword static-keyword-usage)

   (keyword static-keyword-usage)

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   (static-keyword-usage)

   #f

   #f

   #f

   #f

   #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("create" (identifier) #f (identifier) #f #f #f #f #f #f #f #f #f #f #f #f)

  ("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("return" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("this" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("." () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("value"

   (identifier property-name)

   #f

   (identifier property-name)

   #f

   #f

   #f

   (property-name)

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f)

  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("#secret"

   (private-name)

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   (private-name)

   #f

   #f

   #f

   #f

   #f

   #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("1"

   (numeric-literal)

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   #f

   (numeric-literal)

   #f

   #f

   #f

   #f)

  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)

  ("function" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("wrap" (identifier declaration-name) #f (identifier declaration-name) (declaration-name) #f #f #f #f #f #f #f #f #f #f #f)
  ("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("name" (identifier parameter-name) #f (identifier parameter-name) #f (parameter-name) #f #f #f #f #f #f #f #f #f #f)
  (")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("return" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("name" (identifier) #f (identifier) #f #f #f #f #f #f #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("const" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("item" (identifier declaration-name) #f (identifier declaration-name) (declaration-name) #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("obj" (identifier) #f (identifier) #f #f #f #f #f #f #f #f #f #f #f #f)
  ("." () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("run" (identifier method-name property-name) #f (identifier method-name property-name) #f #f #f (property-name) (method-name property-name) #f #f #f #f #f #f #f)
  ("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("const" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("data" (identifier declaration-name) #f (identifier declaration-name) (declaration-name) #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("answer" (identifier object-key) #f (identifier object-key) #f #f (object-key) #f #f #f #f #f #f #f #f #f)
  (":" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("42" (numeric-literal) #f #f #f #f #f #f #f #f #f (numeric-literal) #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("const" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("greeting" (identifier declaration-name) #f (identifier declaration-name) (declaration-name) #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("`" (template-literal) #f #f #f #f #f #f #f #f #f #f #f (template-literal) #f #f)
  ("a " (template-literal template-chunk) #f #f #f #f #f #f #f #f #f #f #f (template-literal template-chunk) (template-chunk) #f)
  ("${" (template-literal template-interpolation-boundary) #f #f #f #f #f #f #f #f #f #f #f (template-literal template-interpolation-boundary) #f (template-interpolation-boundary))
  ("name" (identifier) #f (identifier) #f #f #f #f #f #f #f #f #f #f #f #f)
  ("}" (template-literal template-interpolation-boundary) #f #f #f #f #f #f #f #f #f #f #f (template-literal template-interpolation-boundary) #f (template-interpolation-boundary))
  (" b" (template-literal template-chunk) #f #f #f #f #f #f #f #f #f #f #f (template-literal template-chunk) (template-chunk) #f)
  ("`" (template-literal) #f #f #f #f #f #f #f #f #f #f #f (template-literal) #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("return" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
  ("/ab+c/i" (regex-literal) #f #f #f #f #f #f #f #f #f #f (regex-literal) #f #f #f)
  (";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f))


Examples:
> (define jsx-derived-tokens
    (javascript-string->derived-tokens
     "const el = <Button kind=\"primary\">Hello {name}</Button>;\nconst frag = <>ok</>;"
     #:jsx? #t))
> (map (lambda (token)
         (list (javascript-derived-token-text token)
               (javascript-derived-token-tags token)
               (javascript-derived-token-has-tag? token 'jsx-tag-name)
               (javascript-derived-token-has-tag? token 'jsx-closing-tag-name)
               (javascript-derived-token-has-tag? token 'jsx-attribute-name)
               (javascript-derived-token-has-tag? token 'jsx-text)
               (javascript-derived-token-has-tag? token 'jsx-interpolation-boundary)
               (javascript-derived-token-has-tag? token 'jsx-fragment-boundary)))
       jsx-derived-tokens)

'(("const" (keyword) #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("el" (identifier declaration-name) #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("<" () #f #f #f #f #f #f)
  ("Button" (identifier jsx-tag-name) (jsx-tag-name) #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("kind" (identifier jsx-attribute-name) #f #f (jsx-attribute-name) #f #f #f)
  ("=" () #f #f #f #f #f #f)
  ("\"primary\"" (string-literal) #f #f #f #f #f #f)
  (">" () #f #f #f #f #f #f)
  ("Hello " (jsx-text) #f #f #f (jsx-text) #f #f)
  ("{" (jsx-interpolation-boundary) #f #f #f #f (jsx-interpolation-boundary) #f)
  ("name" (identifier) #f #f #f #f #f #f)
  ("}" (jsx-interpolation-boundary) #f #f #f #f (jsx-interpolation-boundary) #f)
  ("</" () #f #f #f #f #f #f)
  ("Button" (identifier jsx-closing-tag-name) #f (jsx-closing-tag-name) #f #f #f #f)
  (">" () #f #f #f #f #f #f)
  (";" () #f #f #f #f #f #f)
  ("\n" () #f #f #f #f #f #f)
  ("const" (keyword) #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("frag" (identifier declaration-name) #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("=" () #f #f #f #f #f #f)
  (" " () #f #f #f #f #f #f)
  ("<>" (jsx-fragment-boundary) #f #f #f #f #f (jsx-fragment-boundary))
  ("ok" (jsx-text) #f #f #f (jsx-text) #f #f)
  ("</>" (jsx-fragment-boundary) #f #f #f #f #f (jsx-fragment-boundary))
  (";" () #f #f #f #f #f #f))


value

javascript-profiles : immutable-hash?

The profile defaults used by the JavaScript lexer.
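Because javascript-profiles is an ordinary immutable hash, it can be inspected with Racket's standard hash operations. A minimal sketch, assuming the lexers library is already required; the concrete profile keys and values are implementation-specific and not documented here:

```racket
;; Inspect the JavaScript lexer's default profiles.
;; Hypothetical exploration only: the actual keys in the hash are
;; implementation-specific.
(immutable? javascript-profiles)  ; the contract promises an immutable hash
(hash-count javascript-profiles)  ; how many profile entries are defined
(hash-keys javascript-profiles)   ; the names of the available profiles
```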