Chido Parse
William Hatch <william@hatch.uno>
1 Guide
NOTE - chido-parse is NOT YET STABLE, and APIs may change.
Chido Parse is an expressive, extensible, composable parsing system. It allows users to build and compose parsers using any mix of ad-hoc procedural parsing, parser combinators, and other parsing abstractions and interfaces, such as readtables and BNF. It allows ambiguous grammars, but provides tools to filter ambiguity, including declarative operator associativity and precidence rules in readtables and BNF rules. Chido Parse supports context-sensitive grammars, including data-dependent grammars, in addition to supporting the entire class of context-free grammars.
Chido Parse is also... slow. So until and unless it can be optimized about 1 order of magnitude faster, I don’t expect people to really use it except for quick prototyping. If it gets 1 order of magnitude faster it will still be slow, but probably faster than macro expansion (at least for s-expression based languages), which will make it viable for projects where extensible and composable concrete syntax extension is as important as macro-based extension. But if you want fast compilation... look elsewhere.
You can use Chido Parse by constructing parser objects and then using whole-parse to parse your input. The simplest kind of parser to use is a literal string, and you can easily use combinators like sequence.
(define abc-d (whole-parse "abc" (sequence "a" "b" "c")))
The above code will return a derivation object. We can get a semantic result out with parse-derivation-result:
(parse-derivation-result abc-d)
Which will return '("a" "b" "c").
One of the key features of Chido Parse is its support for procedural parsers. Parsing procedures take a port and return either a derivation or a parse failure. To use them as parsers, you need to wrap them with the proc-parser struct. We can easily re-use existing parsers this way.
(proc-parser (λ (port) (make-parse-derivation (read-syntax (object-name port) port))))
Actually, procedural parsers can return a stream tree of parse derivations and failures. This allows them to support ambiguous grammars. However, result streams for Chido Parse should be constructed with stream-cons or for/parse, unless you know what you’re doing.
Procedural parsers can recur back into the parsing system by using parse*. parse* is like whole-parse, except that it doesn’t have to consume the whole input and it returns a stream of derivations. Note that whole-parse actually returns a parse failure if there is more than one derivation. If you want multiple results from a top-level parse, you can use whole-parse*. While parsers can return a tree of streams, they are flattened and parse* and whole-parse* return a stream whose members are all parse-derivation? objects.
For example:
(define plus (proc-parser (λ (port) (for/parse ([l (parse* port expression)]) (for/parse ([op (parse* port "+" #:start l)]) (for/parse ([r (parse* port expression #:start op)]) (make-parse-derivation `(+ ,(parse-derivation-result l) ,(parse-derivation-result r)) #:derivations (list l op r))))))))
Note that parse* does not actually consume input from the port, so successive parses need the #:start keyword. But there’s an easier way. Instead of parse*, you can use parse*-direct, which uses delimited continuation magic to loop over a stream while looking like straight-line code:
(define plus (proc-parser (λ (port) (define l (parse*-direct port expression)) (define op (parse*-direct port "+")) (define r (parse*-direct port expression)) (make-parse-derivation `(+ ,(parse-derivation-result l) ,(parse-derivation-result r)) #:derivations (list l op r)))))
Note that if we define the expression parser like so:
(define expression (alt-parser plus number)) (define number (kleene-plus (char-range-parser "09") #:result/bare (λ (ds) (string->number (apply string ds)))))
It turns out that our procedural plus parser is left-recursive! Everybody knows that if you make a left-recursive procedural parser you get infinite recursion! But not in Chido Parse. Chido Parse dynamically detects left-recursive dependency cycles and resolves them by capturing and descheduling first-class delimited continuations. When a cycle occurs, Chido Parse saves the continuation, finds other work, and then applies the continuation to a result when one is available from a different alternate. If there is no alternate that can make progress, Chido Parse breaks the cycle by returning a failure result to the left-recursive parser. Failing cycles this way is mostly so the parser can compute a good failure message, but conceivably you could implement a biased alternate this way. However, at present Chido Parse doesn’t guarantee any order to cycle breaking, so don’t.
Chido Parse also has BNF forms, of course:
(define-bnf dumb-statement-parser [statement ["pass"] ["for" id "in" expression "do" statement] ["{" [: #:repeat-min 0 #:splice 1 statement] "}" #:name "block"] [expression]] [expression [number-parser] [id] [expression "+" expression #:associativity 'left] [expression "*" expression #:associativity 'right #:precedence-greater-than '("+")]] ; Obviously this is a bad identifier parser, but I'm making dumb examples in a hurry. [id [[: #:repeat-min 1 (char-range "az")]]])
The define-bnf form has a bunch of optional arguments for specifying automatic layout insertion (IE allowing whitespace between things). Note that these things are all just parsers. You can use a procedural parser anywhere in your BNF definition, you can stick a BNF parser in a combinator, etc.
Of course, the real reason I started Chido Parse was because I wanted better readtables. So Chido Parse has a readtable-style abstraction for making and extending fancy s-expression parsers.
(define cool-s-exp (chido-readtable-add-list-parser "$(" ")" basic-chido-readtable #:wrap '#%dollar-paren))
Now cool-s-exp can parse
(foo bar $(quux flub) zonkers) |
as '(foo bar (#%dollar-paren quux flub) zonkers)
Chido readtables support infix/prefix/postfix operators as well as the kinds of extensions that Racket’s normal readtables support. Chido readtables support arbitrary-length prefixes unlike Racket readtables with their single character prefix..
Chido readtables and BNF parsers can be extended, eg. with extend-chido-readtable and extend-bnf. BNF parsers are actually made out of readtables, and you can do extend-readtable-as-bnf.
Parsers can be extended, modified, or loaded dynamically. Eg. you can use extend-readtable during parsing, and you can create parsers like Racket’s built-in #reader form. All these fancy parsing abstractions are built out of procedural parsers. If there is another parsing abstraction you know of and like, you can build that abstraction too!
This is clearly a bad guide, but I wanted to write at least something real quick.
TODO - write a better guide.
2 Reference
NOTE - chido-parse is NOT YET STABLE, and APIs may change.
2.1 Gotchas
Don’t use parameter?s, use chido-parse-parameter?s. Don’t use stream-cons or other stream constructors, use parse-stream-cons and its derivatives provided by Chido Parse.
2.2 Core API
2.2.1 Parsing Functions
procedure
input : (or/c string? input-port?) parser : parser? start : (or/c #f number? parse-derivation?) = #f
If this is the first call into the parsing system, the input port is actually replaced for inner parses with a wrapper that can be rewound for multiple ambiguous parses. In any event, parse* does NOT change the current position of the port. If you do a series of parses with parse*, you should pass the previous derivation as the start position to the next call of parse*. Realistically, within a parser parse* should only be used in tail position or in combination with for/parse. For a more convenient function for use within parsers, see parse*-direct.
This is technically the core parsing function, but you probably want to use whole-parse or whole-parse* for the initial parse, and parse*-direct inside parsers.
procedure
(parse input parser) → (or/c parse-derivation? parse-failure?)
input : (or/c string? input-port?) parser : parser?
procedure
(whole-parse* input parser [#:start start]) → stream?
input : (or/c string? input-port?) parser : parser? start : (or/c #f number? parse-derivation?) = #f
procedure
(whole-parse input parser)
→ (or/c parse-derivation? parse-failure?) input : (or/c string? input-port?) parser : parser?
procedure
(parse*-direct input parser [ #:start start #:failure failure-arg]) → parse-derivation? input : input-port? parser : parser? start : (or/c #f number? parse-derivation?) = #f
failure-arg : (or/c #f (-> parse-failure? parse-failure?)) = #f
(define abc-parser (proc-parser (λ (port) (define a (parse*-direct port "a")) (define b (parse*-direct port "b")) (define c (parse*-direct port "c")) (make-parse-derivation (string-append (map parse-derivation-result (list a b c))) #:derivations (list a b c)))))
However, if you are doing some OTHER delimited continuation stuff or creating a stream of results using something other than parse-stream-cons (or for/parse), you may need to take care to use delimit-parse*-direct so the overall parse result is correct.
syntax
(delimit-parse*-direct expression)
But really just don’t use something other than parse-stream-cons (and its derivatives) to construct result streams!
2.2.2 Parse Derivations
procedure
(make-parse-derivation result [ #:end end #:derivations derivations]) → parse-derivation? result : (or/c any/c (-> any/c number? number? number? number? (listof parse-derivation?) any/c)) end : (or/c #f number?) = #f derivations : (listof parse-derivation?) = '()
Result can be anything, but if it is a procedure it is assumed to be a delayed result. The procedure should be of the form:
(λ (src line col pos span derivations) (datum->syntax #f 'my-result (list src line col pos span)))
procedure
(parse-derivation? x) → bool/c
x : any/c
procedure
(parse-derivation-result pd) → any/c
pd : parse-derivation?
procedure
(parse-derivation-parser pd) → parser?
pd : parse-derivation?
procedure
(parse-derivation-source-name pd) → any/c
pd : parse-derivation?
procedure
(parse-derivation-line pd) → (or/c #f number?)
pd : parse-derivation?
procedure
(parse-derivation-column pd) → (or/c #f number?)
pd : parse-derivation?
procedure
(parse-derivation-start-position pd) → (or/c #f number?)
pd : parse-derivation?
procedure
(parse-derivation-end-position pd) → (or/c #f number?)
pd : parse-derivation?
procedure
→ (listof parse-derivation?) pd : parse-derivation?
2.2.3 Parse Failures
procedure
(make-parse-failure [ #:message message #:position position #:end end #:made-progress? made-progress? #:inner-failure inner-failure #:failures failures]) → parse-failure? message : (or/c #f string?) = #f position : (or/c number? #f) = #f end : (or/c number? #f) = #f made-progress? : any/c = progress-default inner-failure : (or/c parse-failure? #f) = #f failures : (or/c (listof parse-failure?) #f) = #f
procedure
(parse-failure? x) → bool/c
x : any/c
Note that parse failures implement the gen:stream interface as empty streams.
procedure
(exn->failure e) → parse-failure?
e : exn?
procedure
pf : parse-failure?
procedure
pf : parse-failure?
procedure
pf : parse-failure?
procedure
pf : parse-failure?
procedure
pf : parse-failure?
parameter
(chido-parse-keep-multiple-failures?) → any/c
(chido-parse-keep-multiple-failures? keep) → void? keep : any/c
= #f
2.2.4 Chido Parse Parameters
procedure
(chido-parse-parameter? x) → any/c
x : any/c
Don’t use parameter?s inside parsing procedures. Use chido-parse-parameters instead.
They behave similarly, but chido-parse-parameters are a key to the parsing cache (so you don’t accidentally pull out a cached result that was constructed by referring to a different parameter value, such as a different value for current-chido-readtable) and are preserved by parse-stream-cons, whereas regular parameters are NOT preserved by stream-cons and friends.
procedure
(chido-parse-parameter default) → chido-parse-parameter?
default : any/c
syntax
(chido-parse-parameterize ([cp-param value] ...) body ...)
2.2.5 Stream construction
syntax
(parse-stream-cons head tail)
syntax
(for/parse ([binding-id stream-expression]) body ...)
2.2.6 Parsers and core parser constructors
procedure
(parser? x) → any/c
x : any/c
BUT - here I’ll at least document what this is meant to check, and what you can expect from functions that return parser?s.
Parsers are either proc-parsers, alt-parsers, string?s, structs that implement prop:custom-parser, non-cached-parser-thunk, or a thunk that returns one of those.
Thunks are allowed because parsers often need to recursively refer to each other. If parsers are mutually recursive, you have an ordering issue where neither can be constructed without the other. To break the cycle, put one in a thunk! Thunks that contain parsers are only forced once and cached, so you can rely on the results of the thunk being eq? (eg.for filtering purposes). Because sometimes you really do want the thunk to be re-evaluated, thunks wrapped with the non-cached-parser-thunk struct are NOT cached. Additionally, chido-parse-parameter?s used as parser thunks are not cached, because their whole point is to be this dynamically extensible thing to allow you to implement parsers independently then have them dynamically reference the right parser when recurring.
If you use a procedure as a parser that is NOT a thunk that returns (perhaps eventually through multiple layers of thunks) a primitive parser, you will get an exception when the scheduler actually tries to force it and use it as a parser.
procedure
(proc-parser proc [ #:prefix prefix #:name name #:preserve-prefix? preserve-prefix? #:promise-no-left-recursion? promise-no-left-recursion?]) → proc-parser? proc : (-> input-port? (or/c any/c stream?)) prefix : string? = "" name : string? = (format "~a" (object-name proc)) preserve-prefix? : any/c = #f promise-no-left-recursion? : any/c = #f
If prefix is given, it is a literal prefix to match before the procedure is called. Unless preserve-prefix? is true, the port is set to just after the prefix, so your parsing function can focus on the interesting part of the parse. preserve-prefix? is mostly there as a convenience for wrapping existing parsing procedures.
The prefix may be queried and used in a semantically meaningful way, as by chido-readtable?s.
procedure
(proc-parser? p) → any/c
p : any/c
procedure
(alt-parser [#:name name] p ...) → alt-parser?
name : (or/c string? #f) = #f p : parser?
procedure
(alt-parser? p) → any/c
p : any/c
procedure
t : (-> parser?)
TODO - I should document this. But really it should probably be gen:custom-parser and support multiple methods, eg. for name, prefix, etc.
IE this is not stable.
At any rate, while this is undocumented you can also just use a function that takes your struct and returns a parser, though that’s less satisfying.
procedure
(parser-name p) → string?
p : parser?
Note that this has to force thunks.
procedure
(parser-prefix p) → string?
p : parser?
procedure
p : parser?
procedure
p : parser?
Note that this has to force thunks.
2.3 Literal Parsers
You can use string?s as literal parsers. You can always create proc-parsers that parse literals. The other literal parsers here are just proc-parsers that are convenient enough to be in the standard parser library.
value
value
2.4 Combinators
The alternation combinator alt-parser is documented in another section.
procedure
(sequence [ #:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:between between #:before before #:after after] parser ...) → parser? name : (or/c #f string?) = #f derive : procedure? = #f make-result/bare : procedure? = #f make-result/stx : procedure? = #f between : (or/c #f parser?) = #f before : (or/c #f parser?) = #f after : (or/c #f parser?) = #f parser : parser?
The derive, result/bare, and result/stx arguments are for building the result derivation, and only one may be given (a non-#f value). They are all procedures that must accept one value for each of parser, but not the before, between, or after parsers.
derive should be a function that accepts one derivation for each parser, and should return a parse-derivation?.
result/bare should be a function that accepts the result values (IE using parse-derivation-result) from each parser and should return a value to be used as the result field of make-parse-derivation. You can also provide #t instead of a procedure to just get the results in a list.
result/stx is like result/bare, but it wraps the result you return as a syntax object with appropriate source location info. You can also provide #t instead of a procedure to just get the results in a syntax list.
If none of the result options are given, the default is as if result/stx is #t.
TODO - example usage.
procedure
(repetition [ #:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:min min #:max max #:greedy? greedy? #:between between #:before before #:after after] parser) → parser? name : (or/c #f string?) = #f derive : procedure? = #f make-result/bare : procedure? = #f make-result/stx : procedure? = #f min : number? = 0 max : number? = +inf.0 greedy? : any/c = #f between : (or/c #f parser?) = #f before : (or/c #f parser?) = #f after : (or/c #f parser?) = #f parser : parser?
The optional argument API is like sequence. between is used between iterations of parser, before and after are parsed before/after the first/last repetition of parser.
derive, result/bare, and result/stx act like sequence but should accept a single argument which will be a list of derivations returned by parser (in order). Again, result/bare and result/stx may be #t instead of a procedure, in which case the results will be put in a list or syntax list. If none of the options are non-false, result/stx is treated as #t.
You can specify the minimum and maximum number of acceptible repetitions with min and max. If greedy? is non-false, derivations will only be returned if they parsed the max number of repetitions or no more repetitions can be parsed after the last one. Note that the result may still be ambiguous depending on whether parser or before/between/after are ambiguous. If greedy? is false, all derivations with a number of repetitions between min and max, inclusive, are returned. Frankly, for any realistic use, you want greedy? to be true, because the extra ambiguity of non-greedy makes a lot of work for the system, even though most of it is immediately thrown away.
TODO - usage example.
UNSTABLE - I might change the default option for greedy?. On the one hand, false gives the more general parser and what most people would probably expect at first. But non-greedy repetition is just so much slower that any practical use really needs to reach for greedy.
procedure
(kleene-star [ #:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:greedy? greedy? #:between between #:before before #:after after] parser) → parser? name : (or/c #f string?) = #f derive : procedure? = #f make-result/bare : procedure? = #f make-result/stx : procedure? = #f greedy? : any/c = #f between : (or/c #f parser?) = #f before : (or/c #f parser?) = #f after : (or/c #f parser?) = #f parser : parser?
procedure
(kleene-plus [ #:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:greedy? greedy? #:between between #:before before #:after after] parser) → parser? name : (or/c #f string?) = #f derive : procedure? = #f make-result/bare : procedure? = #f make-result/stx : procedure? = #f greedy? : any/c = #f between : (or/c #f parser?) = #f before : (or/c #f parser?) = #f after : (or/c #f parser?) = #f parser : parser?
procedure
(kleene-question [ #:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:greedy? greedy? #:between between #:before before #:after after] parser) → parser? name : (or/c #f string?) = #f derive : procedure? = #f make-result/bare : procedure? = #f make-result/stx : procedure? = #f greedy? : any/c = #f between : (or/c #f parser?) = #f before : (or/c #f parser?) = #f after : (or/c #f parser?) = #f parser : parser?
UNSTABLE - I might change the default option for greedy?.
procedure
(permutation [ #:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:between between #:before before #:after after] parser ...) → parser? name : (or/c #f string?) = #f derive : procedure? = #f make-result/bare : procedure? = #f make-result/stx : procedure? = #f between : (or/c #f parser?) = #f before : (or/c #f parser?) = #f after : (or/c #f parser?) = #f parser : parser?
procedure
(epsilon-parser [ #:name name #:result result]) → parser? name : string? = "epsilon" result : (or/c #f any/c) = #f
procedure
(not-parser parser [ #:name name #:result result]) → parser? parser : parser? name : string? = (format "not_~a" (parser-name parser)) result : any/c = #f
You can supply result as the result field for make-parse-derivation.
procedure
(peek-parser parser) → parser?
parser : parser?
You could use this with eg. parse*-direct to peek ahead without actually effecting the input port.
procedure
(char-range-parser min [max]) → parser?
min : (or/c char? string?) max : (or/c #f char? string?) = #f
procedure
(regexp->parser rx [#:name name]) → parser?
rx : regexp? name : (or/c #f string?) = #f
procedure
(wrap-derivation p wrap-func [#:name name]) → parser?
p : parser? wrap-func : (-> parse-derivation? parse-derivation?) name : (or/c #f string?) = #f
syntax
(binding-sequence elem ... global-option ...)
elem = parser | [: parser elem-optional ...] global-option = #:name parser-name | #:derive derive | #:result/bare result/bare | #:result/stx result/stx | #:splice splice-global | #:inherit-between inherit-between | #:between between-parser elem-optional = #:bind binding-name | #:ignore ignore | #:splice splice-number | #:repeat-min repeat-min | #:repeat-max repeat-max | #:repeat-greedy? repeat-greedy?
To bind the derivation as a result, use the #:bind keyword to bind the derivation to name. Then just refer to name later!
Global options affect the entire sequence. between-parser is inserted between each element in the sequence. If between-parser is not specified but inherit-between is truthy, then the sequence inherits the between-parser from any binding-sequence that surrounds this one. If splice-global is a number greater than 0, the resulting list is spliced that many times (which is only useful if your sequence only has a single non-ignored element). The derive, result/bare, and result/stx arguments are similar to the similar arguments in sequence, though note that splices and ignores affect the list of arguments that will be passed to them. The default derivation result will be a syntax list of the results of the elements (taking into account splices and ignores).
Each element in the sequence may have extra options about binding, splicing, or repetition. The binding-name argument is bound to the derivation of that element for future elements in the sequence. The splice-number argument should be a positive integer, and the result list of that parser will be spliced that number of times into the outer result list. If the ignore argument is true, that element is not included in the result list. The repetition arguments act as in repetition, though the between-parser is inserted between repetitions.
2.4.1 Filters
procedure
(parse-filter p filter-func [ #:replace-derivation? replace-derivation? #:include-failures? include-failures?]) → parser? p : parser? filter-func : (-> port derivation (or/c any/c parse-derivation?)) replace-derivation? : any/c = #f include-failures? : any/c = #t
The filter-func receieves the input port at the position after the parser’s derivation, as well as the derivation itself. If the filter-func returns false, that derivation is removed from the result stream of the filtered parser. If replace-derivation? is true, the filter-func’s non-false results must be new parse derivations, which replace the original derivations in the parse stream. Otherwise the original parse derivations are used even if the truthy value returned by filter-func is itself a parse derivation.
If include-failures? is true, each filtered derivation will be replaced in the stream by a parse-failure? with a message about the filtered derivation. It’s really only useful if you want to track all failures with chido-parse-keep-multiple-failures?.
procedure
(not-follow-filter main-parser not-follow-parser [ #:include-failures? include-failures?]) → parser? main-parser : parser? not-follow-parser : parser? include-failures? : any/c = #t
procedure
(must-follow-filter main-parser must-follow-parser [ #:include-failures? include-failures?]) → parser? main-parser : parser? must-follow-parser : parser? include-failures? : any/c = #t
Eg. (must-follow-filter "abc" "d") parses the first three characters of the string "abcd", but does not parse the strings "abc" or "abce".
procedure
(derivation-filter p filter-func [ #:replace-derivation? replace-derivation? #:include-failures? include-failures?]) → parser? p : parser? filter-func : (-> parse-derivation? (or/c any/c parse-derivation?)) replace-derivation? : any/c = #f include-failures? : any/c = #t
procedure
(result-filter p filter-func [ #:replace-result? replace-result? #:include-failures? include-failures?]) → parser? p : parser? filter-func : (-> any/c any/c) replace-result? : any/c = #f include-failures? : any/c = #t
2.5 Chido-Readtable Parsers
Chido-readtables have an interface similar to Rackets readtables (see Readtables in the Racket Guide)
Chido-readtables are a fancy kind of alternative parser. Among their alternatives, chido-readtables have a built-in symbol parser whose behavior is modified by the other parsers. Chido-readtables are extended with parsers and a symbol denoting an effect on the built-in symbol parser, such as 'terminating or 'nonterminating-layout.
When the built-in symbol parser is used by a chido-readtable, the symbol continues until reaching the prefix of a 'terminating or 'terminating-layout parser or a successful parse from a 'soft-terminating or 'soft-terminating-layout parser.
Layout parsers are whitespace or comment parsers that delimit symbols and are generally allowed between other parsers inside lists. Results from layout parsers are ignored when creating parse results from readtable parsers, only the fact that they succeed at parsing is relevant. They come in 'terminating-layout, 'soft-terminating-layout, and 'nonterminating-layout varieties, which match the symbol effects of 'terminating, 'soft-terminating, and 'nonterminating parsers, respectively.
Terminating parsers are parsing alternatives that also delimit symbols (IE they terminate the built-in symbol parsing). In other words no space is necessary between a symbol and a following terminating parser. 'soft-terminating parsers terminate a symbol only of they parse successfully. Hard terminating ('terminating) parsers terminate a symbol when their prefix matches, whether or not they parse successfully. Therefore the prefixes of hard terminating parsers are disallowed anywhere within symbols.
Nonterminating parsers are alternatives that do not delimit a symbol. Layout (whitespace) is required between a symbol and a following nonterminating parser.
Terminating and nonterminating parsers can follow each other with no layout in between, though layout is allowed between them, and usually good style to have.
Note that when using a chido-readtable as a parser directly, it parses only ONE form, equivalent to using chido-readtable->read1. When you want to parse a sequence using chido-readtables, instead of using a kleene star combinator, you should use chido-readtable->read*.
Ambiguous results are possible from parsing with a chido-readtable when terminating/nonterminating parsers themselves return ambiguous results or when multiple terminating/nonterminating parsers are successful at a given position. If another parser is successful, the symbol parser is not tried. The symbol parser never returns an ambiguous result.
One final parser flag is 'left-recursive-nonterminating. Parsers with the 'left-recursive-nonterminating flag are run after the symbol parser and do not effect symbol parsing. While chido-parse doesn’t normally need a flag to handle left-recursive parsers, because most readtable parsers are run before the symbol parser and used to effect the symbol parser, left-recursive parsers need a special flag in the readtable parser.
2.5.1 chido-readtable objects
TODO - list the flags that the table starts with, eg. chido-readtable-symbol-support?.
procedure
(chido-readtable? x) → any/c
x : any/c
2.5.2 Variations on chido-readtable parsers
procedure
(chido-readtable->read1 rt) → parser?
rt : chido-readtable?
procedure
rt : chido-readtable?
procedure
(chido-readtable->read* rt) → parser?
rt : chido-readtable?
procedure
(chido-readtable->read+ rt) → parser?
rt : chido-readtable?
procedure
(chido-readtable->layout1 rt) → parser?
rt : chido-readtable?
procedure
(chido-readtable->layout* rt) → parser?
rt : chido-readtable?
procedure
(chido-readtable->layout+ rt) → parser?
rt : chido-readtable?
UNSTABLE - chido readtables don’t yet support symbols with escapes (IE with backslash) or as literals with pipes around them as Racket’s built-in readtable does. At some point there will be options to set on the readtable itself to toggle options about those.
2.5.3 Extending and modifying chido-readtables
All readtable “modifiers” are functional. They do not actually mutate a readtable, they merely return a new modified readtable.
value
chido-readtable-symbol-effect/c : contract?
(or/c 'terminating 'soft-terminating 'nonterminating 'terminating-layout 'soft-terminating-layout 'nonterminating-layout 'left-recursive-nonterminating)
procedure
(extend-chido-readtable mode parser rt [ #:operator operator-type #:precedence-less-than precedence-less-than #:precedence-greater-than precedence-greater-than #:associativity associativity]) → chido-readtable? mode : chido-readtable-symbol-effect/c parser : parser/c rt : chido-readtable? operator-type : (or/c #f 'infix 'prefix 'postfix) = #f precedence-less-than : (or/c any/c (listof any/c)) = #f precedence-greater-than : (or/c any/c (listof any/c)) = #f associativity : (or/c #f 'left 'right) = #f
Also, chido-readtables support operators. You can use the optional arguments to do so, but better yet see chido-readtable-add-mixfix-operator.
procedure
(extend-chido-readtable* rt arg ...) → chido-readtable?
rt : chido-readtable? arg : any/c
(extend-chido-readtable* my-rt 'terminating at-reader-parser 'nonterminating some-other-parser)
procedure
(chido-readtable-add-list-parser l-delim r-delim rt [ #:wrapper wrapper #:inside-readtable inside-readtable #:readtable-symbol-effect readtable-symbol-effect]) → chido-readtable? l-delim : string? r-delim : string? rt : chid-readtable? wrapper : (or/c #f symbol? (-> syntax? syntax?)) = #f
inside-readtable : (or/c #f chido-readtable? (-> chido-readtable?)) = #f
readtable-symbol-effect : chido-readtable-symbol-effect/c = 'terminating
(chido-readtable-add-list-parser "(" ")" my-rt)
If inside-readtable is #f, the current-readtable is queried for recursive parses (of both forms and layout). You can instead set inside-readtable to a particular readtable to use it, or to a thunk that provides a readtable, in which case the thunk will be called (with no caching) to get an inner readtable each time.
(chido-readtable-add-list-parser "$(" ")" mostly-normal-rt #:wrapper '#%dollar-paren)
You can also use a procedure for wrapper, in which case the procedure takes the result syntax as an argument and must return a syntax object.
Note that if l-delim and r-delim are the same character, you can’t nest lists. It’s a constant annoyance to me that people ever choose to use the same (likely single character) string as both left and right delimiters for things, because of course you can’t nest such things without some kind of escaping.
The readtable-symbol-effect should generally be 'terminating, but another reasonable choice is 'terminating-layout to get an s-expression comment form. I’m not sure it’s a terribly useful comment form, but it’s interesting at least.
UNSTABLE - these list forms do not yet support dots for improper lists or for operator moving (a la '(op l r)). I will eventually add them as optional keyword arguments (without using the keyword, no dots will be supported).
procedure
(chido-readtable-add-raw-string-parser l-delim r-delim rt [ #:wrapper wrapper #:readtable-symbol-effect readtable-symbol-effect]) → chido-readtable? l-delim : string? r-delim : string? rt : chid-readtable? wrapper : (or/c #f symbol? (-> syntax? syntax?)) = #f
readtable-symbol-effect : chido-readtable-symbol-effect/c = 'terminating
(chido-readtable-add-raw-string-parser "«" "»" my-rt)
TODO - this example is messing up my auto-indentation, so I’m commenting it out for the moment.
I mean, that’s a pretty dumb example, but I think you get the picture. When you have to embed HTML inside Javascript inside PHP inside ... escaping is super annoying. But realistically raw strings of source code are useful in Racket in particular because you can write macros to parse those string at compile time. And sometimes you just want to write strings that contain quotes or backslashes without a bunch of escaping nonsense. In other words, they are also nice for writing regular expressions (IE a special case of embedding source code as a string).
Note that if l-delim and r-delim are the same character, you can’t nest raw strings. It’s a constant annoyance to me that people ever choose to use the same (likely single character) string as both left and right delimiters for things, because of course you can’t nest such things without some kind of escaping.
The readtable-symbol-effect should generally be 'terminating, but another reasonable choice is 'terminating-layout to get a nestable multi-line comment form.
procedure
(chido-readtable-add-mixfix-operator name rt [ #:layout layout #:precedence-greater-than precedence-greater-than #:precedence-less-than precedence-less-than #:associativity associativity]) → chido-readtable? name : (or/c symbol? string?) rt : chido-readtable? layout : (or/c 'required 'optional 'none) = 'required precedence-greater-than : list? = '() precedence-less-than : list? = '() associativity : (or/c #f 'left 'right) = #f
(chido-readtable-add-mixfix-operator "_<+>_" my-rt)
The name must have an underscore everywhere you want a recursive parse to happen, thus determining whether the operator is infix, prefix, postfix, or “notfix” (meaning that it neither begins nor ends with a recursive parse). The fixity determines the implicit symbol that starts the result form, either '#%chido-readtable-infix-operator, '#%chido-readtable-prefix-operator, '#%chido-readtable-postfix-operator, or '#%chido-readtable-notfix-operator.
The second symbol in the result is the operator name, with any leading and trailing underscores stripped but any interior underscores preserved.
(chido-readtable-add-mixfix-operator 'IF_THEN_ELSE_ my-rt)
The layout argument determines whether layout is required, optional, or disallowed between the operator components and the recursive parses. Obviously the only good answer here is 'required, but if you love poor language design where you can’t allow identifiers to include the characters used as operator names you can choose 'optional, just know that I think poorly of your choices. To round things out, you can also choose 'none, which I hope everyone can see is just a dumb choice. But I added it for the sake of completeness.
The associativity argument should not need much explanation. But it should be extended to allow associativity groups, which it currently doesn’t support.
The precedence-greater-than and precedence-less-than allow you to specify relative precedence of operators. Using these, your operator precedence forms a partial order, and any operators with no ordering between them require disambiguation (IE it’s an error to put them together). (I could have left it ambiguous, but really who wants that?) I could have written this with numeric precedence instead, but I like partial-order precedence a lot better, so that’s what you get in this implementation. (I mean, it could be extended to have the option of doing either, or maybe both! But that’s more work than I currently want to bother with.)
The symbol pieces of the operator name are blacklisted from the symbol parser. So if you add the operator '<>=<_>>#>_<$+¢>_ (channeling Haskell), the symbols '<>=<, '>>#>, and '<$+¢> would be unreadable by the readtable. This prevents ambiguous parses where you have a parse with the operator as an operator and a parse where you just have the operator name as a symbol in a list of forms.
procedure
(set-chido-readtable-symbol-result-transformer crt transformer) → chido-readtable? crt : chido-readtable? transformer : (-> syntax? any/c)
The default transformer is the identify function.
procedure
(set-chido-readtable-symbol-support rt v) → chido-readtable?
rt : chido-readtable? v : any/c
procedure
(chido-readtable-symbol-support? rt) → any/c
rt : chido-readtable?
procedure
(chido-readtable-blacklist-symbols symbols rt) → chido-readtable? symbols : (listof (or/c symbol? string?)) rt : chido-readtable?
procedure
(chido-readtable-name rt) → (or/c #f string?)
rt : chido-readtable?
procedure
(set-chido-readtable-name rt name) → chido-readtable?
rt : chido-readtable? name : string?
TODO - other APIs or parsers?
2.6 BNF DSL
Frankly, the BNF DSL is a little rougher than the rest of the system. It’s more likely to see some minor changes than the rest of the system.
syntax
(define-bnf-arm arm-name top-level-optional ... arm-alt-spec ...)
top-level-optional = #:layout layout-flag | #:layout-parsers layout-parsers | #:ignore-arm-name? ignore-arm-name? | #:result/stx result/stx | #:result/bare result/bare layout-flag = (quoterequired) | (quoteoptional) | (quotenone) bnf-arm-alt-spec = binding-sequence-elem ... | arm-alt-optional ... arm-alt-optional = #:name name | #:associativity associativity | #:precedence-less-than plt | #:precedence-greater-than pgt | #:result/stx result/stx | #:result/bare result/bare
In reality, a parser is always inserted between, but it checks a flag stored on the readtable...
The layout-parsers argument should be a list of parsers, and if unspecified defaults to the list (quote (" " "\t" "\r" "\n")). Note that these are simply layout parsers on the resulting readtable, so this list can be extended later.
The result/stx and result/bare arguments act as in the various combinators, but both a global default for the arm and per-alt overrides can be supplied. If ignore-arm-name? is false, the name of the arm is prepended to the result list by the default result constructior.
Each alternative in the “arm” is essentially a binding-sequence, but with different whole-sequence options. In particular, if the leftmost or rightmost sequence members are arm-name, the alternate is determined to be an operator, and the associativity and precedence arguments are used.
TODO - example.
syntax
(readtable-extend-as-bnf-arm rt option ... bnf-arm-alt-spec ...)
option = #:arm-name arm-name | #:ignore-arm-name? ignore-arm-name? | #:result/stx result/stx | #:result/bare result/bare
TODO - example.
syntax
(define-bnf name global-options ... [arm-name bnf-arm-option ... bnf-arm-alt-spec ...] ...)
global-options = #:layout-parsers layout-parsers | #:layout global-layout-flag bnf-arm-option = #:main-arm main-arm | #:layout layout-flag-for-arm | #:ignore-arm-name? ignore-arm-name | #:result/stx result/stx | #:result/bare result/bare
The bnf-arm-alt-specs you can use in each arm are the same as in define-bnf-arm. The layout flag can be set globally but overridden per-arm, with values of 'required, 'optional, or 'none as per define-bnf-arm. The layout-parsers again default to '(" " "\t" "\r" "\n")?
Each arm is a chido-readtable, as built with define-bnf-arm, and they can be accessed with bnf-parser->arm-parser.
TODO - example.
procedure
(bnf-parser? x) → boolean?
x : any/c
procedure
p : bnf-parser?
procedure
(bnf-parser->arm-parser p arm-name) → chido-readtable?
p : bnf-parser? arm-name : symbol?
2.7 TODO
TODO - document these
extend-bnf – a form to extend BNFs with an API similar to define-bnf.
define-bnf/quick – a form to define BNFs like define-bnf, but using short symbols instead of verbose keyword arguments.
define-bnf/syntactic – a form that defines a BNF using custom concrete syntax that looks more like traditional BNF notation. It takes a string literal and parses it at macro expansion time.
define-bnf/syntactic/parsed – this actually should be private, I think, but I’m leaving this in the list for the moment...
#lang chido-parse/bnf-syntactic – like define-bnf/syntactic but as a #lang. It allows trailing s-expression definitions and provides the defined bnf as parser.
3 Code and License
The code is available on github.
This library is distributed under the MIT license and the Apache version 2.0 license, at your option.