Parsec implementation in Racket
(require parsack)    package: parsack-lib
Parsec implementation in Racket. See [parsec].
1 Parsers
A parser is a function that consumes an input-port? and returns a parse result. A parse result is most commonly a char? or a list of chars, but is ultimately determined by each specific parser.
Two factors are considered when combining parsers:
- whether a parser consumes input, i.e., whether data was read from the input port, and
- whether a parser succeeds or fails.
Specifically, Consumed and Empty struct results indicate input consumption and no input consumption, respectively, while the Ok and Error structs indicate success and failure, respectively. See Parse Result Structs for more details about the information contained in a parse result. In general, users should use the combinators below to compose parsers and parse results, rather than handle the structs directly.
2 Basic parsing functions
procedure
(parse-result p input) → any/c
p : parser?
input : (or/c string? path? input-port?)
Parses input with p and returns the parse result.
The input can be a string, a file path, or an input port. A string or path input is converted to an input port before parsing begins.
Raises the exn:fail:parsack exception on parser failure.
> (parse-result $letter "abc")
#\a
> (parse-result $letter "123")
parse ERROR: at 1:1:1
unexpected: "1"
expected: "letter"
> (parse-result (many $letter) "abc123")
'(#\a #\b #\c)
> (parse-result (many $letter) "123")
'()
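Because parse-result raises on failure, a caller that prefers a plain return value can catch the exception. The sketch below is not part of parsack. parse-or-false is a hypothetical helper, and it assumes that exn:fail:parsack is a subtype of exn:fail, as its name suggests, so that an exn:fail? handler catches it.

```racket
#lang racket
(require parsack)

;; parse-or-false: hypothetical helper (not part of parsack).
;; Runs parse-result, but returns #f instead of raising on failure.
;; Assumes exn:fail:parsack is a subtype of exn:fail.
(define (parse-or-false p input)
  (with-handlers ([exn:fail? (λ (e) #f)])
    (parse-result p input)))

(parse-or-false $letter "abc") ; returns #\a
(parse-or-false $letter "123") ; returns #f
```

One caveat of this design is that a parser whose successful result is #f cannot be distinguished from a failure.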
procedure
(parse p input) → (or/c Consumed? Empty?)
p : parser?
input : (or/c string? path? input-port?)
Like parse-result, but returns the full Consumed or Empty struct rather than only the parse result.
The input can be a string, a file path, or an input port. A string or path input is converted to an input port before parsing begins.
Raises the exn:fail:parsack exception on parser failure.
> (parse $letter "abc")
(Consumed (Ok #\a))
> (parse $letter "123")
parse ERROR: at 1:1:1
unexpected: "1"
expected: "letter"
> (parse (many $letter) "abc123")
(Consumed (Ok '(#\a #\b #\c)))
> (parse (many $letter) "123")
(Empty (Ok '()))
3 Basic parsing forms
procedure
(>>= p f) → parser?
p : parser?
f : (-> any/c parser?)
Creates a parser that first parses with p. If p fails, the error result is returned. Otherwise, the parse result is passed to f, and parsing continues with the parser that f returns.
> (parse-result
   (>>= (char #\()
        (λ (skip1)
          (>>= $letter
               (λ (x)
                 (>>= (char #\))
                      (λ (skip2) (return x)))))))
   "(a)")
#\a
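The nested example above only threads results through. >>= matters most when the next parser depends on a previous result. In this minimal sketch, $doubled is a hypothetical name: the parser reads any character and then requires the same character again.

```racket
#lang racket
(require parsack)

;; $doubled: hypothetical example parser.
;; The second parser is built from the character c returned by $anyChar,
;; which a fixed sequence of parsers cannot express.
(define $doubled
  (>>= $anyChar
       (λ (c) (>> (char c) (return c)))))

(parse-result $doubled "aab") ; returns #\a
;; (parse-result $doubled "ab") raises, because the second character differs
```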
procedure
(>> p q) → parser?
p : parser?
q : parser?
Creates a parser that first parses with p, and if successful, ignores the result, then parses with q.
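A typical use of >> is to drop a prefix that only marks the start of a token. This sketch uses the hypothetical names $hex-color and hex-color->number. It skips a leading # and converts the remaining hex digits to a number.

```racket
#lang racket
(require parsack)

;; $hex-color: hypothetical parser; >> discards the result of (char #\#).
(define $hex-color (>> (char #\#) (many1 $hexDigit)))

;; hex-color->number: hypothetical helper; parses, then reads the digits in base 16.
(define (hex-color->number str)
  (string->number (list->string (parse-result $hex-color str)) 16))

(hex-color->number "#ff")  ; returns 255
(hex-color->number "#1aF") ; returns 431
```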
syntax
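(parser-compose bind-or-skip ...+)
bind-or-skip = (x <- parser)
             | parser
parser = parser?
x = identifier?
Composes the given parsers in sequence. A clause (x <- parser) binds the result of parser to x, which is in scope in all subsequent clauses. The result of the last parser is the result of the whole composition.
> (parse-result (parser-compose (char #\[) (x <- $letter) (y <- $letter) (char #\]) (return (list x y))) "[ab]")
'(#\a #\b)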
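Because a name bound with <- is visible in later clauses, parser-compose can build a parser from an earlier result. The following sketch uses the hypothetical name $element. It parses a tiny tag such as <b>hi</b> and requires the closing tag to repeat the name from the opening tag. The string used here is parsack's string parser, which shadows Racket's string once parsack is required.

```racket
#lang racket
(require parsack)

;; $element: hypothetical parser for <name>letters</name>.
;; name is bound by <- and reused to build the closing-tag parser.
(define $element
  (parser-compose
   (char #\<) (name <- (many1 $letter)) (char #\>)
   (body <- (many $letter))
   (string "</") (string (list->string name)) (char #\>)
   (return (list (list->string name) (list->string body)))))

(parse-result $element "<b>hi</b>") ; returns '("b" "hi")
;; (parse-result $element "<b>hi</i>") raises: the closing name must be "b"
```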
syntax
(parser-seq skippable ... maybe-combine)
skippable = (~ parser)
          | parser
parser = parser?
maybe-combine =
              | #:combine-with combine
combine = (-> any/c any/c)
Parses with each parser in sequence and passes the results of the parsers not wrapped in ~ to combine.
The default combine function is list, so (parser-seq p q) is syntactically equivalent to (parser-compose (x <- p) (y <- q) (return (list x y))).
Use parser-seq instead of parser-compose when you don’t need the result of any parser other than to return it. Use parser-compose when you need the result of one parse to build subsequent parsers, for example when parsing matching HTML open and close tags, or when you need to do additional processing on the parse result before returning it.
> (parse-result (parser-seq (~ (char #\[)) $letter $letter (~ (char #\]))) "[ab]")
'(#\a #\b)
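The maybe-combine clause replaces the default list. This sketch uses the hypothetical names $digit-range and digit-value. The two kept digits of an input such as 1-4 are passed to the combine function as separate arguments, and the function turns them into a list of numbers.

```racket
#lang racket
(require parsack)

;; digit-value: hypothetical helper converting #\0..#\9 to 0..9.
(define (digit-value c) (- (char->integer c) (char->integer #\0)))

;; $digit-range: hypothetical parser for "<digit>-<digit>".
;; The dash is skipped with ~, so combine receives exactly two arguments.
(define $digit-range
  (parser-seq $digit (~ (char #\-)) $digit
              #:combine-with (λ (lo hi)
                               (range (digit-value lo) (add1 (digit-value hi))))))

(parse-result $digit-range "1-4") ; returns '(1 2 3 4)
```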
syntax
(parser-cons p q)
p = parser?
q = parser?
Parses with p and then q, and conses the result of p onto the result of q.
> (parse-result (parser-cons $letter (many $letter)) "abcde")
'(#\a #\b #\c #\d #\e)
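parser-cons suits patterns of the form "one of these, then many of those". This sketch uses the hypothetical name $simple-id for one letter followed by letters or digits. parsack also provides $identifier, whose exact rules may differ from this one.

```racket
#lang racket
(require parsack)

;; $simple-id: hypothetical parser; the first letter is consed onto the
;; (possibly empty) list returned by (many $alphaNum).
(define $simple-id (parser-cons $letter (many $alphaNum)))

(list->string (parse-result $simple-id "x1y rest")) ; returns "x1y"
```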
syntax
(parser-one p ...)
p = (~> parser)
  | parser
parser = parser?
Parses with each parser in sequence and returns only the result of the parser wrapped in ~>.
For example, (parser-one p1 (~> p2) p3) is syntactically equivalent to (parser-seq (~ p1) p2 (~ p3) #:combine-with (λ (x) x)), which is equivalent to (parser-compose p1 (x <- p2) p3 (return x)).
> (parse-result (parser-one (char #\() (~> $letter) (char #\))) "(a)")
#\a
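parser-one fits tokens surrounded by separators that should be discarded. This sketch uses the hypothetical name $padded-number. $spaces consumes leading and trailing whitespace, and only the digits are kept.

```racket
#lang racket
(require parsack)

;; $padded-number: hypothetical parser; only the result of the ~> parser is returned.
(define $padded-number
  (parser-one $spaces (~> (many1 $digit)) $spaces))

(string->number (list->string (parse-result $padded-number "  42  "))) ; returns 42
```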
4 Basic Combinators
procedure
(<or> p q ...) → parser?
p : parser?
q : parser?
Creates a parser that tries each of the given parsers in order.
> (parse-result (<or> $letter $digit) "1")
#\1
<or> continues to try subsequent parsers as long as each previous parser consumes no input, even if one of the previous parsers succeeds. Thus <or> implements "longest match" (see [parsec] for more details).
Example:
> (parse-result (<or> (return null) $digit) "1")
#\1
See also <any>, a related parser that returns immediately when it encounters a successful parse, even if the parse consumed no input.
Example:
> (parse-result (<any> (return null) $digit) "1")
'()
If no parser consumes input, then <or> returns the result of the first successful parser.
Example:
> (parse-result (<or> (return "a") (return "b") (return "c")) "1")
"a"
If one of the given parsers consumes input and then fails, <or> returns the error immediately without trying the remaining parsers.
Example:
> (parse-result (<or> (string "ab") (string "ac")) "ac")
parse ERROR: at 1:2:2
unexpected: "c"
expected: "b"
Use try to reset the input after a partial parse.
Example:
> (parse-result (<or> (try (string "ab")) (string "ac")) "ac")
'(#\a #\c)
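When alternatives share a prefix, wrapping each one in try lets <or> move on after a partial match. In this sketch, keyword-parser is a hypothetical helper that builds such an alternation from a list of words.

```racket
#lang racket
(require parsack)

;; keyword-parser: hypothetical helper. Each alternative is wrapped in try, so a
;; partial match (the l of "lambda" while looking for "let") consumes nothing.
(define (keyword-parser kw . kws)
  (apply <or> (map (λ (k) (try (string k))) (cons kw kws))))

(define $kw (keyword-parser "let" "lambda" "if"))

(list->string (parse-result $kw "lambda")) ; returns "lambda"
(list->string (parse-result $kw "if"))     ; returns "if"
```

Order still matters. <or> commits to the first alternative that succeeds after consuming input, so if "let" is listed before "letrec", the input letrec yields only "let". List longer keywords first.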
procedure
(<any> p q ...) → parser?
p : parser?
q : parser?
> (parse-result (<any> $letter $digit) "1")
#\1
<any> returns immediately when it encounters a successful parse, even if the parse consumed no input.
Example:
> (parse-result (<any> (return null) $digit) "1")
'()
See also <or>, a related parser that continues to try subsequent parsers as long as each previous parser consumes no input, even if one of the previous parsers succeeds.
Example:
> (parse-result (<or> (return null) $digit) "1")
#\1
procedure
(many p [#:till end #:or orcomb]) → parser?
p : parser?
end : parser? = (return null)
orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>
procedure
(many1 p) → parser?
p : parser?
procedure
(manyTill p end [#:or orcomb]) → parser?
p : parser?
end : parser?
orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>
procedure
(many1Till p end [#:or orcomb]) → parser?
p : parser?
end : parser?
orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>
procedure
(manyUntil p end) → parser?
p : parser?
end : parser?
Equivalent to (manyTill p end #:or <any>).
procedure
(many1Until p end) → parser?
p : parser?
end : parser?
Equivalent to (many1Till p end #:or <any>).
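A common use of manyTill is collecting text up to a terminator. This sketch reads the body of a /* ... */ comment, using the hypothetical name $comment-body. It assumes that manyTill tries end before each application of p, as Parsec's manyTill does. The try around end keeps a lone * inside the comment from turning into a failure that has already consumed input.

```racket
#lang racket
(require parsack)

;; $comment-body: hypothetical parser for the text between "/*" and "*/".
;; Assumption: manyTill checks end before each p, like Parsec's manyTill.
;; Without try, end would consume a lone #\* and then fail after consuming
;; input, and <or> would report that error instead of trying p.
(define $comment-body
  (>> (string "/*")
      (manyTill $anyChar (try (string "*/")))))

(list->string (parse-result $comment-body "/* a*b */")) ; returns " a*b "
```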
4.1 A Basic CSV parser
Here is an implementation of a basic parser for comma-separated values (CSV).
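A line is a series of cells separated by commas. A cell is any sequence of characters other than commas and newlines. To parse a line of cells, we use two mutually referential parsers.
> (define $oneCell (many (noneOf ",\n")))
> (define $cells (parser-cons $oneCell $remainingCells))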
> (define $remainingCells (<or> (>> (char #\,) $cells) (return null)))
> (define $line (parser-one (~> $cells) $eol))
> (define $csv (many $line))
> (parse-result $csv "cell1,cell2\ncell3,cell4\n")
'(((#\c #\e #\l #\l #\1) (#\c #\e #\l #\l #\2))
((#\c #\e #\l #\l #\3) (#\c #\e #\l #\l #\4)))
5 Other combinators
procedure
(skipMany p) → parser?
p : parser?
procedure
(skipMany1 p) → parser?
p : parser?
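By analogy with Parsec, skipMany and skipMany1 presumably apply p like many and many1 but discard the results. That suits separators such as blanks. A sketch with hypothetical parser names:

```racket
#lang racket
(require parsack)

;; Hypothetical parsers: skip spaces, then return the next letter.
(define $after-blanks  (>> (skipMany (char #\space)) $letter))   ; zero or more spaces
(define $after-blanks1 (>> (skipMany1 (char #\space)) $letter))  ; at least one space

(parse-result $after-blanks "   x")  ; returns #\x
(parse-result $after-blanks "x")     ; returns #\x
(parse-result $after-blanks1 "   x") ; returns #\x
;; (parse-result $after-blanks1 "x") raises, since skipMany1 needs one space
```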
procedure
(sepBy p sep) → parser?
p : parser?
sep : parser?
procedure
(sepBy1 p sep) → parser?
p : parser?
sep : parser?
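sepBy can express the comma-separated cells of the CSV example in section 4.1 without two mutually referential parsers. A sketch with hypothetical parser names. It uses noneOf, a parsack parser not described here, which matches any character not in the given string.

```racket
#lang racket
(require parsack)

;; Hypothetical parsers mirroring the CSV grammar from section 4.1.
(define $cell (many (noneOf ",\n")))                          ; possibly empty cell
(define $row (parser-one (~> (sepBy $cell (char #\,))) $eol))
(define $table (many $row))

;; Convert each cell from a list of chars to a string.
(map (λ (row) (map list->string row))
     (parse-result $table "a,b\nc,d\n")) ; returns '(("a" "b") ("c" "d"))
```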
procedure
(endBy p end) → parser?
p : parser?
end : parser?
procedure
(between open close p) → parser?
open : parser?
close : parser?
p : parser?
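between and sepBy combine naturally for bracketed lists. This sketch uses the hypothetical names $number-list and parse-numbers. Assuming that sepBy, like Parsec's, accepts zero items, an empty list [] also parses.

```racket
#lang racket
(require parsack)

;; $number-list: hypothetical parser for "[1,22,333]"-style input.
;; Argument order of between: open, close, then the inner parser.
(define $number-list
  (between (char #\[) (char #\])
           (sepBy (many1 $digit) (char #\,))))

;; parse-numbers: hypothetical helper converting each list of digit chars to a number.
(define (parse-numbers str)
  (map (λ (ds) (string->number (list->string ds)))
       (parse-result $number-list str)))

(parse-numbers "[1,22,333]") ; returns '(1 22 333)
(parse-numbers "[]")         ; returns '()
```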
procedure
(lookAhead p) → parser?
p : parser?
procedure
(notFollowedBy p) → parser?
p : parser?
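notFollowedBy is useful for keyword boundaries, and lookAhead for inspecting input without consuming it. In this sketch, keyword is a hypothetical helper. The sketch assumes, as the names suggest, that notFollowedBy succeeds only when p fails and that lookAhead returns the result of p without consuming input.

```racket
#lang racket
(require parsack)

;; keyword: hypothetical helper. Matches kw only if no letter or digit follows,
;; so (keyword "let") rejects the start of "letter".
(define (keyword kw)
  (try (parser-one (~> (string kw)) (notFollowedBy $alphaNum))))

(list->string (parse-result (keyword "let") "let x")) ; returns "let"
;; (parse-result (keyword "let") "letter") raises

;; lookAhead: the peeked letter is read again by many1.
(parse-result (parser-seq (lookAhead $letter) (many1 $letter)) "ab")
; returns '(#\a (#\a #\b))
```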
6 Character parsing
procedure
(charAnyCase c) → parser?
c : char?
procedure
(oneOfStrings str ...) → parser?
str : string?
procedure
(oneOfStringsAnyCase str ...) → parser?
str : string?
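charAnyCase matches a single character in either case. In this sketch, a hypothetical string-any-case helper chains one charAnyCase parser per character with parser-cons. For a fixed set of words, oneOfStringsAnyCase may be the more direct choice.

```racket
#lang racket
(require parsack)

;; string-any-case: hypothetical helper that builds a case-insensitive parser for
;; str by consing each charAnyCase result onto the parse of the remaining characters.
(define (string-any-case str)
  (foldr (λ (c tail) (parser-cons (charAnyCase c) tail))
         (return null)
         (string->list str)))

;; The characters read, which match "select" ignoring case.
(list->string (parse-result (string-any-case "select") "SeLeCt * FROM t"))
```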
7 Constant parsers
value
$letter : parser?
value
$digit : parser?
value
$alphaNum : parser?
value
$hexDigit : parser?
value
$space : parser?
value
$spaces : parser?
value
$anyChar : parser?
value
$newline : parser?
value
$tab : parser?
value
$eol : parser?
value
$eof : parser?
value
$identifier : parser?
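The constant parsers compose like any other parser. For instance, $eof can anchor a parser to the end of the input. In this sketch, $whole-number and whole-number? are hypothetical names. The predicate accepts a string only if it consists entirely of digits.

```racket
#lang racket
(require parsack)

;; $whole-number: hypothetical parser; $eof rejects any trailing input.
(define $whole-number (parser-one (~> (many1 $digit)) $eof))

;; whole-number?: hypothetical predicate. Assumes parse failures satisfy exn:fail?.
(define (whole-number? str)
  (with-handlers ([exn:fail? (λ (e) #f)])
    (parse-result $whole-number str)
    #t))

(whole-number? "123") ; returns #t
(whole-number? "12a") ; returns #f
```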
8 User State
syntax
(withState ([key value] ...) parser)
9 Error handling combinators
procedure
(try p) → parser?
p : parser?
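Creates a parser that behaves like p, except that when p fails after consuming input, try resets the input and reports the failure as Empty, so that combinators such as <or> can try other alternatives.
> ((string "ab") (open-input-string "ac"))
(Consumed #<Error>)
> ((try (string "ab")) (open-input-string "ac"))
(Empty #<Error>)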
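The examples above apply parsers directly to ports. The same check can be packaged as a hypothetical helper, consumed-input?, which reports whether the result of a parser is a Consumed struct. Wrapping a parser in try turns a failure after partial input into an Empty result.

```racket
#lang racket
(require parsack)

;; consumed-input?: hypothetical helper; applies p to a fresh string port and
;; checks whether the result is a Consumed struct (on success or failure).
(define (consumed-input? p str)
  (Consumed? (p (open-input-string str))))

(consumed-input? (string "ab") "ac")       ; returns #t
(consumed-input? (try (string "ab")) "ac") ; returns #f
(consumed-input? $letter "123")            ; returns #f
```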
value
$err : parser?
10 Bytestring parsing
procedure
(bytestring bstr) → parser?
bstr : bytes?
11 Parse Result Structs
A parser is a function that consumes an input-port? and returns either a Consumed or an Empty struct.
In general, users should use the above combinators to connect parsers and parse results, rather than manipulate these structs directly.
> (parse $letter "abc")
(Consumed (Ok #\a))
> (parse (many $letter) "abc123")
(Consumed (Ok '(#\a #\b #\c)))
> ((string "ab") (open-input-string "ac"))
(Consumed #<Error>)
The parse result can be any value and depends on the specific parser that produces this struct.
> ($letter (open-input-string "123"))
(Empty #<Error>)
Bibliography
[parsec] Daan Leijen and Erik Meijer, “Parsec: A practical parser library,” Electronic Notes in Theoretical Computer Science, 2001.