peggen
()
(require peg-gen) | package: peg-gen |
1 Overview
This module defines a collection of PEG rackcheck generators. To generate well-formed PEGs, each term is always generated as a tern (peg, nullable, headset). This approach allows for incremental type correct construction of the term.
When generating a grammar, the context of all possible variables (Γ) is kept and updated accordingly. An additional context called Δ is used to avoid generated calls to non-terminals that would result in left recursion.
The general use case for this library is the function gen:peg whose arguments are the maximum number of variables, the maximum number of terminal symbols, and the maximum depth of a peg.
> (sample (gen:peg 3 2 1) 3)
(list
'(∅ 0 ())
'(∅ 0 ())
(list
'(R (• 2 ϵ) (S (• R F) (F (/ 2 ϵ) ∅)))
'(/ 0 0)
(list
(cons 'F (TyPEG #t '()))
(cons 'S (TyPEG #f '(R)))
(cons 'R (TyPEG #f '())))))
G (Grammar) : A list of variables followe by a peg. The list ends with ∅ (empty grammar)
e: A peg expression
Context : The type context for gamma
2 Generators reference
procedure
(gen:Γ maxVars [#:varSize varSize]) → gen?
maxVars : exact-positive-integer? varSize : exact-positive-integer? = 0
procedure
(gen:peg maxVars maxLits maxDepth) → gen?
maxVars : exact-positive-integer? maxLits : exact-positive-integer? maxDepth : exact-positive-integer?
maxVars : The maximum number of variables to be present in the grammamr
maxVars : The maximum number of literals to be present in the grammamr
maxDepth : The maximum height of a PEG AST (all leaves have high 0)
procedure
(gen:peg-s maxVars maxLits maxDepth b) → gen?
maxVars : exact-positive-integer? maxLits : exact-positive-integer? maxDepth : exact-positive-integer? b : boolean?
maxVars : The maximum number of variables to be present in the grammamr
maxVars : The maximum number of literals to be present in the grammamr
maxDepth : The maximum height of a PEG AST (all leaves have high 0)
b: The start expression must be nullable ?
procedure
(gen:grm G Γ Δ Σ n pmax) → gen?
G : (list/c symbol? peg G) Γ : (list/c symbol? boolean? (listof symbol?)) Δ : (hash/c symbol? (listof symbol?)) Σ : (listof natural?) n : exact-positive-integer? pmax : exact-positive-integer?
procedure
(gen:expr Γ Δ Σ b p) → gen?
Γ : (list/c symbol? boolean? (listof symbol?)) Δ : (listof symbol?) Σ : (listof natural?) b : boolean? p : exact-positive-integer?
Γ: is a list of terns: v is a variable name; nullable is a value that determines whether or not that variable is nullable; italic{headset} is a list of possible variables starting the body of v.
Δ: is a list of constraints - forbidden variables
Σ: is an alphabet
b: Should be #t whenever the generated expression must be nullable.
p: Depth of the expression. If p is 0, only generate terminals or allowed variables. Samples n values from g.
procedure
(gen:var varSize) → gen?
varSize : exact-positive-integer?
procedure
(gen:symbolVar varSize) → gen?
varSize : exact-positive-integer?
3 Changing the output structure of the generator
In some cases, it may be helpful to have the generator produce an output format more suitable for other applications, such as creating a proper source code or testing a specific library.
The generator uses a structure (referred to as a factory) that contains the constructors for all the parsing expressions and the grammar, and this structure can be set with setSynFactory. The module peg-gen-syntax contains an alternative structure called PEGStructF.
> (require peg-gen peg-gen/peg-gen-syntax) > (setSynFactory PEGStructF) > (sample (gen:peg 3 2 1) 3)
(list
(GPEG '#hash() (GLit 0) '())
(GPEG '#hash() (GLit 0) '())
(GPEG
(hash
'F
(GAlt (GLit 2) (GEps))
'R
(GSeq (GLit 2) (GEps))
'S
(GSeq (GVar 'R) (GVar 'R)))
(GAlt (GLit 0) (GLit 0))
(list
(cons 'F (TyPEG #t '()))
(cons 'S (TyPEG #f '(R)))
(cons 'R (TyPEG #f '())))))
(struct PEGFSyn ( mkEps mkLit mkVar mkNot mkKle mkSeq mkAlt addRule mkEmptyGrm mkPEG))
mkEps : A function with no parameter that builds an empty PEG symbol.
mkLit : A function that takes one parameter (from Σ) and builds a terminal symbol.
mkVar : A function that takes one parameter (a racket symbol from the set of non-terminals) and builds a no-terminal (or a variable) symbol.
mkSeq and mkAlt : Takes two parameters and builds a sequence and an alternative constructions respectively.
mkNot and mkKle : Takes one parameter and builds a not predicate and a repetition construction , respectively.
addRule : Takes three parameters: A representation of the Grammar (G), a representation of non-terminal name (NT) and a parsing expression (E) and produces a new grammar adding the rule NT <- E to it.
mkEmptyGrm : Takes no parameters and returns a representation of the empty grammar.
mKPEG : Takes a representation of grammar, one parsing expression (as the start parsing expression) and a type context (as a list of pairs) and builds a structure for the PEG.
As an example consider the follwing definition for the default factory used by peg-gen:
(define defaultFactory (PEGFSyn (lambda () 'ϵ) (lambda (x) x) (lambda (x) x) (lambda (e) (list '! e)) (lambda (e) (list '* e)) (lambda (x y) (list '• x y)) (lambda (x y) (list '/ x y)) (lambda (g nt e) (list nt e g)) (lambda () '∅) (lambda (g start gamma) (list g start gamma))))
Note that in this syntax, we represent a PEG as a recursive tuple of a Non-terminal, its right-hand side followed by the rest of the grammar. The symbol ∅ represents the empty grammar. Notice that the constructors only expect to build PEGs without types. The generation algorithm produces the types. The type context is given to the constructor mkPEG at the end of the generation process.
(struct GPEG (nt start gamma) #:transparent) (struct GEps () #:transparent) (struct GLit (chr) #:transparent) (struct GVar (var) #:transparent) (struct GAlt (left right) #:transparent) (struct GSeq (left right) #:transparent) (struct GNot (exp) #:transparent) (struct GKle (exp) #:transparent)
(define PEGStructF (PEGFSyn (lambda () (GEps)) (lambda (x) (GLit x)) (lambda (x) (GVar x)) (lambda (e) (GNot e)) (lambda (e) (GKle e)) (lambda (x y) (GSeq x y)) (lambda (x y) (GAlt x y)) (lambda (g nt e) (hash-set g nt e)) (lambda () (make-immutable-hash)) (lambda (g start gamma) (GPEG g start gamma))))
The module that defines this data type also define some pretty print functions:
4 Generating Ill-typed PEGs
As an experimental feature, we add the two new generators that generate PEGs that are ill-typed (not well-formed according to Ford’s definition).
procedure
(gen:ill-peg maxVars maxLits maxDepth) → gen?
maxVars : exact-positive-integer? maxLits : exact-positive-integer? maxDepth : exact-positive-integer?
procedure
(gen:ill-peg-s maxVars maxLits maxDepth b) → gen?
maxVars : exact-positive-integer? maxLits : exact-positive-integer? maxDepth : exact-positive-integer? b : boolean?