6 Language and Parser API
#lang shrubbery          package: shrubbery-lib
#lang shrubbery/text
The shrubbery meta-language is similar to the s-exp meta-language. It expects a module name after #lang shrubbery to serve as the language of a Racket module form, while the body of the module after the #lang line is parsed as shrubbery notation.
Unlike s-exp, shrubbery also works without another language listed on the #lang line. In that case, running the module prints the S-expression form of the parsed shrubbery (see Parsed Representation). For example,
  #lang shrubbery
  1+2
prints '(multi (group 1 (op +) 2)). But if "demo.rkt" contains
"demo.rkt"
  #lang racket/base
  (require (for-syntax racket/base
                       syntax/parse))
  (provide (rename-out [module-begin #%module-begin])
           + - * /)
  (define-syntax (module-begin stx)
    (syntax-parse stx
      #:datum-literals (multi group op)
      [(_ (multi (group n1:number (op o) n2:number)))
       #'(#%module-begin (o 'n1 'n2))]))
then
  #lang shrubbery "demo.rkt"
  1+2
prints the result 3.
A same-line module language for shrubbery is determined by using parse-all in 'line mode. As long as the resulting shrubbery is not empty, it is parsed in the same way that rhombus parses module names for import.
The shrubbery/text meta-language is similar to shrubbery, but it parses the module in 'text mode. For example,
  #lang shrubbery/text
  @(1+2)
prints '(brackets (group (parens (group 1 (op +) 2)))).
6.1 Parsing API
(require shrubbery/parse)          package: shrubbery-lib
procedure
(parse-all in
           [#:source source
            #:mode mode
            #:start-column start-column])
 → (or/c eof-object? syntax?)
  in : input-port?
  source : any/c = (object-name in)
  mode : (or/c 'top 'interactive 'line 'text) = 'top
  start-column : exact-nonnegative-integer? = 0
The result syntax object has no scopes, but it has source-location information and raw text properties. See Source Locations and Raw-Text Properties for more information.
The default 'top mode reads in until an end-of-file, and it expects a sequence of groups that are indented consistently throughout (i.e., all starting at the same column). The result in 'top mode is always a multi representation. The 'text mode is similar, but it starts in “text” mode, as if the entire input is inside curly braces of an @ form (see At-Notation Using @). The result of 'text mode is always a brackets representation.
The 'interactive and 'line modes are similar to 'top. They are suitable for a read-eval-print loop or reading the continuation of a #lang shrubbery line, respectively. In both modes, reading stops when a newline is encountered, unless an opener remains to be closed or a : was encountered. If reading continues due to a :, then it stops when a blank line is found (where a line containing a comment does not count as blank). In 'line mode, the result may be empty, while 'interactive mode continues past a newline if the result would be empty.
The shrubbery parser directly determines line and column changes for the purposes of determining indentation, so it does not require in to have line counting enabled for that purpose. The start-column argument supplies a number of characters that should be considered before the first character of in for parsing. Source locations attached to the result syntax objects are based on positions reported by in or line and column counting as enabled for in.
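As a minimal sketch of calling parse-all in the modes described above, the following assumes that the shrubbery-lib package is installed. The parse-string helper is a hypothetical name introduced only for this illustration; it strips the syntax wrapper with syntax->datum so that the parsed shape is easy to inspect.

```racket
#lang racket/base
;; Assumes the shrubbery-lib package is installed.
(require shrubbery/parse)

;; Hypothetical helper (not part of shrubbery/parse): parse a string in
;; the given mode and convert the resulting syntax object to a datum.
(define (parse-string str #:mode [mode 'top])
  (define stx (parse-all (open-input-string str) #:mode mode))
  ;; parse-all may return an eof object instead of a syntax object
  (if (eof-object? stx) stx (syntax->datum stx)))

;; Default 'top mode: the result is a multi representation.
(parse-string "1+2") ; '(multi (group 1 (op +) 2))

;; 'text mode: the result is a brackets representation.
(parse-string "@(1+2)" #:mode 'text)
```

Converting to a datum discards source locations and raw-text properties, so this style is suited to inspection rather than to further processing of the syntax object.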
6.2 Source Locations and Raw-Text Properties
(require shrubbery/property)          package: shrubbery-lib
The result of parse-all records source locations in the syntax object representing a shrubbery form. Source-location information and other properties for a shrubbery (), [], {}, or '' are associated with the parens, brackets, braces, or quotes identifier in the representation. Similarly, source-location information and properties for a : block or | alternatives are recorded on a block or alts identifier. For all compound forms, including group and multi, the source location on the representation’s head identifier spans the compound’s form. For an operator, source-location information and properties are associated with the operator identifier, and not the wrapper op identifier. Each structuring identifier like group, block, parens, or op has the 'identifier-as-keyword syntax property as #t.
The result of parse-all has source locations copied to the S-expression “parentheses” of a syntax object representing a compound form, but only as a convenience; the intent is that information is more permanently and reliably associated with each compound form’s head identifier. Along similar lines, spanning source locations for compound forms tend not to be maintained as syntax objects are manipulated, and the intent is that spanning locations are reconstructed as needed, especially for group and multi forms. A useful convention may be to treat source locations on the S-expression “parentheses” of a compound form as a cache for spanning information computed from the form’s content.
A syntax object produced by parse-all includes raw text properties that allow the original shrubbery text to be reconstructed from the syntax object. This raw text information is distributed among syntax objects in a way that is preserved to a useful degree on terms as they are rearranged by enforestation and macro expansion. The shrubbery-syntax->string function reconstructs source text based on raw text properties in a syntax object. The shrubbery/property module exports functions to access and update properties, which can be helpful to avoid typos that are all too easy when writing the key directly as a quoted symbol.
Raw-text property values are trees of strings: a string, an empty list, or a pair containing two trees. The source raw text is reconstructed through a preorder traversal of the tree. The parse-all function attaches raw text properties only to atom terms and the head identifiers of compound terms, not counting multi. A head op, multi, or parsed is normally not consulted for syntax properties, but parse-all associates empty raw text to a head op or multi. For a parsed representation, the third component of the parsed list is consulted for 'opaque-raw by functions like shrubbery-syntax->string.
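The tree shape described above can be flattened with a short preorder traversal. The following is a sketch in plain Racket; raw-tree->string is a hypothetical helper name and is not exported by shrubbery/property.

```racket
#lang racket/base

;; Hypothetical helper: flatten a raw-text tree (a string, an empty
;; list, or a pair of two trees) into one string, visiting the car of
;; each pair before its cdr (preorder).
(define (raw-tree->string tree)
  (define out (open-output-string))
  (let loop ([t tree])
    (cond
      [(string? t) (write-string t out)]
      [(null? t) (void)]
      [(pair? t)
       (loop (car t))
       (loop (cdr t))]
      [else (raise-argument-error 'raw-tree->string "raw-text tree" t)]))
  (get-output-string out))

;; A tree whose leaves are "(", "1", an empty list, and " + 2)"
(raw-tree->string '(("(" . "1") . (() . " + 2)"))) ; "(1 + 2)"
```

Writing into a string port avoids the repeated copying that nested string-append calls would incur on deep trees.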
The main raw text property key is 'raw, but the following list of all raw text keys is in the order that they contribute to reconstructed text:
'raw-prefix (use syntax-raw-prefix-property): records original text for whitespace and comments before a term or group. The parse-all function uses this property only when raw text cannot instead be associated with a preceding term or group as a suffix. When raw text must be associated as a prefix but could be attached to either a term or its enclosing group (because the term is first within its group), parse-all attaches the prefix to the group.
'raw-inner-prefix (use syntax-raw-inner-prefix-property): like 'raw-prefix, and after 'raw-prefix and still before 'raw, but with the intent that the raw text sticks to its term, instead of being shifted to a preceding term or enclosing group. The parse-all function uses this property only to record a @ that appears before a term for At-Notation Using @.
'raw (use syntax-raw-property): records the original text of an atomic term or the opening text for a compound term. For example, the input 0x11 will be parsed as the number 17 with a 'raw property value "0x11". The 'raw property for a parens representation is normally "(", while 'raw for a group representation is normally empty. The parse-all function also associates an empty 'raw with the op identifier of an operator representation, although that identifier is not normally consulted for properties (e.g., by shrubbery-syntax->string).
'raw-opaque-content (use syntax-raw-opaque-content-property): records raw text to use in place of the content of a compound form. When the compound form also has the 'raw property, 'raw-opaque-content is shown after 'raw. A non-compound form can have 'raw-opaque-content, and it is combined with 'raw in that case, too. Note that 'opaque-raw is a different property.
'opaque-raw (use syntax-opaque-raw-property): raw text that supersedes 'raw and 'raw-opaque-content. Furthermore, this property is recognized when present on the S-expression “parentheses” of a compound form or an op form, in which case properties on the leading identifier and/or compound form’s content are ignored. Although this property is intended for use on S-expression “parentheses” in a shrubbery representation, it is also recognized on atom terms. When 'opaque-raw is recognized on S-expression “parentheses”, then 'raw-prefix, 'raw-inner-prefix, 'raw-inner-suffix, and 'raw-suffix are also recognized (and those would otherwise be ignored).
'raw-tail (use syntax-raw-tail-property): records original text to appear after a term’s content. This property is intended for use with compound terms to hold the compound form’s closer. For example, the 'raw-tail property for a parens representation is normally ")".
'raw-inner-suffix (use syntax-raw-inner-suffix-property): analogous to 'raw-inner-prefix, but for a suffix that should stick with its term. The parse-all function currently does not use this property.
'raw-suffix (use syntax-raw-suffix-property): records original text for whitespace and comments after a term or group. When a term is last in its group, the parse-all function uses 'raw-suffix on the group instead of the term.
Each of these properties is normally preserved (in the sense of a true fourth argument to syntax-property), except for 'opaque-raw, which is intended for use in intermediate, short-term mixtures of shrubbery forms and S-expressions.
procedure
(syntax-raw-property stx) → any/c
  stx : syntax?
(syntax-raw-property stx val) → syntax?
  stx : syntax?
  val : any/c
For example, parse-all will parse the input 0x11 as the number 17 with a 'raw property value "0x11". The input "\u3BB" will be parsed as the string "λ" with a 'raw property value "\"\\u3BB\"".
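The 0x11 example can be reproduced with a short sketch, assuming that the shrubbery-lib package is installed and that the parsed shape is (multi (group 17)). The variable names are chosen only for this illustration.

```racket
#lang racket/base
;; Assumes the shrubbery-lib package is installed.
(require shrubbery/parse
         shrubbery/property)

(define stx (parse-all (open-input-string "0x11")))

;; Assumed shape: (multi (group 17)); take the sole term of the sole group.
(define atom (cadr (syntax->list (cadr (syntax->list stx)))))

(syntax-e atom)            ; the parsed number, 17
(syntax-raw-property atom) ; the original text, "0x11"

;; The two-argument form returns a new syntax object with updated raw text;
;; the original syntax object is unchanged.
(define decimal-atom (syntax-raw-property atom "17"))
(syntax-raw-property decimal-atom)
```

Using the accessor instead of (syntax-property atom 'raw) avoids misspelling the property key.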
procedure
(syntax-raw-prefix-property stx) → any/c
  stx : syntax?
(syntax-raw-prefix-property stx val) → syntax?
  stx : syntax?
  val : any/c
procedure
(syntax-raw-suffix-property stx) → any/c
  stx : syntax?
(syntax-raw-suffix-property stx val) → syntax?
  stx : syntax?
  val : any/c
For example, parse-all will parse the input " 1 +  2 // done" (with one leading space and two spaces after +) into the S-expression representation (multi (group 1 (op +) 2)). The syntax object for group will have a 'raw-prefix value equivalent to " " and a 'raw-suffix value equivalent to " // done", but possibly within a tree structure instead of a single string. The syntax object for 1 will have a 'raw-suffix value equivalent to " ", while the syntax object for + will have a 'raw-suffix value equivalent to "  " (i.e., two spaces).
procedure
(syntax-raw-inner-prefix-property stx) → any/c
  stx : syntax?
(syntax-raw-inner-prefix-property stx val) → syntax?
  stx : syntax?
  val : any/c
procedure
(syntax-raw-inner-suffix-property stx) → any/c
  stx : syntax?
(syntax-raw-inner-suffix-property stx val) → syntax?
  stx : syntax?
  val : any/c
The parse-all function will parse the input @x into an S-expression representation x with a 'raw-inner-prefix property "@". The parse-all function never produces a syntax object with 'raw-inner-suffix.
procedure
(syntax-raw-tail-property stx) → any/c
  stx : syntax?
(syntax-raw-tail-property stx val) → syntax?
  stx : syntax?
  val : any/c
For example, the input (1 + 2) * 4 // done will be parsed into the S-expression representation (multi (group (parens (group 1 (op +) 2)) (op *) 4)). The syntax object for the outer group will have a 'raw-suffix value equivalent to " // done". The syntax object for parens will have a 'raw value equivalent to "(", a 'raw-tail value equivalent to ")", and a 'raw-suffix value equivalent to " ". The inner group syntax object will have either no properties or properties whose values are equivalent to empty strings.
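The properties in this example can be inspected with a sketch like the following, which assumes that the shrubbery-lib package is installed. The variable names are introduced only for illustration, and each property value may be a tree of strings rather than a single string.

```racket
#lang racket/base
;; Assumes the shrubbery-lib package is installed.
(require shrubbery/parse
         shrubbery/property)

(define stx (parse-all (open-input-string "(1 + 2) * 4 // done")))

;; Navigate the documented shape:
;;   (multi (group (parens (group 1 (op +) 2)) (op *) 4))
(define outer-group (cadr (syntax->list stx)))
(define group-id (car (syntax->list outer-group)))
(define parens-form (cadr (syntax->list outer-group)))
(define parens-id (car (syntax->list parens-form)))

;; Raw text lives on the head identifiers, not on the enclosing lists.
(syntax-raw-property parens-id)        ; opener, equivalent to "("
(syntax-raw-tail-property parens-id)   ; closer, equivalent to ")"
(syntax-raw-suffix-property parens-id) ; whitespace after the closer
(syntax-raw-suffix-property group-id)  ; trailing comment text
```

Reading the properties from parens-form instead of parens-id would be expected to miss them, since parse-all attaches raw text to head identifiers of compound terms.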
procedure
(syntax-raw-opaque-content-property stx) → any/c
  stx : syntax?
(syntax-raw-opaque-content-property stx val) → syntax?
  stx : syntax?
  val : any/c
procedure
(syntax-opaque-raw-property stx) → any/c
  stx : syntax?
(syntax-opaque-raw-property stx val) → syntax?
  stx : syntax?
  val : any/c
The 'opaque-raw property is useful in macro expansion to record a macro’s input to its output.
6.3 Writing Shrubbery Notation
(require shrubbery/write)          package: shrubbery-lib
procedure
(write-shrubbery v
                 [port
                  #:pretty? pretty?
                  #:width width
                  #:armor? armor?
                  #:prefer-multiline? prefer-multiline?])
 → void?
  v : any/c
  port : output-port? = (current-output-port)
  pretty? : any/c = #f
  width : (or/c exact-nonnegative-integer? #f) = #f
  armor? : any/c = #f
  prefer-multiline? : any/c = #f
The default mode with pretty? as #f prints in a simple and relatively fast way (compared to pretty-shrubbery). When pretty? is a true value, single-line output is preferred if width is #f; otherwise, it is preferred only when the line fits within width columns. Use pretty-shrubbery to gain more control over line choices when printing.
If pretty? is #f or armor? is a true value, then the printed form is line- and column-insensitive. If pretty? is a true value, armor? is #f, and prefer-multiline? is a true value, then line breaks are used instead of « and ».
Note that write-shrubbery expects an S-expression, not a syntax object, so it cannot use raw text properties. See also shrubbery-syntax->string.
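Because write-shrubbery takes an S-expression, a parse-all result needs to be stripped with syntax->datum first. The following is a sketch of that round trip, assuming that the shrubbery-lib package is installed; shrubbery->string is a hypothetical helper introduced for illustration.

```racket
#lang racket/base
;; Assumes the shrubbery-lib package is installed.
(require racket/port
         shrubbery/parse
         shrubbery/write)

;; Hypothetical helper: capture write-shrubbery output as a string.
(define (shrubbery->string v #:pretty? [pretty? #f] #:armor? [armor? #f])
  (with-output-to-string
    (lambda ()
      (write-shrubbery v #:pretty? pretty? #:armor? armor?))))

;; parse-all returns a syntax object, so convert it to a plain datum.
(define datum
  (syntax->datum (parse-all (open-input-string "f(1, 2): x + y"))))

;; Default mode: simple and line-insensitive.
(define plain (shrubbery->string datum))
;; Pretty mode: may use line breaks and indentation.
(define pretty (shrubbery->string datum #:pretty? #t))

(displayln plain)
(displayln pretty)
```

The original spacing and comments are not reproduced by either call, since the datum carries no raw-text properties.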
procedure
(pretty-shrubbery v
                  [#:armor? armor?
                   #:prefer-multiline? prefer-multiline?])
 → any/c
  v : any/c
  armor? : any/c = #f
  prefer-multiline? : any/c = #f
The description is an S-expression DAG (directed acyclic graph) that represents pretty-printing instructions and alternatives:
string or bytes: print literally.
'nl: print a newline followed by spaces corresponding to the current indentation.
`(seq ,doc ...): print each doc in sequence, each with the same indentation.
`(nest ,n ,doc): print doc with the current indentation increased by n.
`(align ,doc): print doc with the current indentation set to the current output column.
`(or ,doc ,doc): print either doc; always taking the first doc in an 'or will produce single-line output if prefer-multiline? is #f, while always taking the second doc will print a maximal number of lines.
The description can be a DAG because 'or alternatives might have components in common. In the worst case, a tree view of the instructions can be exponentially larger than the DAG representation.
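To make the instruction forms above concrete, here is a sketch of a naive renderer written in plain Racket. It is not part of shrubbery/write: render-first-choice is a hypothetical name, it always takes the first alternative of an or, and it assumes that literal strings contain no newlines so that column tracking stays simple.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical renderer for the description forms listed above.
;; Always chooses the first alternative of each `or`.
(define (render-first-choice doc)
  (define out (open-output-string))
  ;; Prints `doc` and returns the output column afterward.
  (define (render doc indent col)
    (match doc
      [(? string?) (write-string doc out) (+ col (string-length doc))]
      [(? bytes?) (write-bytes doc out) (+ col (bytes-length doc))]
      ['nl
       (newline out)
       (write-string (make-string indent #\space) out)
       indent]
      [`(seq ,docs ...)
       (for/fold ([col col]) ([d (in-list docs)])
         (render d indent col))]
      [`(nest ,n ,d) (render d (+ indent n) col)]
      [`(align ,d) (render d col col)]
      [`(or ,a ,_) (render a indent col)]))
  (render doc 0 0)
  (get-output-string out))

;; `align` pins indentation to the column where "a" starts.
(render-first-choice '(seq "let " (align (seq "a" nl "b"))))
;; "let a\n    b"

;; `nest` indents relative to the enclosing indentation.
(render-first-choice '(seq "f:" (nest 2 (seq nl "x" nl "y"))))
;; "f:\n  x\n  y"
```

A width-aware printer would instead compare both branches of each or, and it could memoize on shared subdocuments to take advantage of the DAG structure.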
6.4 Reconstructing Shrubbery Notation
(require shrubbery/print)          package: shrubbery-lib
procedure
(shrubbery-syntax->string s
                          [#:use-raw? use-raw?
                           #:max-length max-length
                           #:keep-prefix? keep-prefix?
                           #:keep-suffix? keep-suffix?
                           #:inner? inner?
                           #:infer-starting-indentation? infer-starting-indentation?
                           #:register-stx-range register-stx-range
                           #:render-stx-hook render-stx-hook])
 → string?
  s : syntax?
  use-raw? : any/c = #f
  max-length : (or/c #f exact-positive-integer?) = #f
  keep-prefix? : any/c = #f
  keep-suffix? : any/c = #f
  inner? : any/c = #f
  infer-starting-indentation? : any/c = (not keep-prefix?)
  register-stx-range : (syntax? exact-nonnegative-integer? exact-nonnegative-integer? . -> . any) = void
  render-stx-hook : (syntax? output-port? . -> . any/c) = (lambda (stx output) #f)
If max-length is a number, the returned string will contain no more than max-length characters. Internally, conversion to a string can take shortcuts once the first max-length characters have been determined.
When keep-prefix? and keep-suffix? are true and raw text mode is used to generate the result string, then 'raw-prefix and 'raw-suffix text on the immediate syntax object are included in the result. Otherwise, prefixes and suffixes are rendered only when they appear between 'raw text. If inner? is true, “inner” prefixes and suffixes are preserved on the immediate s form even if keep-prefix? and/or keep-suffix? are #f. If s is a group or multi-group form, then inner prefixes and suffixes are preserved in any case.
If infer-starting-indentation? is true, then a consistent amount of leading whitespace is removed from each line of the result string.
The register-stx-range and render-stx-hook arguments provide a hook to record or replace rendering of a syntax object within s. The register-stx-range procedure is called with each syntax object in s after printing, and the second and third arguments report the starting and ending locations in the string for the syntax object’s printed form. The render-stx-hook procedure is called before printing each syntax object, and if it returns a true value, then printing assumes that the syntax object has already been rendered to the given output port (which is ultimately delivered to a string), and it is not printed in the default way.
Raw text is consistently available when supplied by 'raw syntax properties on all atoms, except that 'raw-opaque-content and/or 'opaque-raw properties excuse nested atoms from needing 'raw properties. Also, a parsed form need not have raw text information.
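A sketch of typical calls follows, assuming that the shrubbery-lib package is installed. The input string and the ranges accumulator are chosen only for illustration, and no particular output is claimed beyond the documented length bound.

```racket
#lang racket/base
;; Assumes the shrubbery-lib package is installed.
(require shrubbery/parse
         shrubbery/print)

(define stx (parse-all (open-input-string "f(1,   2)  // call")))

;; Reconstruct text; parse-all supplies 'raw properties on all atoms.
(define text (shrubbery-syntax->string stx))

;; Ask for 'raw-suffix text on the immediate form to be included as well.
(define text+suffix (shrubbery-syntax->string stx #:keep-suffix? #t))

;; Limit the result to at most 4 characters.
(define short-text (shrubbery-syntax->string stx #:max-length 4))

;; Record the reported string range of each printed syntax object.
(define ranges '())
(void
 (shrubbery-syntax->string
  stx
  #:register-stx-range
  (lambda (s start end)
    (set! ranges (cons (list (syntax->datum s) start end) ranges)))))

(list text text+suffix short-text)
```

The ranges list is built in reverse order of the callbacks, so reverse it if document order matters.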
procedure
(shrubbery-syntax->raw s
                       [#:use-raw? use-raw?
                        #:keep-prefix? keep-prefix?
                        #:keep-suffix? keep-suffix?
                        #:inner? inner?])
 → any? any? any?
  s : syntax?
  use-raw? : any/c = #f
  keep-prefix? : any/c = #f
  keep-suffix? : any/c = #f
  inner? : any/c = #f
procedure
(combine-shrubbery-raw a b) → any/c
  a : any/c
  b : any/c