
7 Design Considerations

Shrubbery notation serves the same role as S-expression notation as a vehicle for a programming language, but with different trade-offs.

S-expression notation imposes a grouping at the token level that is all but guaranteed to be respected by further parsing via macro expansion. One consequence of this token-based grouping is that programs can be pretty-printed and textually traversed in standard ways.

A traditional use of S-expression notation, however, insists that all grouping is reflected in the S-expression. Reifying all grouping at the token level is so onerous that many practical deployments of S-expressions include deviations from the rule, such as keyword-based arguments or implicit grouping by position (as in various Clojure forms).

Another disadvantage of S-expressions is that many of the parentheses are redundant after the expression is pretty-printed, because indentation provides the same grouping information in a more human-readable way. That observation suggests instead relying on line breaks and indentation to impart grouping information, as in Python.

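Shrubbery notation explores a point in the design space where the notation is line- and indentation-sensitive, and where it is intended to constrain grouping without reflecting every detail of grouping.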

Deferring complete grouping to another parser relieves a burden on reader-level notation. At the same time, line- and indentation-sensitive rules constrain parsing to ensure that line breaks and indentation in the source are not misleading.

7.1 Rationale

The token-level syntax is chosen to be familiar to programmers generally. The sequence 1+2 is one plus two, not a strangely spelled identifier. Tokens like (, ,, { and ; are used in familiar ways. Shrubbery notation provides enough grouping structure that code navigation and transformation should be useful and straightforward in an editor.
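
As a rough illustration of the token-level choice, the following Python sketch splits text so that 1+2 becomes three tokens. The token classes and the regular expression are simplified assumptions made for this example; they are not the actual shrubbery lexer, which recognizes more forms.

```python
import re

# Hypothetical, simplified token classes (an assumption for illustration).
TOKEN = re.compile(r"""
    (?P<number>\d+)
  | (?P<identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<operator>[+\-*/<>=!&^%.]+)
  | (?P<punct>[(),;{}\[\]])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# "1+2" is a number, an operator, and a number, not one identifier.
print(tokenize("1+2"))
print(tokenize("f(x, y)"))
```

Under an S-expression reader, by contrast, 1+2 without spaces would typically be read as a single symbol.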

Parentheses in shrubbery notation do not disable indentation, unlike some indentation-sensitive notations. That choice supports a language in shrubbery notation where parentheses can be added around any expression, even if the expression is written with indentation (although the expression may need to be shifted right to preserve relative indentation, depending on how parentheses are added).

The inclusion of ' as a parenthesis-like form reflects how shrubbery notation is intended as a vehicle for metaprogramming contexts where quoting program terms is common. Using the same character as the opener and closer creates some hassle for certain nesting cases, but those are relatively rare. Meanwhile, the use of ' to mean a form of quoting should be familiar to programmers, although there is a risk that 's will be misunderstood as string quoting.

The inclusion of | in shrubbery notation reflects the fact that conditional forms (such as if, cond, and match) are important and common. A distinct, pleasant, and uniform pattern for conditionals deserves direct support in the notation.

Requiring a preceding : or preceding/following | for block-creating indentation is mostly a kind of consistency check to enable better and earlier errors when indentation goes wrong. It also allows indentation that starts with an operator to continue a group; it’s possible for bad indentation to inadvertently cause an operator to be treated as continuing a group, but hopefully that will be rare. Always requiring a preceding : before an indented | line would be consistent, but it adds extra :s where | already provides one consistency check.

A : block or | alternatives must appear at the end of a group, because : and | lack a specific closing character. Forms that expect both a : block and | alternatives in a group seem unlikely to be common, but allowing both supports certain syntactic forms (e.g., an operator that is defined through multiple pattern-matching cases, but with a block before all cases to specify precedence and associativity). Allowing the combination, in turn, naturally leads to allowing an empty-block : before a sequence of | alternatives, effectively making a : before | optional. Normalizing to drop the empty block in that case, instead of preserving it before the sequence of alternatives, makes shrubbery notation feel more consistent without requiring consumers of shrubbery forms to explicitly accommodate an empty block.
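
The normalization step can be sketched in Python. The nested-list representation below (lists headed by "group", "block", and "alts") is an assumption chosen for this example, and drop_empty_block_before_alts is a hypothetical helper, not part of any shrubbery implementation.

```python
def drop_empty_block_before_alts(group):
    """Return a copy of `group` without an empty block that directly precedes alternatives."""
    result = []
    for index, term in enumerate(group):
        following = group[index + 1] if index + 1 < len(group) else None
        is_empty_block = term == ["block"]
        followed_by_alts = isinstance(following, list) and following[:1] == ["alts"]
        if is_empty_block and followed_by_alts:
            continue  # the empty block carries no information here
        result.append(term)
    return result


alts = ["alts", ["block", ["group", "0"]], ["block", ["group", "1"]]]

# Assumed parse of a group written with a `:` before the `|` alternatives.
with_colon = ["group", "match", "x", ["block"], alts]
# Assumed parse of the same group written without the `:`.
without_colon = ["group", "match", "x", alts]

print(drop_empty_block_before_alts(with_colon) == without_colon)
```

A non-empty block before the alternatives is left in place, which matches the case of an operator definition that specifies precedence before its cases.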

Explicit block grouping via « and » is expected to be rare. The grouping characters were intentionally chosen from the Latin-1 extension of ASCII to avoid reserving additional ASCII characters.

Making whitespace and comment lines ignored in all contexts means that they can be freely added without interfering with grouping. The \ continuation operator is somewhat unusual in that it skips blank and comment lines to continue, as opposed to requiring \ on every continuing line; that, too, allows extra blank and comment lines to be added, even amid continuing lines.

The interaction of indentation and \ differs slightly from Python, which does not count the space for \ itself or any leading whitespace on a continuing line toward indentation. Counting the leading whitespace on a continuing line has the advantage that it can reach an arbitrary amount of indentation within a constrained textual width. Counting the \ itself is consistent with ignoring \ when it appears within a line, so grouping stays the same whether the \ is followed by a newline or the continuing content appears immediately after it on the same line. The whitespace role of \ also means that spaces can be turned into \ to “harden” code for transfer via media (such as email) that might mangle consecutive spaces.
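
The column-counting rule can be modeled with a small Python sketch. It is a simplified reading of the rule described above, not the actual reader: effective_column is a hypothetical helper, columns are 0-based, and the first line is assumed to end with the \ itself, with no trailing comment.

```python
def effective_column(first_line, continuing_line):
    """Column (0-based) of the first non-space character on the continuing line."""
    head = first_line.rstrip()
    if not head.endswith("\\"):
        raise ValueError("first_line must end with a continuation backslash")
    # Everything up to and including the backslash counts toward the column.
    offset = len(head)
    # Leading whitespace on the continuing line counts as well.
    leading = len(continuing_line) - len(continuing_line.lstrip())
    return offset + leading


first = "f(1) + \\"
rest = "    g(2)"
joined = first + rest  # the same text written on a single line

# Both positions agree, so grouping does not depend on the line break.
print(effective_column(first, rest))
print(joined.index("g"))
```

Because the leading whitespace of each continuing line adds to the running column, a chain of continued lines can reach a deep effective indentation while each physical line stays short.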

Using ~ for keywords has a precedent in OCaml. Reserving ~ for keywords exclusively would use up a character that might otherwise be used for operators, and so ~ is also allowed in an operator as long as it is combined with other operator characters. The notion of keywords as distinct from identifiers has been liberating for Racket syntax (particularly since keywords can be kept distinct from expressions more generally), and we expect similar benefits for having keywords in shrubbery notation.

The #{} escape to S-expressions provides a bridge between shrubbery notation and Racket identifiers. For example, #{exact-integer?} is an identifier with - and ? as part of the identifier. Shrubbery notation could be adapted to support Lisp-style identifiers by requiring more space around operators, but the rule for continuing a group between ( and ), [ and ], or { and } currently depends on distinguishing operators from non-operators.

For @, the choice of treating @f(arg){text} as f(arg, ["text"]) instead of f(arg, "text") reflects experience with S-expression @ notation. Although it seems convenient that, say, @bold{x} is treated as (bold "x"), the consequence is that a function like bold might be implemented at first to take a single argument; later, a use like @bold{Hello @name} breaks, because two arguments are provided. Making explicit the list that’s inherent in body parsing should help reduce such mistakes (or bad design choices) for functions that are meant to be used with @ notation.
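
The failure mode can be illustrated with a Python analogy. The functions bold_single and bold are hypothetical stand-ins, and the HTML-like output is only for demonstration; the point is the difference between passing body parts as separate arguments and passing them as one list.

```python
def bold_single(text):
    """First-draft helper that expects exactly one string."""
    return f"<b>{text}</b>"


def bold(parts):
    """Helper that expects the whole body as one list of parts."""
    return "<b>" + "".join(str(part) for part in parts) + "</b>"


name = "Ada"

# Spliced style: each body part becomes a separate argument.
print(bold_single("x"))              # one part, so one argument
try:
    bold_single("Hello ", name)      # two parts, so two arguments
except TypeError as error:
    print("breaks:", error)

# List style: the body is always a single list argument.
print(bold(["x"]))
print(bold(["Hello ", name]))
```

With the list convention, the function signature does not change when a body grows from one part to several, so the later use cannot break an earlier single-argument design.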

7.2 Prior Art

Indentation-sensitive parsing and the use of : are obviously informed by Python.

Shrubbery notation’s rules relating indentation, lines, ;, and : are originally based on the #lang something reader, which also targets an underlying expander that further groups tokens. Shrubbery notation evolved away from using {} for blocks, however, because : was nearly always preferred in experiments with the notation. For the very rare case that explicit grouping is needed for a block, «» can be used. Freeing {} from use for blocks, meanwhile, allows its use for set and map notations.

Shrubbery notation is also based on Lexprs, particularly their use of |. Lexprs use mandatory : and | tokens as a prefix for indentation, and they absorb an additional line after an indented section to allow further chaining of the group. Although «» can be used to form multiple subgroups within a shrubbery group, the notation discourages that style in favor of further nesting (or, in the case of if, in favor of | notation like other conditionals).

Shrubbery notation is in some sense a follow-up to sapling notation. The primary difference is that shrubbery notation is indentation-sensitive, while sapling notation is indentation-insensitive. Indentation sensitivity and block conventions in shrubbery notation avoid some delimiters and blank lines that are needed in sapling notation.

More generally, shrubbery notation takes inspiration from S-expressions and alternative S-expression notations. The idea that, even in an S-expression-like setting, some parsing can be deferred to a later parser has many precedents, including Clojure’s choice of where to put parentheses and notations that use something like $ to escape to infix mode.