2.1 Groups and Blocks

8.18.0.10

The heart of shrubbery notation is its set of rules for organizing terms into groups. Terms include atoms, which are a subset of individual tokens for things like numbers, identifiers, strings, and booleans. Parentheses and similar opener–closer pairs form compound terms. A block created with : is also a term, and a non-empty sequence of | alternatives is a term. The following grammar summarizes the abstract structure of a group, ignoring whitespace rules and , and ; separators.

‹group› ::= ‹term›+
‹term›  ::= ‹atom›
         |  ( ‹group›* )  |  [ ‹group›* ]  |  { ‹group›* }  |  ' ‹group›* '
         |  : ‹group›*
         |  ( | ‹group›* )+

This initial grammar is overly permissive, however, because a sequence of | alternatives can appear only at the end of a group, and a : block can appear in a group at most once and only after all terms other than a sequence of | alternatives. Any number of self-delimiting terms can appear before a single optional : block or sequence of | alternatives, as long as the group is nonempty.

‹group› ::= ‹delit›* ‹block›? ‹alts›?    (must be nonempty)
‹delit› ::= ‹atom›
         |  ( ‹group›* )  |  [ ‹group›* ]  |  { ‹group›* }  |  ' ‹group›* '
‹block› ::= : ‹group›*
‹alts›  ::= ( | ‹group›* )+

Overall, a document is a sequence of groups:

‹document› ::= ‹group›*
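
As a rough illustration of the refined ‹group› rule, the following Python sketch checks whether a sequence of term kinds has the shape ‹delit›* ‹block›? ‹alts›? and is nonempty. The kind names and the function `is_well_formed_group` are hypothetical and are not part of any shrubbery API; the sketch looks only at the order of terms, not at their content.

```python
import re

# Hypothetical one-letter codes for the three kinds of term in the refined grammar:
# "d" = self-delimiting term (atom or opener-closer pair),
# "b" = a ":" block, "a" = a whole sequence of "|" alternatives.
KIND_CODES = {"delit": "d", "block": "b", "alts": "a"}

# delit* block? alts?, with a lookahead that rejects the empty group.
GROUP_SHAPE = re.compile(r"(?=.)d*b?a?")


def is_well_formed_group(kinds):
    """Return True if the list of term kinds fits the refined group rule."""
    encoded = "".join(KIND_CODES[kind] for kind in kinds)
    return GROUP_SHAPE.fullmatch(encoded) is not None
```

For instance, `["delit", "block", "alts"]` is accepted, while `["block", "delit"]` is rejected because a self-delimiting term cannot follow a block.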

2.1.1 Grouping by Lines

The main grouping rule is that sequences on different lines with the same indentation create separate groups, one for each line.

this is the first group
this is the second group

Comments and lines with only whitespace are ignored. They don’t count when this document says “the previous line” or “the next line.”
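
A minimal Python sketch of this rule, assuming a flat document in which every line has the same indentation, there are no opener–closer pairs, blocks, or strings, and the only comments are `//` line comments. The function `groups_by_line` is a hypothetical helper, not a real shrubbery parser.

```python
def groups_by_line(source):
    """Split a flat document into groups, one per significant line."""
    groups = []
    for line in source.splitlines():
        # Assumption: "//" always starts a comment (no strings, no "#//").
        text = line.split("//", 1)[0].strip()
        if text:
            # Each remaining line is one group; its terms are whitespace-separated.
            groups.append(text.split())
    return groups
```

Blank lines and comment-only lines produce no group, which matches the statement that they do not count as "the previous line" or "the next line."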

2.1.2 Grouping by Opener–Closer Pairs

An opener–closer pair (…), […], {…}, or '…' forms a term that can span lines and encloses nested groups. Within most opener–closer pairs, , separates groups, but ; separates groups within '…'. Groups can be on separate lines at the same indentation, but groups on separate lines still must be separated by , in (…), […], or {…}. Parsing retains whether a term is formed by (…), […], {…}, or '…'.

group 1
[group 2 - subgroup I, group 2 - subgroup II,
 group 2 - subgroup III,
 (group 2 - subgroup IV - subsubgroup A,
  group 2 - subgroup IV - subsubgroup B,
  {group 2 - subgroup IV - subsubgroup C - subsubsubgroup α,
   group 2 - subgroup IV - subsubgroup C - subsubsubgroup β})]
'group 3 - subgroup I;  group 3 - subgroup II
 group 3 - subgroup III'

The following three forms are not allowed, because they are missing a , between two groups:

// Not allowed
(1
 2)
[1
 2]
{1
 2}

A , is disallowed if it would create an empty group, except that a trailing , is allowed.

// Not allowed

(, 1)

(1,, 2)

// Allowed, but not standard

(1, 2,)

A trailing , is only standard style when the closer that follows is on its own line.

list(
  red,
  green,
  blue,
  orange,
)
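
The comma rules above can be sketched in Python for the simplest case: a single flat form with no nested openers, strings, or comments. The function `check_commas` and its three result labels are hypothetical names chosen for this illustration.

```python
def check_commas(form):
    """Classify the commas in a flat form such as "(1, 2,)".

    Returns "error" if a comma would create an empty group,
    "nonstandard" for a trailing comma whose closer is on the same line,
    and "ok" otherwise.
    """
    # Assumption: exactly one (...) pair, with nothing nested inside it.
    body = form[form.index("(") + 1 : form.rindex(")")]
    parts = body.split(",")
    if len(parts) == 1:
        return "ok"  # no commas at all
    has_trailing_comma = parts[-1].strip() == ""
    groups = parts[:-1] if has_trailing_comma else parts
    if any(part.strip() == "" for part in groups):
        return "error"
    # A trailing comma is standard only when a newline separates it from the closer.
    if has_trailing_comma and "\n" not in parts[-1]:
        return "nonstandard"
    return "ok"
```

With this sketch, `(, 1)` and `(1,, 2)` are classified as errors, `(1, 2,)` as allowed but nonstandard, and the multi-line `list(…)` form with the closer on its own line as standard.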

Using ' as both an opener and closer prevents simple nesting of those forms. There is no problem if a (, [, or { appears between one ' as an opener and another ' as an opener; otherwise, two consecutive 's intended as openers would instead be parsed as an opener and a closer. To disambiguate, « can be used immediately after an opener ', and then » must be used just before the closing '. The « and » are not preserved in the parsed representation.

'a ('nested') b'
'«a 'nested' b»'

2.1.3 Blocking with : and Indentation

A sequence of groups has a particular indentation that is determined by the first group in the sequence. Subsequent groups in a sequence must start with the same indentation as the first group.

group 1
group 2
// error, because the group is indented incorrectly:
  group 3

When a line ends with : and the next line is more indented, then it starts a new sequence of groups that form a block:

group:
  subgroup 1
  subgroup 2

There is no constraint on how much indentation a nested group sequence must use, as long as the indentation is more than the enclosing group, but standard indentation is two spaces. Also, a new line is not required after :, but then it’s as if the : is followed by a newline plus spaces that reach the same column as the :. All four of the following groups are the same, each with one block that has two nested groups:

hello:
  world
  universe

hello:
       world
       universe

hello: world
       universe

hello:    world
          universe

Within an opener–closer pair, a nested group sequence can start at any indentation; it doesn’t have to be indented to the right of the opener.

function(
  argument,
  more
)

A block that is started with : normally cannot be empty (unless explicit-grouping «…» are used as described in Line- and Column-Insensitivity with « and »), so the following is ill-formed:

bad_empty: // empty block disallowed

However, : can be used at the start of a group so that the group contains only a block. When : starts a group that is in the top-level sequence or within an opener–closer pair, the block created by : is allowed to be empty (because that provides a way to express an empty block in a context where it is likely to be intentional instead of confusing). For example, the first of the following three top-level groups has just a block that contains one group with the single element untagged, the second top-level group has just a block with zero groups, and the third has a group with one parenthesized sequence of groups where the middle one has an empty block:

: untagged

:

(1, :, 2)

2.1.4 Continuing with Indentation and an Operator

When a newly indented line starts with an operator and when the preceding line does not end with :, then the indented line does not form a block, and it may instead continue the previous line. The operator-starting line continues only if the previous line was not a continuing line; however, additional continuing lines can start with an operator (not necessarily the same one) at the same indentation as the original continuing line. The following two groups are the same:

f(1) + 2
  + 3 + 4
  - 5 - 6

f(1) + 2 + 3 + 4 - 5 - 6
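
The continuation rule can be sketched in Python under strong simplifying assumptions: the input is a list of lines, no blocks or opener–closer pairs are built, and the set `OPERATOR_CHARS` is only a rough stand-in for the characters that can start a shrubbery operator. The function name `join_operator_continuations` is hypothetical.

```python
# Assumption: a small, illustrative subset of operator-starting characters.
OPERATOR_CHARS = set("+-*<>=!&^%")


def indent_of(line):
    """Count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))


def join_operator_continuations(lines):
    """Join indented operator-starting lines onto the line they continue."""
    groups = []
    base_indent = None  # indentation of the line that started the current group
    cont_indent = None  # indentation shared by that group's continuing lines
    for line in lines:
        text = line.strip()
        if not text:
            continue
        indent = indent_of(line)
        starts_with_op = text[0] in OPERATOR_CHARS
        if groups and starts_with_op and not groups[-1].endswith(":"):
            # The first continuing line must be more indented than the group.
            if cont_indent is None and indent > base_indent:
                cont_indent = indent
            # Later continuing lines must match the first one's indentation.
            if indent == cont_indent:
                groups[-1] += " " + text
                continue
        groups.append(text)
        base_indent = indent
        cont_indent = None
    return groups
```

Applied to the three-line example above, the sketch yields the single line `f(1) + 2 + 3 + 4 - 5 - 6`. A line that follows `:` is never joined, and it is simply returned as a separate entry because this sketch does not model blocks.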

An operator-starting line cannot continue a group that already has a block, because a block is always at the end of its immediately containing group or followed only by | alternatives:

hello: world
  + 3 // bad indentation

Along those lines, there is no ambiguity when an indented line appears after : and starts with an operator. In that case, the indented line is part of the block, since it cannot continue the group that contains the block. For example, the following two groups are the same, each with a block that has a + 3 group:

hello: + 3

hello:
  + 3

2.1.5 Alternatives with |

A group can end with a sequence of alternatives, each of which starts with |. The initial | of the sequence can be on a new line, in which case it must have the same indentation as the beginning of its enclosing group, but it does not have to be on a new line. If a later | for the same alternative sequence starts a new line, it must be indented the same as the initial | (whether or not that initial | was on its own line). Each | is followed by a sequence of groups using the same indentation rules as the groups in a : block.

The following four groups are the same:

hello
| world
| universe

hello
| world
| universe

hello | world
      | universe

hello |
        world
      |
        universe

A group can start with an | alternative only when it is immediately within (…), […], or {…}, or if it is the first group of the sequence immediately within '…'. Like :, the group-sequence content after | cannot be empty (unless explicit-grouping «…» are used immediately after |, as described in Line- and Column-Insensitivity with « and »).

If a | appears on the same line as an earlier | and is not more nested inside (…), […], or {…}, then the | terminates the earlier |’s content and continues its enclosing group with a new | alternative. The intent and consequence of this rule is that multiple |s can be used on a single line instead of starting each | on its own line, making the following groups the same as the above groups:

hello | world | universe

hello
| world | universe

A group can contain both a (single) : block and a sequence of | alternatives. The block’s content will be more nested relative to the | alternatives (or delimited with «…», as described in Line- and Column-Insensitivity with « and »).

hello:
  in english
| world
| universe

A : block before a sequence of | alternatives can be empty. Such an empty : is not preserved in the parsed form (unless it uses «…», as described in Line- and Column-Insensitivity with « and »). In effect, a : is optional before | alternatives that start in a new line, but standard style omits an optional :. The following two groups are the same:

hello:
| world
| universe

hello
| world
| universe

When | appears after : on the same line, it is part of the : block (unless the block is delimited with «…», as described in Line- and Column-Insensitivity with « and »). The following two groups are the same:

hello: in english | world | universe

hello:
  in english
  | world
  | universe

2.1.6 Separating Groups with ; and ,

A ; separates two groups on the same line. A ; is allowed in any context except between groups immediately within (…), […], or {…}, where a , separates groups. The following three blocks are the same:

hello:
  world
  universe

hello:
  world; universe

hello: world; universe

The ; and , separators interact differently with blocks formed by : and |. A , closes blocks as necessary to reach an enclosing (…), […], or {…}, while a ; separates groups within a nested group sequence. If ; would create an empty group, it is ignored.

For example, the following two groups are the same, and they have one parenthesized term that has a single block, and the block has two groups:

(hello: world; universe)

(hello: world
        universe)

The following two groups are also the same, where the group has one parenthesized term, but that term contains two groups, where the first group contains a block that contains a single group:

(hello: world, universe)

(hello: world,
 universe)

2.1.7 Line- and Column-Insensitivity with « and »

See also How to Type «…».

A group sequence can be delimited explicitly with «…» to disable the use of line and column information for parsing between «…». A « can be used immediately after : or immediately after |, in which case a » indicates the end of the group sequence that starts after the : or |. Within the sequence, an explicit ; must be used to separate groups. A « can also be used immediately after ', and then » is used just before the closing ', but that is a different kind of «…» that is specific to supporting nested ' pairs and does not disable line and column sensitivity.

A sequence of groups, either at the top level or within a block, can be written without line and column sensitivity as ; followed immediately by «, in which case a » indicates the end of the sequence, and groups within the sequence are separated by ;. When parsing, the groups within the sequence are spliced into the enclosing context. The combination of ; and « is intended for entering line- and column-insensitive mode for a single group or for representing a sequence of groups that is not within a block.

Whitespace and /*…*/ comments are allowed between a :, |, or ; and its «, but in a line-sensitive context, the « must be on the same line as its :, |, or ;.

The following five groups are the same:

hello:
  if x
  | world
    planet
  | universe

hello: if x | world; planet | universe

hello:«
  if x
  |« world;
     planet »
  |« universe »»

hello:« if x |« world; planet » |« universe »»

;«hello
  :
  «
  if
  x
  |
  «
  world
  ;
  planet
  »
  |
  «
  universe
  »
  »

Using «…» can “armor” a shrubbery for transport from one context to another where its line breaks or indentation might get mangled. For example, an editor might offer an operation to armor a range of text in preparation for moving or copying the text, and then it can be properly indented in its destination before unarmoring. Along similar lines, when writing code as data to be read back later, it’s easy for a printer to insert explicit «…».

In rare cases, a programmer might write «…» directly. Although many shrubbery forms can be written with :, |, and ; on a single line, as illustrated above, not all forms can be collapsed to a single line without extra delimiters. For example, these six groups are all different:

outside:
  inside: fruit
  rind

// not the same, because `rind` is within `inside:`
outside: inside: fruit; rind

if true
| if false
  | x
  | y
| z

// not the same, because there's one block with five `|` alternatives
if | true | if false | x | y | z

hello:
  if x
  | world
  | universe
  the end

// not the same, because `the end` is in the second `|`:
hello: if x | world | universe; the end

Using «…» can help in those cases:

outside:
  inside: fruit
  rind

outside: inside:« fruit »; rind

if true
| if false
  | x
  | y
| z

if | true |« if false | x | y » | z

hello:
  if x
  | world
  | universe
  the end

hello: if x | world |« universe »; the end

Even so, delimiting blocks with «…» is expected to be rare in practice, both because programmers are likely to break things across lines and because a language that uses shrubbery notation is likely to allow (…) in places where grouping might be needed. For example, assuming that if is an expression form and (…) can wrap an expression, a nested conditional is probably better written like this:

if | true | (if false | x | y) | z

Using (…) in this way does not produce an equivalent shrubbery to

if | true |« if false | x | y »| z

but it might represent an equivalent expression in the language using shrubbery notation.

To stay consistent with blocks expressed through line breaks and indentation, a block with «…» must still appear at the end of its enclosing group or have only | alternatives afterward.

// not allowed, because a block must end a group

inside:« fruit » more

2.1.8 Continuing a Line with \

As a last resort, \ can be used at the end of a line (optionally followed by whitespace and comments on the line) to continue onto the next line, as if the two lines were one. The \ itself does not appear in the parsed form. Within the same line, a \ can be followed only by whitespace and comments in line-sensitive mode (i.e., outside «…» that form a line-insensitive group).

A continuing \ does not affect the assignment of columns to positions on a subsequent line; that is, column counting starts again at 0 following a newline after \. The beginning of the group still determines the group’s indentation, even if the continuing line starts less indented. When no terms precede a \ within a group, the \ is effectively whitespace.

Lines containing only whitespace and (non-term) comments do not count as “the next line” even for \ continuations, so any number of whitespace and comment lines can appear between \ and the line that it continues.

this is \
the first group
\
this is the second group

this is a group \
with:
  a
  nested
  block

this is a group \
with (a,
      nested,
      list)

this is \
/* comment */
the last group
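
A small Python sketch of \ continuation, assuming that comments occupy whole lines (either `// …` or a single-line `/* … */`), that nothing follows a \ on its own line, and that indentation and nested blocks can be ignored. The function `join_backslash_continuations` is a hypothetical helper for illustration only.

```python
def join_backslash_continuations(lines):
    """Join each line that ends in a backslash with the next significant line."""
    joined = []
    pending = None  # text collected from lines that ended in a backslash
    for line in lines:
        text = line.strip()
        # Blank lines and comment-only lines never count as "the next line".
        is_comment = text.startswith("//") or (
            text.startswith("/*") and text.endswith("*/")
        )
        if not text or is_comment:
            continue
        if pending is not None:
            text = (pending + " " + text).strip()
            pending = None
        if text.endswith("\\"):
            # Drop the backslash itself; it does not appear in the parsed form.
            pending = text[:-1].rstrip()
        else:
            joined.append(text)
    return joined
```

A \ with no terms before it contributes nothing, so the lone \ in the first example above simply attaches to the start of the second group.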

2.1.9 Group Comments with #//

A #// comments out a group or | alternative. To comment out a group, #// must appear either on its own line before a group or at the start of a group. To comment out an alternative, #// must appear on its own line before the alternative or just before a | that does not start a new line.

The interaction between #// and indentation depends on how it is used:

When #// appears completely on its own line (possibly with whitespace and non-group comments), then its indentation does not matter. It comments out the next group or alternative—which might be a single-line group, multi-line group, or | alternative.
When #// appears at the start of a group with more tokens afterward on the same line, it determines that group’s indentation, and it must obey any constraints on the group’s indentation. When #// appears immediately after an opener but with nothing else afterward on the same line, it determines indentation for the groups immediately within the opener, and it comments out the first group.
When #// appears just before a | on the same line, then unlike the case for groups, it does not affect the column of the | as used to align alternatives on later lines. Along those lines and to avoid an indentation mismatch, a #// is not allowed to start a line for commenting out a | alternative on the same line.

A #// is not allowed without a group or alternative afterward to comment out. Multiple #//s do not nest (i.e., two #//s in a row is always an error).

The following three groups all parse the same:

{
  hello:
    val x: f(1, 2 + 3)
    match x
    | 1: 'one'
    | 2: 'two'
}

{
  hello:
    val x:
      #//
      g(-1)
      f(
        #//
        0,
        1,
        2 + 3,
        #//
        4 + 5)
    #//
    not included in the code
    match x
    #//
    | 0: no
    | 1: 'one'
    #//
    | 1.5: no
    | 2: 'two'
    #//
    | 3: no,
  #//
  goodbye:
    the enclosing group of the block is commented out
}

{
  hello:
    val x:
      #// g(-1)
      f(#// 0, 1, 2 + 3, #// 4 + 5)
    #// not included in the code
    match x #// | 0: no | 1: 'one' #// | 1.5: no
                | 2: 'two' #// | 3: no,
  #// goodbye:
    the enclosing group of the block is commented out
}

2.1.10 At-Notation Using @

Groups, blocks, and alternatives provide a convenient general notation for structured forms, such as typical elements of a programming language. Basic shrubbery elements are less well suited for representing blocks of free-form text, however. String literals work well enough for simple text, but they’re awkward for representing multi-line paragraphs and interpolated text formatting.

To better support free-form text and escapes, shrubbery notation includes text support that is based on at-exp notation for S-expressions. A @ in shrubbery notation starts a term that normally includes { and }, where @ changes the meaning of { and } to delimit free-form text instead of shrubbery groups. For example,

@typeset{Write "hello" to C:\greet.txt.}

is equivalent to

typeset(["Write \"hello\" to C:\\greet.txt."])

Note that text in {…} in this example did not need escapes for the literal quotes and backslashes. The conversion puts the literal string in a list, where the list has more elements in the case of multiline text or escapes.
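
The escaping step in this conversion can be sketched in Python for the simplest case only: a single command followed by one {…} whose text is on one line and contains no @ escapes. The function `translate_simple_at` is a hypothetical helper that produces the translated form as a string; it is not how a shrubbery parser is implemented.

```python
def translate_simple_at(command, text):
    """Translate @command{text} into the text of command(["..."]).

    Only single-line text without @ escapes is handled in this sketch.
    """
    if "@" in text or "\n" in text:
        raise ValueError("sketch handles only single-line text without @ escapes")
    # Escape backslashes first so the backslashes added for quotes stay single.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{command}(["{escaped}"])'
```

Calling `translate_simple_at("typeset", r'Write "hello" to C:\greet.txt.')` reproduces the `typeset([…])` form shown above, with the quotes and the backslash escaped inside the string literal.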

Overall, @ notation has three key properties:

Within text { and }, nearly all content is literal, except that @ can be used again to escape. Longer paired delimiters, such as |<<{ and }>>|, imply a corresponding longer escape, such as |<<@, so that text notation itself embeds easily within another text context (as long as the outer context uses a distinct escape).
Between @ and the opening delimiter like {, optional additional terms are parsed as normal shrubbery terms. Thus, a single @ notation is useful both for escaping to text in a shrubbery context, and escaping back to a shrubbery notation in a text context.
Every @ form can be translated to a shrubbery form without @. This transformation is performed automatically during shrubbery parsing, as opposed to leaving the translation to a language that is built on shrubbery notation.

Each input of the form

@ command ( arg , ... ) { text } ...

is parsed into the same representation as

command(arg, ..., [converted_text, ...], ...)

Each component of the original form—command, parenthesized args, and braced text—is optional, as long as one component is present, and as long as command is present before parenthesized args. The command and arg components are in shrubbery notation, while text is in text mode and converted to converted_text lists. The converted_text translation includes elements that are not string literals in places where text has escapes. An @ form can have multiple {…} text blocks, in which case the translation has multiple converted_text list arguments.

More examples:

@typeset(~style: bold){Write "hello"}
typeset(~style: bold, ["Write \"hello\""])

@typeset{Write @bold{"hello"}}
typeset(["Write ", bold(["\"hello\""])])

@typeset{Write @url{https://example.com}{"hello"}}
typeset(["Write ", url(["https://example.com"], ["\"hello\""])])

@typeset{Write @get_link(home_page) out...}
typeset(["Write ", get_link(home_page), " out..."])

@typeset|{Example: @bold{"hello"}}|
typeset(["Example: @bold{\"hello\"}"])

Some additional @ rules:

When multiple lines of text are within {…}, then leading indentation common to all lines is discarded.
While the command component itself can be parenthesized, it can also have the form « command ... » for a multi-part command component that is spliced into the translation without surrounding parentheses.
A multi-part command that is a sequence of identifiers separated by operators (usually .) can be written without grouping «…», as long as no space appears between the identifiers and operators.
When ( arg , ... ) is present, the separating commas are optional. That is, arguments can be provided as different newline-separated groups without a , in between.
The form @(« command ... ») splices as-is with no arguments, even if the subsequent text has the shape of parenthesized args or braced text.
The @// comment form works both in normal shrubbery mode and as a comment escape within text.

See At-Notation Parsing for complete details.
