8.5 Binding Low-Level Protocol

8.18.0.13

8.5 Binding Low-Level Protocol🔗ℹ

Binding macros are typically written by expansion to existing forms as described in Binding and Annotation Macros, but bind.macro form also supports a low-level protocol. A macro opts into the low-level protocol by returning a result build with bind_meta.pack. To support patterns, repetitions, annotations with nested expressions, static information, let versus def, and more, the low-level binding protocol is considerably more complicated than other Rhombus extension contexts.

A binding form using the low-level protocol has five parts:

A compile-time function to report “upward” static information about the variables that it binds. The function receives “downward” information provided by the context, such as static information inferred for the right-hand side of a def binding or imposed by enclosing binding forms. If a binding form has subforms, it can query those subforms, pushing its down information “downward” and receiving the subform information “upward”.
A compile-time “oncer” function that generates definitions to be evaluated once each time the binding form is reached. Typically, one-time definitions wrap an expression within a binding form, such as expressions in Any.of associated to a binding with the :: operator; see Annotations from Predicates for more discussion. The matcher step (described next) can directly refer to definitions generated by the oncer step.
A compile-time “matcher” function that generates code that checks the input value for a match. The generated block may include additional definitions before branching or after branching toward success, but normally no bindings visible to code around the binding should be created at this step. Also, any action that commits to a binding match should be generated by the second or third compile-time functions (in the next bullets). A matcher communicates to the second or third function through identifiers that are explicitly designated as evidence for the match.
A compile-time “committer” function that generates intermediate definitions, possibly using evidence recorded by the matcher. These definitions happen only after the match is successful, if at all, and the bindings are visible only after the matching part of the expansion. These bindings are not affected by let, however, and so they should not include user-supplied names. Often, no intermediate definitions are needed, but they can be useful if some value needs to be computed once and referenced my multiple user-visible bindings (which does not work for definitions generated in the fourth function of the bindings are scoped to only later definition by let). When a binding form is used in a match-only position, such as within a matching annotation on the right-hand side of is_a, then the committer function is not used.
A compile-time “binder” function that generates definitions for the bound variables (i.e., the ones described by the function in the first bullet above), possibly using evidence values from the matcher, and possibly using bindings created by the committer. These definitions happen only after the match is successful and after intermediate bindings that need a successful match. These bindings are the ones that are affected by let. The generated definitions do not need to attach static information reported by the first bullet’s function; that information will be attached by the definition form that drives the expansion of binding forms. When a binding form is used in a match-only position, then the binder function is not used.

The first of these functions, which produces static information, is always called first, and its results might be useful to the last four. Operationally, parsing a binding form gets the first function, and then that function reports the other four along with the static information that it computes.

To make binding work both in definition contexts and match search contexts, the check-generating function (third bullet above) must be parameterized over the handling of branches. Toward that end, it receives three extra arguments: the name of an if-like form that we’ll call IF, a success form, and a failure form. The transformer uses the given IF to branch to a block that includes success or just failure. The IF form must be used in tail position with respect to the generated code, where the “then” part of an IF is still in tail position for nesting. The transformer must use failure and only failure in the “else” part of each IF, and it must use success exactly once within a “then” branch of one or more nested IFs.

Unfortunately, there’s one more complication. The result of a macro must be represented as syntax—even a binding macro—and a functions as a first-class compile-time value should not be used as syntax. (Such representation are sometimes called “3-D syntax,” and they’re best avoided.) So, a low-level binding macro must uses a defunctionalized representation of functions. That is, a parsed binding reports a function name for its static-information compile-time function, plus data to be passed to that function, and those two parts form a closure. Among the static-information function’s returns are names for a once-generator, match-generator, commit-generator, and binding-generator function, plus data to be passed to those functions (effectively: the fused closure for those four functions).

In full detail, a low-level parsed binding result from bind.macro transformer is represented as a syntax object with two parts:

The name of a compile-time function that is bound with bind.infoer.
Data for the bind.infoer-defined function, packaged as a single syntax object. This data might contain parsed versions of other binding forms, for example.

These two pieces are assembled into a parenthesized-tuple syntax object, and then packed with the bind_meta.pack function to turn it into a valid binding expansion (to distinguish the result from a macro expansion in the sense of producing another binding form).

The function bound with bind.infoer will receive two syntax objects: a representation of “downward” static information and the parsed binding’s data. The result must be a single-object tuple with the following parts:

A string that is used for reporting a failed match. The string is used as an annotation, and it should omit information that is local to the binding. For example, when List.cons(x, y) is used as a binding pattern, a suitable annotation string might be "matching(List.cons(_, _))" to phrase the binding constraint as an annotation and omit local variable names being bound (which should not be reported to the caller of a function, for example, when an argument value in a call of the function fails to match).
An identifier that is used as a name for the input value, at least to the degree that the input value uses an inferred name. For example, proc as a binding form should cause its right-hand value to use the inferred name proc, if it can make any use of an inferred name.
“Upward” static information associated with the overall value for a successful match with the binding. This information is used by the matching annotation operator, for example, as well as propagated outward by binding forms that correspond to composite data types. The information is independent of static information for individual names within the binding, but it should be the same as information for any binding that corresponds to the full matched value. For example, Posn(x, y) binds x and y, and it may not have any particular static information for x and y, but a matching value has static information suitable for Posn, anyway; so, using p :: matching(Posn(_, _)) makes p have Posn static information.
A list of individual names that are bound by the overall binding, a description of valid uses for each name (e.g., as an expression, as a repetition of some depth), and “upward” static information for each name. For example, Posn(x, y) as a binding pattern binds x and y as identifiers that can be used as expressions, and static information about x and y might come from annotations in the definition of Posn. The final transformer function described in the third bullet above is responsible for actually binding each name and associating static information with it, but this summary of binding enables cooperation with composite binding forms, so that those that create repetitions.
The name of a compile-time oncer function that is bound with bind.oncer.
The name of a compile-time matcher function that is bound with bind.matcher.
A tree of evidence identifiers that are defined by the matcher and whose values should be accessible to the committer and binder functions. In common cases, these identifiers are provided back to the committer and binder functions unchanged; in other cases, the committer and binder are not used in the scope of matcher definitions, but the values of the evidence variables are transferred to other variables that are reported to the committer and binder for their use, instead.
The name of a compile-time committer function that is bound with bind.committer.
The name of a compile-time binder function that is bound with bind.binder.
Data for the bind.matcher- and bind.binder-defined functions, packaged as a single syntax object.

The functions bound with bind.oncer, bind.matcher, bind.committer, and bind.binder are called with the syntax-object data from the sixth tuple slot. Before that argument, except for bind.oncer, the functions also receive an identifier for the matcher’s input. The match-building transformer in addition receives the IF form name, a success form, and a failure form. The committer and binder in between receive a tree of evidence identifiers, possibly the ones directly bound by the matcher, or possibly replacement identifiers (in the same shape as the original tree).

Here’s a use of the low-level protocol to implement a fruit pattern, which matches only things that are fruits according to is_fruit:

import:
  rhombus/meta open
bind.macro 'fruit($id)':
  bind_meta.pack('(fruit_infoer,
                   // remember the id:
                   $id)')
bind.infoer 'fruit_infoer($static_info, $id)':
  '("matching(fruit(_))",
    $id,
    // no overall static info:
    (),
    // `id` is bound,
    //  `~repet ()` means usable as repetition, and
    //  `()` means no static info:
    (($id, [~repet ()], ())),
    fruit_oncer,
    fruit_matcher,
    (), // no evidence needed
    fruit_committer,
    fruit_binder,
    // binder needs id:
    $id)'
bind.oncer 'fruit_oncer($id)':
  ''
bind.matcher 'fruit_matcher($arg, $id, $IF, $success, $failure)':
  '$IF is_fruit($arg)
   | $success
   | $failure'
bind.committer 'fruit_committer($arg, (), $id)':
  ''
bind.binder 'fruit_binder($arg, (), $id)':
  'def $id: $arg'
fun is_fruit(v):
  v == "apple" || v == "banana"

> def fruit(snack) = "apple"
> snack
"apple"
> def fruit(dessert) = "cookie"
def: value does not satisfy annotation
value: "cookie"
annotation: matching(fruit(_))

The fruit binding form assumes (without directly checking) that its argument is an identifier, and its infoer discards static information. Binding forms normally need to accommodate other, nested binding forms, instead. A bind.macro transformer with can receive already-parsed sub-bindings as arguments, and the infoer function can use bind_meta.get_info on a parsed binding form to call its internal infoer function. The result is packed static information, which can be unpacked into a tuple syntax object with bind_meta.unpack_info. Normally, bind_meta.get_info should be called only once to avoid exponential work with nested bindings, but bind_meta.unpack_info can used any number of times.

As an example, here’s an infix <&> operator that is similar to &&. It takes two bindings and makes sure a value can be matched to both. The binding forms on either size of <&> can bind variables. The <&> builder is responsible for binding the input name that each sub-binding expects before it deploys the corresponding builder. The only way to find out if a sub-binding matches is to call its builder, providing the same IF and failure that the original builder was given, and possibly extending the success form. A builder must be used in tail position, and it’s success position is a tail position.

bind.macro '$a <&> $b':
  bind_meta.pack('(anding_infoer,
                   ($a, $b))')
bind.infoer 'anding_infoer($static_info, ($a, $b))':
  let a_info = bind_meta.get_info(a, static_info)
  let b_info = bind_meta.get_info(b, static_info)
  def '($a_ann, $a_name, ($a_s_info, ...), ($a_var_info, ...),
        $_, $_, $a_evidence, $_, $_, $_)':
    bind_meta.unpack_info(a_info)
  let '($b_ann, $b_name, ($b_s_info, ...), ($b_var_info, ...),
        $_, $_, $b_evidence, $_, $_, $_)':
    bind_meta.unpack_info(b_info)
  let ann:
    "and("
      +& Syntax.unwrap(a_ann) +& ", " +& Syntax.unwrap(b_ann)
      +& ")"
  '($ann,
    $a_name,
    ($a_s_info, ..., $b_s_info, ...),
    ($a_var_info, ..., $b_var_info, ...),
    anding_oncer,
    anding_matcher,
    ($a_evidence, $b_evidence),
    anding_committer,
    anding_binder,
    ($a_info, $b_info))'
bind.oncer 'anding_oncer(($a_info, $b_info))':
  let '($_, $_, $_, $_,
        $a_oncer, $_, $_, $_, $_, $a_data)':
    bind_meta.unpack_info(a_info)
  let '($_, $_, $_, $_,
        $b_oncer, $_, $_, $_, $_, $b_data)':
    bind_meta.unpack_info(b_info)
  '$a_oncer($a_data)
   $b_oncer($b_data)'
bind.matcher 'anding_matcher($in_id, ($a_info, $b_info),
                             $IF, $success, $failure)':
  let '($_, $_, $_, $_,
        $_, $a_matcher, $_, $_, $_, $a_data)':
    bind_meta.unpack_info(a_info)
  let '($_, $_, $_, $_,
        $_, $b_matcher, $_, $_, $_, $b_data)':
    bind_meta.unpack_info(b_info)
  '$a_matcher($in_id, $a_data, $IF,
              $b_matcher($in_id, $b_data, $IF, $success, $failure),
              $failure)'
bind.committer 'anding_committer($in_id,
                                 ($a_evidence, $b_evidence),
                                 ($a_info, $b_info))':
  let '($_, $_, $_, $_,
        $_, $_, $_, $a_committer, $_, $a_data)':
    bind_meta.unpack_info(a_info)
  let '($_, $_, $_, $_,
        $_, $_, $_, $b_committer, $_, $b_data)':
    bind_meta.unpack_info(b_info)
  '$a_committer($in_id, $a_evidence, $a_data)
   $b_committer($in_id, $b_evidence, $b_data)'
bind.binder 'anding_binder($in_id,
                           ($a_evidence, $b_evidence),
                           ($a_info, $b_info))':
  let '($_, $_, $_, $_,
        $_, $_, $_, $_, $a_binder, $a_data)':
    bind_meta.unpack_info(a_info)
  let '($_, $_, $_, $_,
        $_, $_, $_, $_, $b_binder, $b_data)':
    bind_meta.unpack_info(b_info)
  '$a_binder($in_id, $a_evidence, $a_data)
   $b_binder($in_id, $b_evidence, $b_data)'

> def one <&> 1 = 1
> one
1
> def two <&> 1 = 2
def: value does not satisfy annotation
value: 2
annotation: and(Any, matching(1))

class Posn(x, y)

> def Posn(0, y) <&> Posn(x, 1) = Posn(0, 1)
> x
0
> y
1

One subtlety here is the syntactic category of IF for a builder call. The IF form might be a definition form, or it might be an expression form, and a builder is expected to work in either case, so a builder call’s category is the same as IF. An IF alternative is written as a block, as is a success form, but the block may be inlined into a definition context.

The <&> infoer is able to just combine any names and “upward” static information that receives from its argument bindings, and it can simply propagate “downward” static information. When a binding operator reflects a composite value with separate binding forms for component values, then upward and downward information needs to be adjusted accordingly.

1	Quick Start
2	Rhombus Essentials
3	Collections and Iteration
4	Classes and Interfaces
5	Input, Output, and Strings
6	Syntax Objects and Macros
7	Annotations
8	Static Information, Binding, and Annotation
9	Defining Languages
10	Building and Running Rhombus Programs
11	Using Racket Tools and Libraries

8.1	Static Information
8.2	Representing Static Information
8.3	Rules for Static Information
8.4	Annotation Low-Level Protocol
8.5	Binding Low-Level Protocol