5.4 Binding Low-Level Protocol
A binding form using the low-level protocol has four parts:
A compile-time function to report “upward” static information about the variables that it binds. The function receives “downward” information provided by the context, such as static information inferred for the right-hand side of a def binding or imposed by enclosing binding forms. If a binding form has subforms, it can query those subforms, pushing its down information “downward” and receiving the subform information “upward”.
A compile-time “matcher” function that generates code that checks the input value for a match. The generated block may include additional definitions before branching or after branching toward success, but normally no bindings visible to code around the binding should be created at this step. Also, any action that commits to a binding match should be generated by the second or third compile-time functions (in the next bullets).
A compile-time “committer” function that generates intermediate definitions. These definitions happen only after the match is successful, if at all, and the bindings are visible only after the matching part of the expansion. These bindings are not affected by let, however, and so they should not include user-supplied names. Often, no intermediate definitions are needed, but they can be useful if some value needs to be computed once and referenced my multiple user-visible bindings (which does not work for definitions generated in the fourth function of the bindings are scoped to only later definition by let). When a binding form is used in a match-only position, such as within a matching annotation on the right-hand side of is_a, then the committer function is not used.
A compile-time “binder” function that generates definitions for the bound variables (i.e., the ones described by the function in the first bullet above). These definitions happen only after the match is successful and after intermediate bindings that need a successful match. These bindings are the ones that are affected by let. The generated definitions do not need to attach static information reported by the first bullet’s function; that information will be attached by the definition form that drives the expansion of binding forms. When a binding form is used in a match-only position, then the binder function is not used.
The first of these functions, which produces static information, is always called first, and its results might be useful to the last three. Operationally, parsing a binding form gets the first function, and then that function reports the other three along with the static information that it computes.
To make binding work both in definition contexts and match search contexts, the check-generating function (second bullet above) must be parameterized over the handling of branches. Toward that end, it receives three extra arguments: the name of an if-like form that we’ll call IF, a success form, and a failure form. The transformer uses the given IF to branch to a block that includes success or just failure. The IF form must be used in tail position with respect to the generated code, where the “then” part of an IF is still in tail position for nesting. The transformer must use failure and only failure in the “else” part of each IF, and it must use success exactly once within a “then” branch of one or more nested IFs.
Unfortunately, there’s one more complication. The result of a macro
must be represented as syntax—
In full detail, a low-level parsed binding result from bind.macro transformer is represented as a syntax object with two parts:
The name of a compile-time function that is bound with bind.infoer.
Data for the bind.infoer-defined function, packaged as a single syntax object. This data might contain parsed versions of other binding forms, for example.
These two pieces are assembled into a parenthesized-tuple syntax object, and then packed with the bind_meta.pack function to turn it into a valid binding expansion (to distinguish the result from a macro expansion in the sense of producing another binding form).
The function bound with bind.infoer will receive two syntax objects: a representation of “downward” static information and the parsed binding’s data. The result must be a single-object tuple with the following parts:
A string that is used for reporting a failed match. The string is used as an annotation, and it should omit information that is local to the binding. For example, when List.cons(x, y) is used as a binding pattern, a suitable annotation string might be "matching(List.cons(_, _))" to phrase the binding constraint as an annotation and omit local variable names being bound (which should not be reported to the caller of a function, for example, when an argument value in a call of the function fails to match).
An identifier that is used as a name for the input value, at least to the degree that the input value uses an inferred name. For example, proc as a binding form should cause its right-hand value to use the inferred name proc, if it can make any use of an inferred name.
“Upward” static information associated with the overall value for a successful match with the binding. This information is used by the matching annotation operator, for example, as well as propagated outward by binding forms that correspond to composite data types. The information is independent of static information for individual names within the binding, but it should be the same as information for any binding that corresponds to the full matched value. For example, Posn(x, y) binds x and y, and it may not have any particular static information for x and y, but a matching value has static information suitable for Posn, anyway; so, using p :: matching(Posn(_, _)) makes p have Posn static information.
A list of individual names that are bound by the overall binding, a description of valid uses for each name (e.g., as an expression, as a repetition of some depth), and “upward” static information for each name. For example, Posn(x, y) as a binding pattern binds x and y as identifiers that can be used as expressions, and static information about x and y might come from annotations in the definition of Posn. The final transformer function described in the third bullet above is responsible for actually binding each name and associating static information with it, but this summary of binding enables cooperation with composite binding forms, so that those that create repetitions.
The name of a compile-time function that is bound with bind.matcher.
The name of a compile-time function that is bound with bind.binder.
The name of a compile-time function that is bound with bind.committer.
Data for the bind.matcher- and bind.binder-defined functions, packaged as a single syntax object.
The functions bound with bind.matcher, bind.committer, and bind.binder are called with a syntax-object identifier for the matcher’s input plus the data from the sixth tuple slot. The match-building transformer in addition receives the IF form name, a success form, and a failure form.
Here’s a use of the low-level protocol to implement a fruit pattern, which matches only things that are fruits according to is_fruit:
bind.macro 'fruit($id)':
bind_meta.pack('(fruit_infoer,
// remember the id:
$id)')
bind.infoer 'fruit_infoer($static_info, $id)':
'("matching(fruit(_))",
$id,
// no overall static info:
(),
// `id` is bound,
// `~repet ()` means usable as repetition, and
// `()` means no static info:
(($id, [~repet ()], ())),
fruit_matcher,
fruit_committer,
fruit_binder,
// binder needs id:
$id)'
bind.matcher 'fruit_matcher($arg, $id, $IF, $success, $failure)':
| $success
| $failure'
bind.binder 'fruit_committer($arg, $id)':
''
bind.binder 'fruit_binder($arg, $id)':
fun is_fruit(v):
> snack
"apple"
def: value does not satisfy annotation
value: "cookie"
annotation: matching(fruit(_))
The fruit binding form assumes (without directly checking) that its argument is an identifier, and its infoer discards static information. Binding forms normally need to accommodate other, nested binding forms, instead. A bind.macro transformer with can receive already-parsed sub-bindings as arguments, and the infoer function can use bind_meta.get_info on a parsed binding form to call its internal infoer function. The result is packed static information, which can be unpacked into a tuple syntax object with bind_meta.unpack_info. Normally, bind_meta.get_info should be called only once to avoid exponential work with nested bindings, but bind_meta.unpack_info can used any number of times.
As an example, here’s an infix <&> operator that is similar to &&. It takes two bindings and makes sure a value can be matched to both. The binding forms on either size of <&> can bind variables. The <&> builder is responsible for binding the input name that each sub-binding expects before it deploys the corresponding builder. The only way to find out if a sub-binding matches is to call its builder, providing the same IF and failure that the original builder was given, and possibly extending the success form. A builder must be used in tail position, and it’s success position is a tail position.
bind.macro '$a <&> $b':
bind_meta.pack('(anding_infoer,
bind.infoer 'anding_infoer($static_info, ($a, $b))':
let a_info = bind_meta.get_info(a, static_info)
let b_info = bind_meta.get_info(b, static_info)
def '($a_ann, $a_name, ($a_s_info, ...), ($a_var_info, ...),
bind_meta.unpack_info(a_info)
let '($b_ann, $b_name, ($b_s_info, ...), ($b_var_info, ...),
bind_meta.unpack_info(b_info)
let ann:
"and("
+& Syntax.unwrap(a_ann) +& ", " +& Syntax.unwrap(b_ann)
+& ")"
'($ann,
$a_name,
anding_matcher,
anding_committer,
anding_binder,
bind.matcher 'anding_matcher($in_id, ($a_info, $b_info),
bind_meta.unpack_info(a_info)
bind_meta.unpack_info(b_info)
$failure)'
bind.committer 'anding_committer($in_id, ($a_info, $b_info))':
bind_meta.unpack_info(a_info)
bind_meta.unpack_info(b_info)
bind.binder 'anding_binder($in_id, ($a_info, $b_info))':
bind_meta.unpack_info(a_info)
bind_meta.unpack_info(b_info)
> one
1
def: value does not satisfy annotation
value: 2
annotation: and(Any, matching(1))
class Posn(x, y)
One subtlety here is the syntactic category of IF for a builder call. The IF form might be a definition form, or it might be an expression form, and a builder is expected to work in either case, so a builder call’s category is the same as IF. An IF alternative is written as a block, as is a success form, but the block may be inlined into a definition context.
The <&> infoer is able to just combine any names and “upward” static information that receives from its argument bindings, and it can simply propagate “downward” static information. When a binding operator reflects a composite value with separate binding forms for component values, then upward and downward information needs to be adjusted accordingly.