0.5.91
3 High-level Corpus Functionality
The bindings documented in this section are provided
by ricoeur/tei, but not
by ricoeur/tei/base.
Many applications work with entire collections
of TEI documents at least as often as with
individual documents.
This library provides corpus objects
(instances of corpus% or a subclass)
to bundle collections of TEI documents with
related functionality.
The corpus object system is also the
primary hook for tools to integrate with
the larger Digital Ricœur application architecture.
3.1 Working with Corpus Objects
In practice, this parameter should usually be initialized
with a directory-corpus% instance.
Note that the returned instance set does not
contain the TEI document values with which the corpus
was created.
Corpus objects generally avoid retaining their
encapsulated TEI document values after initialization.
Currently, the result of (get-instance-info-set)
always satisfies (instance-set/c plain-instance-info?),
but that is not guaranteed to be true in future versions
of this library.
For each TEI document doc, the returned hash table
will have a key of (instance-title/symbol doc)
mapped to the value (tei-document-checksum doc).
Thus, any two corpus objects that return equal?
hash tables, even across runs of the program, are guaranteed
to encapsulate the very same TEI documents.
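The comparison described above can be sketched with ordinary hash tables. The tables below are hypothetical stand-ins for results of get-checksum-table: the titles and checksums are made-up placeholder symbols, not values produced by this library.

```racket
#lang racket/base

;; Hypothetical checksum tables: title symbol -> checksum placeholder.
(define table-a (hasheq 'doc-one 'abc123 'doc-two 'def456))
;; Same entries as table-a, listed in a different order.
(define table-b (hasheq 'doc-two 'def456 'doc-one 'abc123))
;; One checksum differs, as if a document had been edited.
(define table-c (hasheq 'doc-one 'abc123 'doc-two 'changed))

(define (same-documents? t1 t2)
  ;; equal? on hash tables compares keys and values, not identity
  (equal? t1 t2))

(same-documents? table-a table-b) ; #t
(same-documents? table-a table-c) ; #f
```

Because the tables are plain immutable values, such a comparison could in principle also be made against a table saved from an earlier run.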
3.2 Creating Corpus Objects
Note that creating a new instance of
corpus%
often involves a fair amount of overhead,
so creating redundant values should be avoided.
Reusing
corpus objects may also improve
search performance through caching, for example.
This method is final, so it cannot be overridden.
This method is final, so it cannot be overridden.
With empty-corpus, get-instance-info-set
always returns (instance-set) and
get-checksum-table always returns #hasheq().
Constructs a
corpus object from every file in
path,
including recursive subdirectories, that is recognized by
xml-path?.
If any such file is not a valid and well-formed TEI XML file
satisfying Digital Ricœur’s specification,
it will be silently ignored.
If more than one of the resulting
TEI document values
correspond to the same
instance,
one will be chosen in an unspecified manner and the others
will be silently ignored.
If path is a relative path, it is resolved relative to
(current-directory).
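The effect of resolving a relative path against (current-directory) can be sketched with the standard path->complete-path function. The helper resolve-corpus-path is hypothetical, and whether directory-corpus% uses this exact function internally is an assumption; the sketch only shows the resolution rule.

```racket
#lang racket/base

;; Hypothetical helper: resolve a possibly relative path the way
;; path->complete-path does, i.e. against (current-directory).
(define (resolve-corpus-path p)
  (path->complete-path p))

;; Resolve "texts" while the current directory is the system temp dir.
(define resolved
  (parameterize ([current-directory (find-system-path 'temp-dir)])
    (resolve-corpus-path "texts")))

(complete-path? resolved) ; #t
```

Since the result depends on the value of (current-directory) at the time of resolution, callers who need a predictable location may prefer to pass a complete path.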
3.3 Deriving New Corpus Classes
Clients of this library will want to extend the
corpus object
system to support additional features by implementing
new classes derived from
corpus%.
There are two main points where derived classes will want to interpose
on corpus%’s initialization:
- A few classes, like directory-corpus%,
  will want to supply an alternate means of constructing
  the full instance set of TEI documents
  to be encapsulated by the corpus object.
  This is easily done using standard features of the racket/class
  object system, such as init and super-new, to control the
  initialization of the base class.
- More often, derived classes will want to use the complete
  instance set of TEI documents to initialize some extended functionality:
  for example, corpus% itself extends a primitive, unexported class this way
  to initialize a searchable document set.
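The first kind of interposition can be sketched with plain racket/class. Both classes below are hypothetical stand-ins, not this library's classes: base-corpus% plays the role of a base class that receives documents as an init argument, and listing-corpus% builds that argument from a different input before calling super-new.

```racket
#lang racket/base
(require racket/class)

;; Hypothetical base class: takes documents at initialization
;; and keeps only a count of them.
(define base-corpus%
  (class object%
    (init [docs '()])
    (define doc-count (length docs))
    (super-new)
    (define/public (get-doc-count) doc-count)))

;; Hypothetical derived class: accepts names instead of documents
;; and constructs the docs argument for the base class itself.
(define listing-corpus%
  (class base-corpus%
    (init names)
    (super-new
     [docs (for/list ([n (in-list names)])
             (string-append n ".xml"))])))

(send (new listing-corpus% [names '("a" "b")]) get-doc-count) ; 2
```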
The ricoeur/tei library provides special support
for these kinds of extensions through three syntactic forms:
corpus-mixin, corpus-mixin+interface,
and define-corpus-mixin+interface.
Most clients should use define-corpus-mixin+interface,
but it is best understood as an extension of the simpler forms.
Most clients should use the higher-level corpus-mixin+interface
or define-corpus-mixin+interface, rather than using corpus-mixin
directly.
A key design consideration is that a corpus% instance does
not keep its TEI documents reachable after its initialization,
as TEI document values can be rather large.
Derived classes are urged to follow this practice:
they should initialize whatever state they need for their extended functionality,
but they should allow the TEI documents to be garbage-collected
as soon as possible.
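A minimal sketch of this practice in plain racket/class, not this library's implementation: the hypothetical indexing-corpus% class declares its documents with init rather than init-field, derives the small amount of state it needs during initialization, and keeps no field that refers to the documents. The documents here are placeholder pairs of a title symbol and a string.

```racket
#lang racket/base
(require racket/class)

(define indexing-corpus%
  (class object%
    ;; init, unlike init-field, does not create a field for docs.
    (init docs)
    ;; Derive the state needed later while docs is still available.
    (define titles
      (for/list ([doc (in-list docs)])
        (car doc)))
    (super-new)
    (define/public (get-titles) titles)))

(define corpus
  (new indexing-corpus%
       [docs (list (cons 'doc-one "long text ...")
                   (cons 'doc-two "more long text ..."))]))

(send corpus get-titles) ; '(doc-one doc-two)
```

After initialization the object holds only the list of titles, so nothing in it prevents the original document values from being collected.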
Concretely, this means that corpus% does not store
the instance set of TEI documents in a field,
whether public or private,
because an object’s fields remain reachable after initialization.
Instead, derived classes can access the instance set of TEI documents
during initialization using super-docs or super-docs-evt:
Examples:
> (new (printing-corpus-mixin corpus%))
(wrapper-object:printing-corpus-mixin ...)
  interface-decl = (interface (super<%> ...)
                     interface-method-clause ...)
                 | (interface* (super<%> ...)
                     ([prop-expr val-expr] ...)
                     interface-method-clause ...)

  interface-method-clause = method-id
                          | [method-id contract-expr]

Like corpus-mixin, but evaluates to two values:
a mixin and an associated interface.
...
Most clients should use the higher-level define-corpus-mixin+interface,
rather than using corpus-mixin+interface directly.
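The idea of producing a mixin together with its interface can be sketched by hand with only racket/class. This is an analogue for illustration, not the expansion of corpus-mixin+interface; the names make-counting-mixin+interface, counting-mixin, counting<%>, and get-count are all hypothetical.

```racket
#lang racket/base
(require racket/class)

;; Hypothetical helper returning two values: a mixin and the
;; interface implemented by every class the mixin produces.
(define (make-counting-mixin+interface)
  (define counting<%> (interface () get-count))
  (define counting-mixin
    (mixin () (counting<%>)
      (init [items '()])
      (define n (length items))
      (super-new)
      (define/public (get-count) n)))
  (values counting-mixin counting<%>))

(define-values (counting-mixin counting<%>)
  (make-counting-mixin+interface))

(define obj (new (counting-mixin object%) [items '(a b c)]))
(is-a? obj counting<%>) ; #t
(send obj get-count)    ; 3
```

Binding both results with define-values mirrors what a definition form for a mixin and its interface would need to provide: one name for the mixin and one for the interface.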
  name-spec = base-id
            | [id-mixin id<%>]

  interface-decl* = (interface (super<%> ...)
                      interface-method-clause* ...)
                  | (interface* (super<%> ...)
                      ([prop-expr val-expr] ...)
                      interface-method-clause* ...)

  interface-method-clause* = interface-method-clause
                           | ext-method-clause

  interface-method-clause = method-id
                          | [method-id contract-expr]

  ext-method-clause = [ext-clause-part ...]

  ext-clause-part = method-definition-form ; required
                  | #:contract contract-expr
                  | #:proc proc-id
                  | with-current-decl

  method-definition-form = (define/method (method-id kw-formal ...)
                             body ...+)

  define/method = define/public
                | define/pubment
                | define/public-final

  with-current-decl = #:with-current with-current-id
                      #:else [else-body ...+]
                    | #:with-current/infer
                      #:else [else-body ...+]

If no ext-method-clause appears,
equivalent to:
The
ext-method-clause variant extends the grammar of
interface
and
interface* to support defining functions related to
one of the interface’s methods: