1 Overview
Tensors
Automatic Differentiation
Operator Extension
Deep Learning Functions
These pieces work together to form the whole toolkit. This overview provides a high-level roadmap for understanding this documentation and the accompanying code.
1.1 Tensors
A tensor is the fundamental data structure in deep learning. A tensor can be thought of as an n-dimensional array, where n may be 0. When n is 0, the tensor is known as a scalar.
The easiest way to think about it is that a scalar is a single number, and a tensor of rank n+1 is a vector of rank-n tensors.
Every tensor has a shape. The shape of a tensor is a list of n members where the ith member of the list is the size of the ith dimension of the tensor. For scalars, the shape is the empty list.
scalar? - Tensors of rank 0
tensor? - Tensors of rank 0 or higher
shape? - The type (listof natural?) signifies the shape of a tensor.
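Because Malt itself is a Racket library, the following is only a language-agnostic sketch in Python of how nested tensors and their shapes relate; the function names here are illustrative, not part of Malt:

```python
def shape(t):
    """Return the shape of a nested-list tensor as a list of dimension sizes.
    A bare number is a scalar (rank 0), so its shape is the empty list."""
    if not isinstance(t, list):
        return []
    return [len(t)] + shape(t[0])

def rank(t):
    """The rank of a tensor is the length of its shape."""
    return len(shape(t))

print(shape(7.0))                     # []
print(shape([[1, 2, 3], [4, 5, 6]]))  # [2, 3]
print(rank([[1, 2, 3], [4, 5, 6]]))   # 2
```

Note how the scalar case falls out naturally: with no dimensions to list, its shape is the empty list, matching the definition above.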
1.2 Automatic Differentiation
Malt provides a simple reverse-mode automatic differentiation mechanism that is based on the concept of duals. A dual carries the tensor result of a function along with a link which encodes the chain of operations that produced the tensor result. This allows the gradient of the function that produced the tensor result to be computed.
Duals are automatically constructed when differentiable primitive operators (also provided by Malt) are used.
For interoperability, numerical constants are also considered to be duals with an empty link (known as end-of-chain).
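To make the mechanism concrete, here is a minimal sketch of scalar duals and reverse-mode differentiation in Python (Malt's actual implementation is in Racket; all names here are hypothetical, with the gradient-state modeled as a dictionary from duals to accumulated gradients):

```python
class Dual:
    def __init__(self, real, link):
        self.real = real   # the (here: scalar) result carried by the dual
        self.link = link   # a function (dual, incoming-grad, grad-state) -> grad-state

def end_of_chain(d, z, sigma):
    # The empty link used for constants: just accumulate the incoming gradient.
    sigma[d] = sigma.get(d, 0.0) + z
    return sigma

def lift(x):
    # Numerical constants become duals with an end-of-chain link.
    return x if isinstance(x, Dual) else Dual(x, end_of_chain)

def d_mul(a, b):
    """A differentiable primitive: multiplication, recording the chain rule in its link."""
    a, b = lift(a), lift(b)
    def link(d, z, sigma):
        sigma = a.link(a, z * b.real, sigma)  # d(ab)/da = b
        sigma = b.link(b, z * a.real, sigma)  # d(ab)/db = a
        return sigma
    return Dual(a.real * b.real, link)

def gradient_of(f, x):
    """Walk the chain of links backwards, seeding with gradient 1.0."""
    d = lift(x)
    y = f(d)
    sigma = y.link(y, 1.0, {})  # gradient-state: a table from duals to gradients
    return sigma[d]

print(gradient_of(lambda x: d_mul(x, d_mul(x, x)), 2.0))  # d/dx x^3 at 2 is 12.0
```

The link's type mirrors the `link?` signature given below: it takes a dual, an incoming gradient, and a gradient-state, and returns an updated gradient-state.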
Duals and tensors can contain each other, depending upon the representation. Malt provides three representations of tensors in increasing order of complexity and efficiency.
learner - This representation is the simplest. Tensors are implemented as nested vectors and all scalars are duals. This is the representation that follows the pedagogy of The Little Learner.
nested-tensors - This representation is a little more involved. Both tensors and scalars are duals. Tensors are implemented as nested vectors, but these vectors cannot contain duals. Unlike in the learner representation, here the link is associated directly with the tensor rather than with each scalar in the tensor. Because of this, automatic differentiation is more efficient than in learner. This representation is described in detail in Appendix B (I Could Have Raced All Day) of The Little Learner.
flat-tensors - This representation is the most efficient of the three. Both tensors and scalars are duals. Tensors are implemented as flat vectors (similar to how arrays are laid out in C or Fortran), and these vectors cannot contain duals. Here as well, the link is associated directly with the tensor rather than with each scalar in the tensor. The flat organization of tensors is what makes this representation the most efficient. It is described in brief in Appendix B (I Could Have Raced All Day) of The Little Learner.
The default representation for tensors in Malt is learner. The Malt source repository can be configured and recompiled to choose different tensor representations in order to experiment with them. To set a specific implementation, see Setting tensor implementations.
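The difference between nested and flat storage can be illustrated independently of Malt. This Python sketch shows C-style row-major flat indexing of the kind flat-tensors uses; the helper name is hypothetical:

```python
# A rank-2 tensor of shape [2, 3] stored two ways.
nested = [[1, 2, 3], [4, 5, 6]]   # nested vectors (learner / nested-tensors style)
flat   = [1, 2, 3, 4, 5, 6]       # one flat vector (flat-tensors style)
shape  = [2, 3]

def flat_index(shape, idxs):
    """Row-major position of the element at idxs within the flat store."""
    pos = 0
    for size, i in zip(shape, idxs):
        pos = pos * size + i
    return pos

# Element [1][2] lives at flat position 1*3 + 2 = 5 in the flat store.
assert nested[1][2] == flat[flat_index(shape, [1, 2])]
```

With flat storage, no pointer-chasing through nested vectors is needed; one arithmetic computation locates any element, which is part of why flat-tensors is the most efficient representation.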
dual? - Duals
link? - Links included in a dual. Defined as the type (-> dual? tensor? gradient-state? gradient-state?)
gradient-state? - A hashtable from dual? to tensor?
differentiable? - Either a dual?, or a (listof differentiable?). In the learner representation (vectorof differentiable?) is also considered to be differentiable?, but not in other representations.
1.3 Operator Extension
The simple recursive structure of tensors allows commonly used numerical primitives to be extended to produce what are known as pointwise extensions. These are also known as broadcast operations over arrays. Malt additionally provides the ability to pause the extension at a certain rank: rather than going all the way down to the scalars in the array, the extension can stop at one of the higher dimensions. This allows the construction of polynomial-complexity functions by composing extensions.
Additionally, these extended primitives are automatically differentiable, and functions built by composing them can also be automatically differentiated (within the limits of differentiability of the function).
Section Differentiable extended numerical functions lists the primitives provided by Malt. Malt also provides tools to build extended versions of user-defined functions. The type signatures of these tools are specific to the tensor representations described above.
primitive-1? - A unary non-extended primitive.
primitive-2? - A binary non-extended primitive.
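The idea of pausing an extension at a rank can be sketched without Malt's machinery. In this hedged Python illustration, a unary function defined on tensors of a given base rank is extended to higher ranks by descending only until that rank is reached (the extension helper here is illustrative, not Malt's actual API):

```python
def rank(t):
    """Rank of a nested-list tensor; a bare number has rank 0."""
    return 0 if not isinstance(t, list) else 1 + rank(t[0])

def ext1(f, base_rank):
    """Extend unary f, defined on tensors of rank base_rank, to tensors of
    higher rank by recurring into the tensor until base_rank is reached."""
    def extended(t):
        if rank(t) == base_rank:
            return f(t)
        return [extended(u) for u in t]
    return extended

# A pointwise extension of a scalar operation (base rank 0) ...
sqr = ext1(lambda x: x * x, 0)
print(sqr([[1, 2], [3, 4]]))      # [[1, 4], [9, 16]]

# ... and an extension paused at rank 1, applying sum to each row.
row_sum = ext1(sum, 1)
print(row_sum([[1, 2], [3, 4]]))  # [3, 7]
```

The second example shows the pausing behavior: because the extension stops at rank 1, `sum` sees whole rows rather than individual scalars.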
1.4 Deep Learning Functions
Building on top of tensors and automatic differentiation, Malt provides a collection of deep learning specific functions – loss functions, layer functions, gradient descent, compositional mechanisms, hyperparameters, etc.
theta? - A list of tensors that forms a parameter set.
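As a rough illustration of how these pieces fit together, here is a gradient-descent sketch in Python over a parameter list (a theta). The `gradient_of` here is a numerical stand-in for Malt's automatic differentiation, and all names are illustrative:

```python
def gradient_of(f, theta, eps=1e-6):
    """Numerical stand-in for reverse-mode AD: central differences per parameter."""
    grads = []
    for i in range(len(theta)):
        up = theta[:]; up[i] += eps
        dn = theta[:]; dn[i] -= eps
        grads.append((f(up) - f(dn)) / (2 * eps))
    return grads

def gradient_descent(obj, theta, alpha, revs):
    """Step each parameter against its gradient, revs times."""
    for _ in range(revs):
        g = gradient_of(obj, theta)
        theta = [p - alpha * gp for p, gp in zip(theta, g)]
    return theta

# Fit y = w*x + b to points lying on the line y = 2x + 1.
xs, ys = [1.0, 2.0, 3.0], [3.0, 5.0, 7.0]

def loss(theta):
    w, b = theta
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

w, b = gradient_descent(loss, [0.0, 0.0], alpha=0.05, revs=2000)
print(round(w, 3), round(b, 3))  # converges near 2.0 and 1.0
```

In Malt, the loss function, the descent loop, and the hyperparameters (here `alpha` and `revs`) are each provided as composable pieces rather than hand-written as above.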
1.5 Summary of Types
The following types are used primarily in the descriptions of functions. Some types are marked "virtual", meaning that no predicate is actually defined for the type, but its intent is clear.
scalar? - Tensors of rank 0
tensor? - Tensors of rank 0 or higher
shape? - The type (listof natural?) signifies the shape of a tensor. (virtual)
dual? - Duals
link? - Links included in a dual. Defined as the type (-> dual? tensor? gradient-state? gradient-state?)
gradient-state? - A hashtable from dual? to tensor?
differentiable? - Either a dual?, or a (listof differentiable?). In the learner representation (vectorof differentiable?) is also considered to be differentiable?, but not in other representations.
primitive-1? - A unary non-extended primitive. (virtual)
primitive-2? - A binary non-extended primitive. (virtual)
theta? - A list of tensors that forms a parameter set. (virtual)