pprint-compact: a non-greedy pretty printer
(require pprint-compact) | package: pprint-compact |
This library implements an optimal pretty printer inspired by Bernardy (2017) and Podkopaev and Boulytchev (2014). The pretty printer generates the most optimal textual document from a tree structure.
The library is similar to another pretty printer library pprint, but pprint implements a greedy printer and doesn’t support the choice operator (alt). In practice, this means pprint will be more efficient, but it lacks expressivity that this library provides, and could produce a non-optimal layout.
Unlike the canonical Haskell implementation which removes the choice operator to improve performance, we implement all constructs described in Bernardy (2017), with additional optimizations, constructs for practical printers, and a much better handling of overflow.
A part of this documentation and its structure are shamelessly copied/adapted from the PPrint library.
1 Getting Started
Here’s a simple example of pretty-printing a fragment of code.
> (define doc (v-append (text "while (true) {") (h-append (text " ") (v-append (text "f();") (alt (hs-append (text "if (done())") (text "exit();")) (v-append (text "if (done())") (h-append (text " ") (text "exit();")))))) (text "}"))) > (pretty-print doc)
while (true) {
f();
if (done()) exit();
}
With a different page width limit, the document could be printed differently:
> (pretty-print doc #:width 20)
while (true) {
f();
if (done())
exit();
}
2 Documents
Formatting text involves creating an “abstract document” or doc, which encapsulates formatting information for the pretty printer. The library functions (see Constructing Documents) build and combine docs, which can then be rendered for pretty printing (see Rendering Documents).
3 Best Practice for Document Construction
The arguments to alt should roughly have the same content, albeit with different formats. This means that the tree size of a doc containing alt tends to blow up exponentially. The time complexity of the algorithm used by this library however depends on the DAG size of the doc, so provided that sub-documents are sufficiently shared, the DAG size should be small enough to allow efficient pretty printing.
As an example, say we want to pretty print an S-expression with two possible layouts for each “list”: horizontal and vertical. That is,
(1 2 3)
could be rendered as itself or
(1 2 3)
We can construct a function to convert an S-expression to a doc:
> (define (pretty s) (match s [(list xs ...) ; Calculate all subdocuments first to share their references (define xs* (map pretty xs)) (alt (h-append lparen (hs-concat xs*) rparen) (h-append lparen (v-concat xs*) rparen))] [_ (text s)]))
And then pretty print it:
> (define doc (pretty '("1" "2" "3"))) > (pretty-print doc) (1 2 3)
> (pretty-print doc #:width 3)
(1
2
3)
The important point is that we reuse xs* across branches of alt. Had we call (map pretty xs) twice in branches of alt, both doc construction and pretty-print would be inefficient.
4 Library Documentation
4.1 Rendering Documents
procedure
(pretty-print d [ #:out out #:width width #:indent indent]) → void? d : doc? out : output-port? = (current-output-port) width : (or/c +inf.0 natural-number/c) = (current-page-width) indent : natural-number/c = (current-page-indent)
> (define prefix-s "value is: ")
> (begin (display prefix-s) (pretty-print (v-append (text "a") (text "b") (text "c")) #:indent (string-length prefix-s)))
value is: a
b
c
procedure
(pretty-format d [ #:width width #:indent indent]) → string? d : doc? width : (or/c +inf.0 natural-number/c) = (current-page-width) indent : natural-number/c = (current-page-indent)
4.2 Constructing Documents
See Best Practice for Document Construction for caveats of this construct.
4.3 Useful Constants
4.4 Parameters
parameter
(current-page-width) → (or/c natural-number/c +inf.0)
(current-page-width page-width) → void? page-width : (or/c natural-number/c +inf.0)
= 80
parameter
(current-page-indent indent) → void? indent : natural-number/c
= 0
4.5 Match Expanders
Internally, a doc is either a :text, :flush, :concat, :alternatives, :annotate, or :fail. We provide these match expanders to allow doc processing (see Processing Documents. The match expanders are illegal outside of the pattern position of the match form. Keep in mind that this list is unstable and could change across versions of the library.
syntax
(:text s)
syntax
(:flush d)
syntax
(:concat da db)
syntax
(:alternatives da db)
syntax
(:annotate d a)
syntax
(:fail)
5 Annotation
An annotation can be attached to a doc via annotate. An annotated doc is rendered like the corresponding unannotated doc, but various constructs can inspect an annotation for optimization or to change the rendering semantics. Unless you wish to write a function to process a doc (see Processing Documents), you can pretend that this feature doesn’t exist.
6 Processing Documents
If you wish to process a doc, it is highly recommended that your function that recurs over the doc structure is memoizing. This can be done by using memoize function in pprint-compact/memoize.
6.1 Memoization Library
(require pprint-compact/memoize) | package: pprint-compact |
procedure
(memoize f [#:backend backend]) → procedure?
f : (-> any/c any/c) backend : procedure? = hasheq
6.2 Document Processing Library
(require pprint-compact/process) | package: pprint-compact |
procedure
(doc-process f d) → doc?
f : procedure? d : doc?
Prefer this function over manual matching against all match expanders, since the list of match expanders could change across versions of this library, making the code brittle to changes. Using this function on the other hand makes doc processing stable across versions.
6.3 Tutorial: Implementing flat
This section will demonstrate how to perform document processing by implementing flat.
The goal of flat is to constrain a document to have only one line. This can be done by replacing every :flush with fail.
(define (my-flat d) (match d [(:flush) fail] [(:text s) d] [(:fail) d] [(:concat a b) (h-append (my-flat a) (my-flat b))] [(:alternatives a b) (alt (my-flat a) (my-flat b))] [(:annotate d a) (annotate (my-flat d) a)]))
While this code is functional, it is inefficient: the time complexity is linear to the tree size of the document, even though the DAG size could be much smaller.
We can improve the function by using memoization:
(define (my-flat d) (define loop (memoize (λ (d) (match d [(:flush) fail] [(:text s) d] [(:fail) d] [(:concat a b) (h-append (loop a) (loop b))] [(:alternatives a b) (alt (loop a) (loop b))] [(:annotate d a) (annotate (loop d) a)])))) (loop d))
However, this function is still subtly inefficient because when it recurs, it unconditionally constructs new objects even though there is no change. Moreover, the code is brittle as the list of match expanders could change across versions of this library.
We can solve both problems at once by using doc-process.
(define (my-flat d) (define loop (memoize (λ (d) (match d [(:flush) fail] [_ (doc-process loop d)])))) (loop d))
We can further optimize the function by noticing that if a doc fragment is already flat, then there is no need to recur further. Therefore, we add an annotation that the output doc is flat and avoid computation if the doc is already flat:
(define (my-flat d) (define loop (memoize (λ (d) (match d [(:flush) fail] [(:annotate _ 'flat) d] [_ (doc-process loop d)])))) (annotate (loop d) 'flat))
7 Debug
(require pprint-compact/debug) | package: pprint-compact |
The following functions can be used to measure document size.
procedure
(dag-size d) → natural-number/c
d : doc?
procedure
(tree-size d) → natural-number/c
d : doc?
8 Design Notes
For the history of pretty printer in general, see History in the pprint library.
Two recent algorithms are proposed in Bernardy (2017) and Podkopaev and Boulytchev (2014), which are more expressive and optimal at the cost of being less efficient, compared to other greedy pretty printers. The time complexity of the algorithm in Bernardy (2017) is O(n W4 log W), where n is the tree size and W is the page width (when using an efficient Pareto frontier computation), though in practice it is fairly fast. Similarly, the time complexity of the algorithm in Podkopaev and Boulytchev (2014) is O(n W4), where n is the tree size and W is the page width, though in practice it is fairly slow.
This library implements a novel algorithm inspired by Bernardy (2017) and Podkopaev and Boulytchev (2014). It uses various techniques to improve the time complexity to O(n W3 log W) where n is the DAG size, while making it fast in practice. It also adds features like fail and annotate, which are basis to implement constructs like flat efficiently.
9 Acknowledgment
I would like to give special thanks to Justin Pombrio for pointing me to Bernardy (2017) and countless advice.
Bibliography
Jean-Philippe Bernardy. A Pretty But Not Greedy Printer. 2017. https://jyp.github.io/pdf/Prettiest.pdf |
Anton Podkopaev and Dmitri Boulytchev. Polynomial-Time Optimal Pretty-Printing Combinators with Choice. 2014. https://oops.math.spbu.ru/papers/printer-combinators.pdf |