pprint-compact: a non-greedy pretty printer🔗ℹ

Sorawee Porncharoenwase <sorawee.pwase@gmail.com>

This library implements an optimal pretty printer inspired by Bernardy (2017) and Podkopaev and Boulytchev (2014). The pretty printer generates the most optimal textual document from a tree structure.

The library is similar to another pretty printer library pprint, but pprint implements a greedy printer and doesn’t support the choice operator (alt). In practice, this means pprint will be more efficient, but it lacks expressivity that this library provides, and could produce a non-optimal layout.

Unlike the canonical Haskell implementation which removes the choice operator to improve performance, we implement all constructs described in Bernardy (2017), with additional optimizations, constructs for practical printers, and a much better handling of overflow.

A part of this documentation and its structure are shamelessly copied/adapted from the PPrint library.

1 Getting Started

2 Documents

3 Best Practice for Document Construction

4 Library Documentation

4.1 Rendering Documents

4.2 Constructing Documents

6 Processing Documents

6.1 Memoization Library

6.2 Document Processing Library

6.3 Tutorial: Implementing flat

1 Getting Started🔗ℹ

Here’s a simple example of pretty-printing a fragment of code.

Examples:

> (define doc
    (v-append
      (text "while (true) {")
      (h-append (text "    ")
                (v-append (text "f();")
                          (alt (hs-append (text "if (done())")
                                          (text "exit();"))
                               (v-append (text "if (done())")
                                         (h-append (text "    ")
                                                   (text "exit();"))))))
      (text "}")))
> (pretty-print doc)
while (true) {
    f();
    if (done()) exit();
}

With a different page width limit, the document could be printed differently:

Example:

> (pretty-print doc #:width 20)
while (true) {
    f();
    if (done())
        exit();
}

2 Documents🔗ℹ

Formatting text involves creating an “abstract document” or doc, which encapsulates formatting information for the pretty printer. The library functions (see Constructing Documents) build and combine docs, which can then be rendered for pretty printing (see Rendering Documents).

procedure
(doc? x) → boolean?
x : any/c

Determines whether x is a member of the doc datatype.

3 Best Practice for Document Construction🔗ℹ

The arguments to alt should roughly have the same content, albeit with different formats. This means that the tree size of a doc containing alt tends to blow up exponentially. The time complexity of the algorithm used by this library however depends on the DAG size of the doc, so provided that sub-documents are sufficiently shared, the DAG size should be small enough to allow efficient pretty printing.

As an example, say we want to pretty print an S-expression with two possible layouts for each “list”: horizontal and vertical. That is,

(1 2 3)

could be rendered as itself or

(1
2
3)

We can construct a function to convert an S-expression to a doc:

> (define (pretty s)
    (match s
      [(list xs ...)
       ; Calculate all subdocuments first to share their references
       (define xs* (map pretty xs))

       (alt (h-append lparen (hs-concat xs*) rparen)
            (h-append lparen (v-concat xs*) rparen))]
      [_ (text s)]))

And then pretty print it:

> (define doc (pretty '("1" "2" "3")))
> (pretty-print doc)
(1 2 3)
> (pretty-print doc #:width 3)
(1
2
3)

The important point is that we reuse xs* across branches of alt. Had we call (map pretty xs) twice in branches of alt, both doc construction and pretty-print would be inefficient.

4 Library Documentation🔗ℹ

4.1 Rendering Documents🔗ℹ

procedure
(pretty-print d
[ #:out out
#:width width
#:indent indent]) → void?
  d : doc?
  out : output-port? = (current-output-port)
  width : (or/c +inf.0 natural-number/c) = (current-page-width)
  indent : natural-number/c = (current-page-indent)

Pretty prints the doc d to the output port out with a maximum page width of width and an initial indentation level for subsequent lines of indent.

Examples:

> (define prefix-s "value is: ")
> (begin
    (display prefix-s)
    (pretty-print (v-append (text "a") (text "b") (text "c"))
                  #:indent (string-length prefix-s)))
value is: a
          b
          c

procedure
(pretty-format d
[ #:width width
#:indent indent]) → string?
  d : doc?
  width : (or/c +inf.0 natural-number/c) = (current-page-width)
  indent : natural-number/c = (current-page-indent)

Pretty prints the doc d as a string with a maximum page width of width.

4.2 Constructing Documents🔗ℹ

procedure
(text s) → doc?
s : string?

Constructs a doc containing the fixed string s. s must not contain a newline character.

procedure
(flush d) → doc?
d : doc?

Constructs a doc which is like d but with a newline at the end.

procedure
(alt x ...) → doc?
x : doc?

Constructs a doc which is rendered to one of xs, whichever resulting in the prettiest layout for the whole document. If given no arguments, the resulting doc is fail.

See Best Practice for Document Construction for caveats of this construct.

procedure
(h-append x ...) → doc?
x : doc?

Concatenates doc xs horizontally.

procedure
(h-concat xs) → doc?
xs : (listof doc?)

Concatenates docs in xs horizontally.

procedure
(v-append x ...) → doc?
x : doc?

Concatenates doc xs vertically.

procedure
(v-concat xs) → doc?
xs : (listof doc?)

Concatenates docs in xs vertically.

procedure
(hs-append x ...) → doc?
x : doc?

Concatenates doc xs horizontally with successive pairs separated by space.

procedure
(hs-concat x ...) → doc?
x : doc?

Concatenates docs in xs horizontally with successive pairs separated by space.

procedure
(sep xs) → doc?
xs : doc?

Concatenates docs in xs either (1) vertically or (2) horizontally with successive pairs separated by space.

value
fail : doc?

Constructs a doc that fails to render. This doc interacts with alt: only non-failing branches are considered for rendering.

procedure
(annotate d a) → doc?
d : doc?
a : any/c

Constructs a doc which is like d but with an annotation a.

procedure
(flat x) → doc?
x : doc?

Constrains doc x to fit in one line. If x can’t fit in one line, it fails to render.

procedure
(flush-if b d) → doc?
b : any/c
d : doc?

Performs flush on d if b is true. Otherwise, simply returns d.

4.3 Useful Constants🔗ℹ

value
empty-doc : doc?

Same as (text "")

value
lparen : doc?

Same as (text "(")

value
rparen : doc?

Same as (text ")")

value
space : doc?

Same as (text " ")

4.4 Parameters🔗ℹ

parameter
(current-page-width) → (or/c natural-number/c +inf.0)
(current-page-width page-width) → void?
page-width : (or/c natural-number/c +inf.0)
= 80

A parameter that determines the page width for pretty printing.

parameter
(current-page-indent) → natural-number/c
(current-page-indent indent) → void?
indent : natural-number/c
= 0

A parameter that determines the initial indentation level for subsequent lines.

4.5 Match Expanders🔗ℹ

Internally, a doc is either a :text, :flush, :concat, :alternatives, :annotate, or :fail. We provide these match expanders to allow doc processing (see Processing Documents. The match expanders are illegal outside of the pattern position of the match form. Keep in mind that this list is unstable and could change across versions of the library.

syntax
(:text s)

A match expander that recognizes text s of type string?.

syntax
(:flush d)

A match expander that recognizes a doc d with a newline at the end.

syntax
(:concat da db)

A match expander that recognizes a horizontal concatenation of docs da and db.

syntax
(:alternatives da db)

A match expander that recognizes two choices: docs da and db.

syntax
(:annotate d a)

A match expander that recognizes a doc d with an annotation value a.

syntax
(:fail)

A match expander that recognizes a failing doc.

5 Annotation🔗ℹ

An annotation can be attached to a doc via annotate. An annotated doc is rendered like the corresponding unannotated doc, but various constructs can inspect an annotation for optimization or to change the rendering semantics. Unless you wish to write a function to process a doc (see Processing Documents), you can pretend that this feature doesn’t exist.

6 Processing Documents🔗ℹ

If you wish to process a doc, it is highly recommended that your function that recurs over the doc structure is memoizing. This can be done by using memoize function in pprint-compact/memoize.

6.1 Memoization Library🔗ℹ

(require pprint-compact/memoize)

package: pprint-compact

procedure
(memoize f [#:backend backend]) → procedure?
f : (-> any/c any/c)
backend : procedure? = hasheq

Memoizes the function f based on the backend data structure.

6.2 Document Processing Library🔗ℹ

(require pprint-compact/process)

package: pprint-compact

procedure
(doc-process f d) → doc?
f : procedure?
d : doc?

Calls f on the immediate subdocuments of d and reassembles the results back. The function attempts to avoid creating new objects as best as it can. Note that f should be memoized.

Prefer this function over manual matching against all match expanders, since the list of match expanders could change across versions of this library, making the code brittle to changes. Using this function on the other hand makes doc processing stable across versions.

6.3 Tutorial: Implementing flat🔗ℹ

This section will demonstrate how to perform document processing by implementing flat.

The goal of flat is to constrain a document to have only one line. This can be done by replacing every :flush with fail.

(define (my-flat d)
  (match d
    [(:flush) fail]
    [(:text s) d]
    [(:fail) d]
    [(:concat a b) (h-append (my-flat a) (my-flat b))]
    [(:alternatives a b) (alt (my-flat a) (my-flat b))]
    [(:annotate d a) (annotate (my-flat d) a)]))

While this code is functional, it is inefficient: the time complexity is linear to the tree size of the document, even though the DAG size could be much smaller.

We can improve the function by using memoization:

(define (my-flat d)
  (define loop
    (memoize
      (λ (d)
        (match d
          [(:flush) fail]
          [(:text s) d]
          [(:fail) d]
          [(:concat a b) (h-append (loop a) (loop b))]
          [(:alternatives a b) (alt (loop a) (loop b))]
          [(:annotate d a) (annotate (loop d) a)]))))
  (loop d))

However, this function is still subtly inefficient because when it recurs, it unconditionally constructs new objects even though there is no change. Moreover, the code is brittle as the list of match expanders could change across versions of this library.

We can solve both problems at once by using doc-process.

(define (my-flat d)
  (define loop
    (memoize
      (λ (d)
        (match d
          [(:flush) fail]
          [_ (doc-process loop d)]))))
  (loop d))

We can further optimize the function by noticing that if a doc fragment is already flat, then there is no need to recur further. Therefore, we add an annotation that the output doc is flat and avoid computation if the doc is already flat:

(define (my-flat d)
  (define loop
    (memoize
      (λ (d)
        (match d
          [(:flush) fail]
          [(:annotate _ 'flat) d]
          [_ (doc-process loop d)]))))
  (annotate (loop d) 'flat))

7 Debug🔗ℹ

(require pprint-compact/debug)

package: pprint-compact

The following functions can be used to measure document size.

procedure
(dag-size d) → natural-number/c
d : doc?

Returns the DAG size of doc d.

procedure
(tree-size d) → natural-number/c
d : doc?

Returns the tree size of doc d.

8 Design Notes🔗ℹ

For the history of pretty printer in general, see History in the pprint library.

Two recent algorithms are proposed in Bernardy (2017) and Podkopaev and Boulytchev (2014), which are more expressive and optimal at the cost of being less efficient, compared to other greedy pretty printers. The time complexity of the algorithm in Bernardy (2017) is O(n W4 log W), where n is the tree size and W is the page width (when using an efficient Pareto frontier computation), though in practice it is fairly fast. Similarly, the time complexity of the algorithm in Podkopaev and Boulytchev (2014) is O(n W4), where n is the tree size and W is the page width, though in practice it is fairly slow.

This library implements a novel algorithm inspired by Bernardy (2017) and Podkopaev and Boulytchev (2014). It uses various techniques to improve the time complexity to O(n W3 log W) where n is the DAG size, while making it fast in practice. It also adds features like fail and annotate, which are basis to implement constructs like flat efficiently.

9 Acknowledgment🔗ℹ

I would like to give special thanks to Justin Pombrio for pointing me to Bernardy (2017) and countless advice.

Bibliography🔗ℹ

Jean-Philippe Bernardy. A Pretty But Not Greedy Printer. 2017. https://jyp.github.io/pdf/Prettiest.pdf

Anton Podkopaev and Dmitri Boulytchev. Polynomial-Time Optimal Pretty-Printing Combinators with Choice. 2014. https://oops.math.spbu.ru/papers/printer-combinators.pdf

1	Getting Started
2	Documents
3	Best Practice for Document Construction
4	Library Documentation
5	Annotation
6	Processing Documents
7	Debug
8	Design Notes
9	Acknowledgment
	Bibliography