HTML5 Printer

Joel Dueck

 (require html-printer) package: html-printer

If you use Racket to generate web pages, you should use this package to make your HTML both readable and correct. Why go to the trouble of designing clean, semantic markup if you’re just going to slap it all on one line that scrolls horizontally forever?

This package provides a single function, xexpr->html5, for converting X-expressions to strings of HTML. Unlike other functions that can be used for this purpose, this one focuses specifically on a tidy presentation of HTML5 content. It indents and wraps lines, letting you set the width of the output, while ensuring that line breaks are never placed where they would add significant whitespace that was not in the input. In some cases, such as boolean attributes and void elements, it favors HTML5 syntax over XML syntax.

This package is also Unicode-aware, measuring line length in graphemes rather than characters. For example, the emoji 🧝‍♂️, which actually consists of four Unicode code points (four Racket characters), is counted as having length 1 rather than 4.
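To see the difference described above, you can compare Racket's string-length with string-grapheme-count from racket/base. This is a small illustration of grapheme counting in general, not a look at how html-printer measures lines internally.

```racket
#lang racket/base
;; Code points versus grapheme clusters for the elf emoji mentioned above.
;; string-grapheme-count is provided by racket/base in recent Racket
;; versions (this package already requires Racket 8.13 or later).
(define elf "\U1F9DD\u200D\u2642\uFE0F") ; 🧝‍♂️ written as its four code points

(string-length elf)         ; 4: code points, i.e. Racket characters
(string-grapheme-count elf) ; 1: a single user-perceived character
```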

If you encounter a bug, please open an issue on the GitHub repo.

Requires Racket 8.13 or later due to internal use of racket/mutable-treelist.

procedure

(xexpr->html5 xpr    
  [#:wrap wrap-col    
  #:add-breaks? add-breaks?]) → string?
  xpr : xexpr?
  wrap-col : exact-positive-integer? = 100
  add-breaks? : any/c = #f
Converts xpr to a string of HTML, nicely wrapped and indented, ready for consumption. Leave wrap-col at its default of 100 columns, or shrink it down hard to test the line-wrapping algorithm.

Example:
> (display
   (xexpr->html5 #:wrap 25
                 '(body
                   (article
                    (h1 "My Title")
                    (p "Welcome to the blog!"))
                   (footer
                    (div (p "Right here in River City"))))))

<body>
  <article>
    <h1>My Title</h1>
    <p>Welcome to the
    blog!</p>
  </article>
  <footer>
    <div>
      <p>Right here in
      River City</p>
    </div>
  </footer>
</body>
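As a rough illustration of what wrap-col controls, here is a hypothetical greedy word wrapper that measures width in grapheme clusters. It is a sketch under simplifying assumptions, not html-printer's algorithm, which must also track indentation and avoid adding whitespace where the input had none.

```racket
#lang racket/base
;; Hypothetical greedy word wrapper (not html-printer's algorithm).
;; Width is measured with string-grapheme-count from racket/base.

;; Packs words into lines no wider than max-width, joined by single spaces.
;; A word wider than max-width simply gets a line of its own.
(define (greedy-wrap words max-width)
  (let loop ([words words] [line #f] [lines '()])
    (cond
      [(null? words)
       (reverse (if line (cons line lines) lines))]
      [(not line)
       (loop (cdr words) (car words) lines)]
      [else
       (let ([candidate (string-append line " " (car words))])
         (if (<= (string-grapheme-count candidate) max-width)
             (loop (cdr words) candidate lines)
             (loop (cdr words) (car words) (cons line lines))))])))

(greedy-wrap '("Welcome" "to" "the" "blog!") 14) ; '("Welcome to the" "blog!")
```

Greedy packing is the simplest choice; it never looks ahead, so it can leave ragged lines that a smarter algorithm would balance.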

If xpr begins with 'html, then the HTML5 doctype is prepended to the document:

Example:
> (display
   (xexpr->html5 '(html (head (meta [[charset "UTF-8"]])))))

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
  </head>
</html>

If add-breaks? is not #f, an extra blank line is added between closing block/flow tags (except for <meta>, <link> and <title>) and any opening tag that follows them:

> (display (xexpr->html5
            #:add-breaks? #t
            '(body
              (article
               (h1 "My Title")
               (p "Welcome to the blog!"))
              (footer
               (div (p "Right here in River City"))))))

<body>
  <article>
    <h1>My Title</h1>

    <p>Welcome to the blog!</p>
  </article>

  <footer>
    <div>
      <p>Right here in River City</p>
    </div>
  </footer>
</body>
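The add-breaks? rule can be sketched as a predicate over the tag just closed and the tag about to open. This is a hypothetical illustration: the function name and the block-tags list are assumptions made for the example, and html-printer's own classification of block/flow elements may differ.

```racket
#lang racket/base
;; Hypothetical sketch of the add-breaks? rule; block-tags is an
;; illustrative subset chosen for this example, not html-printer's list.
(define block-tags
  '(article aside body div footer h1 h2 h3 head header link main meta nav p section title))
(define no-extra-break '(meta link title))

;; Should a blank line separate the closing tag of prev (a symbol) from the
;; opening tag of next? next is #f when no opening tag follows.
(define (blank-line-between? prev next)
  (and next
       (memq prev block-tags)
       (not (memq prev no-extra-break))
       #t))

(blank-line-between? 'h1 'p)           ; #t: </h1> then <p>
(blank-line-between? 'article 'footer) ; #t
(blank-line-between? 'p #f)            ; #f: </p> then </div>, no opening tag
(blank-line-between? 'title 'meta)     ; #f: title is excluded
```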

That’s all there is to it, really. But if you want more info, check out the Crunchy details.

1 Crunchy details

This package includes an extensive set of unit tests. In addition to preventing regressions, these nicely illustrate the printer’s expected behavior in a variety of edge cases.

1.1 HTML particulars

Escaping special characters: Any <, > and & characters in string elements are escaped, and any symbols or integers in element position are converted to character entities:

> (display (xexpr->html5 '(p "Entities: " nbsp 65)))

<p>Entities: &nbsp;&#65;</p>

> (display (xexpr->html5 '(p "Escaping < > &")))

<p>Escaping &lt; &gt; &amp;</p>

In attribute values, the " character is escaped in addition to <, > and & characters:

> (display (xexpr->html5 '(p [[data-desc "Escaping \" < > &"]] "Foo")))

<p data-desc="Escaping &quot; &lt; &gt; &amp;">Foo</p>
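The escaping rules above can be sketched in a few lines of Racket. The helpers escape-text, escape-attr and entity->string are hypothetical names for this illustration, not part of html-printer's API; the point is the ordering of replacements and the two entity forms.

```racket
#lang racket/base
;; Minimal sketch of the escaping rules described above (hypothetical helpers).
(require racket/string)

;; Escape &, < and > in text content. & is replaced first so the entities
;; introduced for < and > are not escaped a second time.
(define (escape-text s)
  (string-replace (string-replace (string-replace s "&" "&amp;") "<" "&lt;") ">" "&gt;"))

;; Attribute values additionally escape the double quote.
(define (escape-attr s)
  (string-replace (escape-text s) "\"" "&quot;"))

;; Symbols become named entities; integers become numeric character references.
(define (entity->string v)
  (cond [(symbol? v) (format "&~a;" v)]
        [(exact-nonnegative-integer? v) (format "&#~a;" v)]
        [else (raise-argument-error 'entity->string
                                    "(or/c symbol? exact-nonnegative-integer?)" v)]))

(escape-text "Escaping < > &")     ; "Escaping &lt; &gt; &amp;"
(escape-attr "Escaping \" < > &")  ; "Escaping &quot; &lt; &gt; &amp;"
(string-append (entity->string 'nbsp) (entity->string 65)) ; "&nbsp;&#65;"
```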

The contents of <style> and <script> tags are never escaped or wrapped; the contents of <pre> tags are escaped, but never wrapped.

> (display
   (xexpr->html5 '(body (style "/* No escaping! & < > \" */")
                        (script "/* No escaping! & < > \" */")
                        (pre "Escaping! & < > \""))))

<body>
  <style>/* No escaping! & < > " */</style>
  <script>/* No escaping! & < > " */</script>
  <pre>Escaping! &amp; &lt; &gt; "</pre>
</body>
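One way to picture these rules is a dispatch on the parent element of each string. The sketch below is hypothetical (prepare-string is not an html-printer function) and returns both the text to emit and whether a printer could wrap it.

```racket
#lang racket/base
;; Hypothetical sketch, not html-printer's code: how a string child is
;; treated depending on its parent element, following the rules above.
(require racket/string)

(define (escape-text s)
  (string-replace (string-replace (string-replace s "&" "&amp;") "<" "&lt;") ">" "&gt;"))

;; Returns two values: the text to emit and whether it may be wrapped.
(define (prepare-string parent-tag s)
  (case parent-tag
    [(style script) (values s #f)]                 ; raw text: no escaping, no wrapping
    [(pre)          (values (escape-text s) #f)]   ; escaped, but never wrapped
    [else           (values (escape-text s) #t)])) ; escaped and wrappable

(prepare-string 'script "/* No escaping! & < > \" */") ; "/* No escaping! & < > \" */" and #f
(prepare-string 'pre "Escaping! & < > \"")             ; "Escaping! &amp; &lt; &gt; \"" and #f
```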

The printer can handle the comment and cdata structures from Racket's xml library. Comments are line-wrapped and indented like everything else; CDATA content is never modified or escaped.

> (require xml) ; provides comment and cdata
> (define com (comment "Behold, a hidden comment & < >"))
> (define cd (cdata #f #f "<![CDATA[Also some of this & < > ]]>"))
> (display
   (xexpr->html5 #:wrap 20 `(body (article (h1 "Title" ,com) (p ,cd " foo")))))

<body>
  <article>
    <h1>Title<!--
    Behold, a hidden
    comment & < >
    --></h1>
    <p>
    <![CDATA[Also some of this & < > ]]>
    foo</p>
  </article>
</body>
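The comment and cdata values used in the example above are ordinary structs from Racket's xml library, and the X-expression grammar accepts both, so they can be spliced into a document with quasiquote. A short sketch that builds them and checks them with xexpr?:

```racket
#lang racket/base
;; comment and cdata are structs provided by Racket's xml library. Both are
;; allowed as elements of an X-expression, so quasiquote can splice them in.
(require xml)

(define com (comment "Behold, a hidden comment & < >"))

;; As in the example above, the cdata string carries its own <![CDATA[ ... ]]>
;; delimiters; the first two fields are source locations, here #f.
(define cd (cdata #f #f "<![CDATA[Also some of this & < > ]]>"))

(define doc `(body (article (h1 "Title" ,com) (p ,cd " foo"))))

(xexpr? doc) ; #t
```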

Differences from XML/XHTML: Attributes that the HTML5 spec identifies as boolean attributes are printed using the HTML5 “short” syntax. For example, when '(disabled "true") is supplied as an attribute, it is printed as disabled rather than disabled="" or disabled="disabled".

> (display
   (xexpr->html5 '(label (input [[type "checkbox"] [disabled ""]]) " Cheese")))

<label><input type="checkbox" disabled> Cheese</label>
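The boolean-attribute rule can be sketched as an attribute printer that drops the value for names in a known set. This is a hypothetical illustration: the helper names are invented, and boolean-attributes below is a partial list of HTML5 boolean attributes chosen for the example, not the list html-printer uses.

```racket
#lang racket/base
;; Hypothetical sketch of HTML5 attribute printing, not html-printer's code.
(require racket/string)

;; Partial, illustrative subset of HTML5 boolean attributes.
(define boolean-attributes
  '(async autofocus checked defer disabled hidden multiple readonly required selected))

;; & is replaced first so the other entities are not double-escaped.
(define (escape-attr s)
  (for/fold ([s s])
            ([pair (in-list '(("&" . "&amp;") ("<" . "&lt;") (">" . "&gt;") ("\"" . "&quot;")))])
    (string-replace s (car pair) (cdr pair))))

;; Boolean attributes print as the bare name, whatever value was supplied.
(define (attr->string name value)
  (if (memq name boolean-attributes)
      (symbol->string name)
      (format "~a=\"~a\"" name (escape-attr value))))

;; attrs uses the X-expression form: a list of (name value) lists.
(define (attrs->string attrs)
  (string-join (for/list ([a (in-list attrs)])
                 (attr->string (car a) (cadr a)))
               " "))

(attrs->string '((type "checkbox") (disabled ""))) ; "type=\"checkbox\" disabled"
```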

HTML elements which cannot have content (void elements) are ended with > (rather than with /> as in XML):

> (display (xexpr->html5 '(div (img [[src "cat.webp"]]))))

<div>
  <img src="cat.webp">
</div>

> (display (xexpr->html5 '(head (meta [[charset "UTF-8"]]))))

<head>
  <meta charset="UTF-8">
</head>
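A printer needs a fixed set of void elements to decide when to omit the end tag. The sketch below uses the void elements listed in the WHATWG HTML standard; the helper names are hypothetical, and html-printer's own list may differ.

```racket
#lang racket/base
;; Hypothetical sketch: void elements get a start tag ending in ">" and no
;; end tag. The list is the set of void elements in the WHATWG HTML standard.
(define void-elements
  '(area base br col embed hr img input link meta source track wbr))

(define (void-element? tag)
  (and (memq tag void-elements) #t))

;; Returns the start tag and the end tag, using "" as the end tag for void
;; elements. attrs-str is an already-rendered attribute string, possibly "".
(define (tags-for tag attrs-str)
  (values (if (string=? attrs-str "")
              (format "<~a>" tag)
              (format "<~a ~a>" tag attrs-str))
          (if (void-element? tag) "" (format "</~a>" tag))))

(tags-for 'img "src=\"cat.webp\"") ; "<img src=\"cat.webp\">" and ""
(tags-for 'div "")                 ; "<div>" and "</div>"
```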

1.2 Comparing with included Racket functions

Racket already includes a few functions for printing X-expressions in string form. These work fine for generic XML markup, but when used for HTML content, the markup they generate can be incorrect or suboptimal.

In particular, all three of these functions will escape <, > and & characters inside <script> and <style> tags, which is likely to introduce JavaScript and CSS errors.

The xexpr->string function is the simplest. It does not offer line wrapping or indentation:

> (require xml) ; xexpr->string, xexpr->xml, display-xml/content, write-xexpr
> (xexpr->string '(body (main (script "3 > 2"))))

"<body><main><script>3 &gt; 2</script></main></body>"

The display-xml/content function (in combination with xexpr->xml) offers options for indentation, but its documentation warns that in HTML applications the indentation may introduce additional whitespace. It does not wrap lines to a maximum width.

; Will render incorrectly as "Hello World"
; due to the added line break
> (display-xml/content
   (xexpr->xml '(body (article (p (b "Hello") (i "World")))))
   #:indentation 'scan)

<body>
  <article>
    <p>
      <b>Hello</b>
      <i>World</i>
    </p>
  </article>
</body>

; HTML5 printer will leave lines long
; rather than add significant whitespace
> (display
   (xexpr->html5 #:wrap 20
                 '(body (article (p (b "Hello") (i "World"))))))

<body>
  <article>
    <p>
    <b>Hello</b><i>World</i></p>
  </article>
</body>

The write-xexpr function has the same shortcomings as those already mentioned, and comes with its own very odd optional line wrapping scheme: adding a line break before the closing > of every opening tag.

> (write-xexpr '(body (article (p (b "Hello") (i "World")))))

<body><article><p><b>Hello</b><i>World</i></p></article></body>

> (write-xexpr '(body (article (p (b "Hello") (i "World"))))
               #:insert-newlines? #t)

<body
><article
><p
><b
>Hello</b><i
>World</i></p></article></body>

1.3 Comparing with xexpr->html

The txexpr package includes xexpr->html, which correctly avoids escaping special characters inside <script> and <style> tags. Its HTML output will always be correct and faithful to the input, but since it performs no wrapping or indentation, the output can be difficult to read without additional processing.

> (require txexpr) ; provides xexpr->html
> (define xp '(html
               (head
                (style "/* < > & */"))
               (body
                (section (h1 "Beginning"))
                (section (h1 "End")))))
> (display (xexpr->html xp))

<html><head><style>/* < > & */</style></head><body><section><h1>Beginning</h1></section><section><h1>End</h1></section></body></html>

> (display (xexpr->html5 xp))

<!DOCTYPE html>
<html>
  <head>
    <style>/* < > & */</style>
  </head>
  <body>
    <section>
      <h1>Beginning</h1>
    </section>
    <section>
      <h1>End</h1>
    </section>
  </body>
</html>

1.4 Comparing with HTML Tidy

The HTML Tidy console application has been the best available tool for linting, correcting and formatting HTML markup since its creation in 1994. Its original purpose was to correct errors in HTML files written by hand in text editors.

Tidy is a much more comprehensive tool than this one and much more configurable. It always produces correctly line-wrapped and indented HTML, though this is only part of its functionality.

In terms of formatting functionality specifically, there are only a couple of significant differences between Tidy and this package:

Note that macOS ships with a version of HTML Tidy that is too old for use with modern HTML.

This package includes unit tests that compare its output against that of HTML Tidy in some cases. When the tests are run (including at the time of package installation), they search for Tidy version 5.8.0 or newer, first in the HTML_TIDY_PATH environment variable, then in the current PATH. If Tidy is found, these unit tests run normally; otherwise, they pass without any comparison actually being made.
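The lookup and version check described above could be sketched as follows. This is a hypothetical illustration, not the package's test harness: it assumes HTML_TIDY_PATH, when set, holds the path of the tidy executable itself, and it leaves out extracting the version number from Tidy's own output.

```racket
#lang racket/base
;; Hypothetical sketch, not html-printer's test harness. Assumes
;; HTML_TIDY_PATH, when set, names the tidy executable itself.
(require racket/string)

;; "5.8.0" -> '(5 8 0); any non-numeric component is treated as 0.
(define (parse-version s)
  (for/list ([part (in-list (string-split s "."))])
    (or (string->number part) 0)))

;; Compares component by component, padding the shorter version with zeros,
;; so "5.10.0" correctly counts as newer than "5.8.0".
(define (version>=? a b)
  (let loop ([xs (parse-version a)] [ys (parse-version b)])
    (if (and (null? xs) (null? ys))
        #t
        (let ([x (if (null? xs) 0 (car xs))]
              [y (if (null? ys) 0 (car ys))])
          (cond [(> x y) #t]
                [(< x y) #f]
                [else (loop (if (null? xs) '() (cdr xs))
                            (if (null? ys) '() (cdr ys)))])))))

;; Checks HTML_TIDY_PATH first, then searches PATH; returns a path or #f.
(define (find-tidy)
  (let ([from-env (getenv "HTML_TIDY_PATH")])
    (if (and from-env (file-exists? from-env))
        (string->path from-env)
        (find-executable-path "tidy"))))

(version>=? "5.10.0" "5.8.0") ; #t
(version>=? "5.6.0" "5.8.0")  ; #f
```

Comparing numerically rather than as strings matters here: a plain string comparison would rank "5.10.0" below "5.8.0".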

1.5 Probing and prodding

 (require html-printer/debug) package: html-printer

I lied at the beginning of these docs when I said this package only provides a single function. Here are a couple more, though they will only be interesting to people who really want to kick the tires.

procedure

(proof x [#:wrap wrap]) → void?

  x : xexpr?
  wrap : exact-positive-integer? = 20

procedure

(debug x [#:wrap wrap]) → void?

  x : xexpr?
  wrap : exact-positive-integer? = 20
Used for a close visual inspection of line wrapping and indentation, proof displays the result of (xexpr->html5 x #:wrap wrap) but with a column rule at the top and whitespace characters made visible:

> (proof '(p "Chaucer, Rabelais and " (em "Balzac!")))

----|----1----|----2----|----3----|
<p>Chaucer,·Rabelais¶
and·<em>Balzac!</em></p>¶
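The two presentation tricks proof uses, a column rule and visible whitespace, are easy to imitate. The helpers below are hypothetical names for illustration, not html-printer's API; the rule pattern is read off the output shown above.

```racket
#lang racket/base
;; Hypothetical helpers imitating proof's presentation; the names
;; column-rule and show-whitespace are illustrative, not html-printer's API.
(require racket/string)

;; Column n shows the tens digit of n when n is a multiple of 10,
;; "|" when it is a multiple of 5, and "-" otherwise.
(define (column-rule width)
  (build-string
   width
   (lambda (i)
     (let ([col (add1 i)])
       (cond [(zero? (remainder col 10))
              (integer->char (+ (char->integer #\0) (remainder (quotient col 10) 10)))]
             [(zero? (remainder col 5)) #\|]
             [else #\-])))))

;; Spaces become middle dots, and each newline is preceded by a pilcrow.
(define (show-whitespace s)
  (string-replace (string-replace s " " "·") "\n" "¶\n"))

(displayln (column-rule 35))
(display (show-whitespace "<p>Chaucer, Rabelais\nand <em>Balzac!</em></p>\n"))
```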

The debug function does the same thing but spits out an ungodly amount of gross logging on (current-error-port), for use in debugging the printing algorithm. (Note that all logging activity is disabled by default because of its huge performance penalty, but it gets temporarily enabled during calls to debug by way of parameterize.)

> (debug '(p "Chaucer, Rabelais and " (em "Balzac!")))

----|----1----|----2----|----3----|
<p>Chaucer,·Rabelais¶
and·<em>Balzac!</em></p>¶
html-printer: EXPR block starting… • tag:p prev-token:first
html-printer:    └─ PRT indent start • col:1 indent-level:0
html-printer:    └─ PRT indent end • col:1 indent-level:0
html-printer:    └─ PRT put! start… • v:<p> col:1 accum-width:0 logical-line-start:#t indent-level:0
html-printer:    └─ PRT put! …end • col:4 accum-width:0 logical-line-start:#f indent-level:0
html-printer: EXPR string starting… • prev-token:normal str:Chaucer, Rabelais and  
html-printer:    └─ PRT accum/wrap! start… • col:4 accum-width:0 logical-line-start:#f indent-level:0 accumulator:{}
html-printer:       └─ PRT accum! start… • col:4 accum-width:0 logical-line-start:#f indent-level:0 breakpoint-before?:#t accumulator:{}
html-printer:       └─ PRT accum! …end • col:4 accum-width:8 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,"}
html-printer:    └─ PRT accum/wrap! …end • col:4 accum-width:8 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,"}
html-printer:    └─ PRT accum/wrap! start… • col:4 accum-width:8 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,"}
html-printer:       └─ PRT accum! start… • col:4 accum-width:8 logical-line-start:#f indent-level:0 breakpoint-before?:#t accumulator:{#<_bp>,"Chaucer,"}
html-printer:       └─ PRT accum! …end • col:4 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,",#<_bp>," "}
html-printer:    └─ PRT accum/wrap! …end • col:4 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,",#<_bp>," "}
html-printer:    └─ PRT accum/wrap! start… • col:4 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,",#<_bp>," "}
html-printer:       └─ PRT flush start… • col:4 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,",#<_bp>," "}
html-printer:          └─ PRT lop-accum-end _ • accum-width:9 lopped-len:1 which-end:right
html-printer:          └─ PRT flush at_bp • col:4 buffer-width:0 held-whsp?:#f buffer:{}
html-printer:          └─ PRT flush non-whsp • held-whsp?:#f buffer:{"Chaucer,"}
html-printer:          └─ PRT flush at_bp • col:4 buffer-width:8 held-whsp?:#f buffer:{"Chaucer,"}
html-printer:             └─ PRT flush printbuf… • held-whsp?:#f logical-line-start:#f indent-level:0
html-printer:                └─ PRT put! start… • v:Chaucer, col:4 accum-width:8 logical-line-start:#f indent-level:0
html-printer:                └─ PRT put! …end • col:12 accum-width:8 logical-line-start:#f indent-level:0
html-printer:       └─ PRT flush done-breaking • col:12 accum-width:8 logical-line-start:#f accumulator:{#<_bp>,"Chaucer,",#<_bp>}
html-printer:    └─ PRT accum! start… • col:12 accum-width:0 logical-line-start:#f indent-level:0 breakpoint-before?:#f accumulator:{}
html-printer:    └─ PRT accum! …end • col:12 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{" "}
html-printer:       └─ PRT flush done • col:12 accum-width:1 logical-line-start:#f accumulator:{" "}
html-printer:       └─ PRT accum! start… • col:12 accum-width:1 logical-line-start:#f indent-level:0 breakpoint-before?:#t accumulator:{" "}
html-printer:       └─ PRT accum! …end • col:12 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{" ",#<_bp>,"Rabelais"}
html-printer:    └─ PRT accum/wrap! …end • col:12 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{" ",#<_bp>,"Rabelais"}
html-printer:    └─ PRT accum/wrap! start… • col:12 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{" ",#<_bp>,"Rabelais"}
html-printer:       └─ PRT flush start… • col:12 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{" ",#<_bp>,"Rabelais"}
html-printer:          └─ PRT flush whitespace • v:  
html-printer:          └─ PRT flush at_bp • col:12 buffer-width:1 held-whsp?:1 buffer:{}
html-printer:          └─ PRT flush non-whsp • held-whsp?:1 buffer:{" ","Rabelais"}
html-printer:       └─ PRT flush done-breaking • col:12 accum-width:9 logical-line-start:#f accumulator:{" ",#<_bp>,"Rabelais"}
html-printer:          └─ PRT put! start… • v:  col:12 accum-width:9 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! …end • col:13 accum-width:9 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! start… • v:Rabelais col:13 accum-width:9 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! …end • col:21 accum-width:9 logical-line-start:#f indent-level:0
html-printer:       └─ PRT accum! start… • col:21 accum-width:0 logical-line-start:#f indent-level:0 breakpoint-before?:#t accumulator:{}
html-printer:       └─ PRT accum! …end • col:21 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{#<_bp>," "}
html-printer:    └─ PRT accum/wrap! …end • col:21 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{#<_bp>," "}
html-printer:    └─ PRT accum/wrap! start… • col:21 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{#<_bp>," "}
html-printer:       └─ PRT flush start… • col:21 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{#<_bp>," "}
html-printer:          └─ PRT lop-accum-end _ • accum-width:1 lopped-len:1 which-end:right
html-printer:          └─ PRT flush at_bp • col:21 buffer-width:0 held-whsp?:#f buffer:{}
html-printer:          └─ PRT break+indent! start… • col:21 accum-width:0 logical-line-start:#f indent-level:0 accumulator:{#<_bp>}
html-printer:          └─ PRT break+indent! …end • col:1 accum-width:0 logical-line-start:#t indent-level:0 accumulator:{#<_bp>}
html-printer:       └─ PRT flush done-breaking • col:1 accum-width:0 logical-line-start:#t accumulator:{#<_bp>}
html-printer:       └─ PRT accum! start… • col:1 accum-width:0 logical-line-start:#t indent-level:0 breakpoint-before?:#t accumulator:{}
html-printer:       └─ PRT accum! …end • col:1 accum-width:3 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and"}
html-printer:    └─ PRT accum/wrap! …end • col:1 accum-width:3 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and"}
html-printer:    └─ PRT accum/wrap! start… • col:1 accum-width:3 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and"}
html-printer:       └─ PRT accum! start… • col:1 accum-width:3 logical-line-start:#t indent-level:0 breakpoint-before?:#t accumulator:{#<_bp>,"and"}
html-printer:       └─ PRT accum! …end • col:1 accum-width:4 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," "}
html-printer:    └─ PRT accum/wrap! …end • col:1 accum-width:4 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," "}
html-printer: EXPR string end • last-word:  
html-printer: EXPR inline start… • tag:em prev-token:normal
html-printer:    └─ PRT accum/wrap! start… • col:1 accum-width:4 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," "}
html-printer:       └─ PRT accum! start… • col:1 accum-width:4 logical-line-start:#t indent-level:0 breakpoint-before?:#t accumulator:{#<_bp>,"and",#<_bp>," "}
html-printer:       └─ PRT accum! …end • col:1 accum-width:8 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>"}
html-printer:    └─ PRT accum/wrap! …end • col:1 accum-width:8 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>"}
html-printer: EXPR string starting… • prev-token:sticky str:Balzac!
html-printer:    └─ PRT accum! start… • col:1 accum-width:8 logical-line-start:#t indent-level:0 breakpoint-before?:#f accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>"}
html-printer:    └─ PRT accum! …end • col:1 accum-width:15 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!"}
html-printer: EXPR string end • last-word:Balzac!
html-printer:    └─ PRT pop-whitespace _ • accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!"}
html-printer: EXPR inline after • popped:#f tag:em
html-printer: EXPR inline …closing • tag:em last-token:sticky
html-printer:    └─ PRT accum! start… • col:1 accum-width:15 logical-line-start:#t indent-level:0 breakpoint-before?:#f accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!"}
html-printer:    └─ PRT accum! …end • col:1 accum-width:20 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!","</em>"}
html-printer: EXPR block …closing • tag:p last-tok:sticky
html-printer:    └─ PRT check/flush col • accum-width:20 wrap-col:20 indent-level:0
html-printer:       └─ PRT flush start… • col:1 accum-width:20 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!","</em>"}
html-printer:          └─ PRT flush at_bp • col:1 buffer-width:0 held-whsp?:#f buffer:{}
html-printer:          └─ PRT flush non-whsp • held-whsp?:#f buffer:{"and"}
html-printer:          └─ PRT flush at_bp • col:1 buffer-width:3 held-whsp?:#f buffer:{"and"}
html-printer:             └─ PRT flush printbuf… • held-whsp?:#f logical-line-start:#t indent-level:0
html-printer:                └─ PRT put! start… • v:and col:1 accum-width:20 logical-line-start:#t indent-level:0
html-printer:                └─ PRT put! …end • col:4 accum-width:20 logical-line-start:#f indent-level:0
html-printer:          └─ PRT flush whitespace • v:  
html-printer:          └─ PRT flush at_bp • col:4 buffer-width:1 held-whsp?:1 buffer:{}
html-printer:          └─ PRT flush non-whsp • held-whsp?:1 buffer:{" ","<em>"}
html-printer:          └─ PRT flush non-whsp • held-whsp?:#f buffer:{" ","<em>","Balzac!"}
html-printer:          └─ PRT flush non-whsp • held-whsp?:#f buffer:{" ","<em>","Balzac!","</em>"}
html-printer:       └─ PRT flush done-breaking • col:4 accum-width:20 logical-line-start:#f accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!","</em>"}
html-printer:          └─ PRT put! start… • v:  col:4 accum-width:20 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! …end • col:5 accum-width:20 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! start… • v:<em> col:5 accum-width:20 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! …end • col:9 accum-width:20 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! start… • v:Balzac! col:9 accum-width:20 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! …end • col:16 accum-width:20 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! start… • v:</em> col:16 accum-width:20 logical-line-start:#f indent-level:0
html-printer:          └─ PRT put! …end • col:21 accum-width:20 logical-line-start:#f indent-level:0
html-printer:    └─ PRT put! start… • v:</p> col:21 accum-width:0 logical-line-start:#f indent-level:0
html-printer:    └─ PRT put! …end • col:25 accum-width:0 logical-line-start:#f indent-level:0
html-printer:    └─ PRT break! start… • col:25 accum-width:0 logical-line-start:#f indent-level:0 accumulator:{}
html-printer:    └─ PRT break! …end • col:1 accum-width:0 logical-line-start:#t indent-level:0 accumulator:{}
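The “off by default, switched on with parameterize” arrangement described for debug can be sketched as below. This is a hypothetical illustration: logging-enabled?, log-step! and call-with-debug-logging are invented names, and html-printer's real logging may be built differently.

```racket
#lang racket/base
;; Hypothetical sketch of logging that is off by default and turned on with
;; parameterize. The names here are illustrative, not html-printer's API.
(define logging-enabled? (make-parameter #f))

;; A macro rather than a function, so that when logging is off the message
;; arguments are never evaluated or formatted.
(define-syntax-rule (log-step! fmt arg ...)
  (when (logging-enabled?)
    (eprintf "html-printer: ~a\n" (format fmt arg ...))))

;; Enables logging only for the dynamic extent of thunk.
(define (call-with-debug-logging thunk)
  (parameterize ([logging-enabled? #t])
    (thunk)))

(log-step! "not printed: ~a" (expt 2 100))
(call-with-debug-logging
 (lambda () (log-step! "EXPR block starting… • tag:~a" 'p)))
```

Because parameterize restores the old value when the thunk returns (or raises), logging cannot leak into later calls to xexpr->html5.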