binfmt: binary format parser generator

Bogdan Popa <bogdan@defn.io>

 #lang binfmt package: binfmt-lib

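This package provides a #lang for building binary format parsers with support for limited context sensitivity. For example, the number of times a field repeats can depend on a value parsed earlier in the same alternative.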

1 Example

Here is a parser definition for the ID3v1 format:

#lang binfmt
id3     = magic title artist album year comment genre;
magic   = 'T' 'A' 'G';
title   = u8{30};
artist  = u8{30};
album   = u8{30};
year    = u8{4};
comment = u8{30};
genre   = u8;
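To show the byte layout the id3 definition describes, here is a hand-written Racket sketch that reads the same seven fields in order. It is not generated by binfmt, and the helpers id3v1-fields and read-id3v1 are hypothetical. Its results also differ in shape from binfmt's: it returns raw byte strings keyed by plain field names, while binfmt produces entries named title_1, artist_1 and so on, and parses u8{30} into a list of numbers.

```racket
#lang racket/base
;; Hypothetical sketch: read the ID3v1 fields by hand, in grammar order.
;; Not generated by binfmt; the helper names are made up for illustration.

;; Field names and byte widths, mirroring the id3 definition.
(define id3v1-fields
  '((magic . 3) (title . 30) (artist . 30) (album . 30)
    (year . 4) (comment . 30) (genre . 1)))

;; Total size of a tag: the sum of the field widths.
(define id3v1-size
  (for/sum ([f (in-list id3v1-fields)]) (cdr f)))

;; Read each field as a raw byte string; fail if the port ends early.
(define (read-id3v1 in)
  (for/list ([f (in-list id3v1-fields)])
    (define bs (read-bytes (cdr f) in))
    (unless (and (bytes? bs) (= (bytes-length bs) (cdr f)))
      (error 'read-id3v1 "expected ~a bytes for ~a" (cdr f) (car f)))
    (cons (car f) bs)))
```

The field widths add up to 3 + 30 + 30 + 30 + 4 + 30 + 1 = 128 bytes. That matches the two 64-byte lines of sample data in the session that follows.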

Assuming this is saved in a file called "id3v1.b", you can import it from Racket and apply any of the definitions to an input port in order to parse its contents:

> (require "id3v1.b")

You can parse the magic header by itself:

> (magic (open-input-bytes #"TAG"))

'((char_1 . #\T) (char_2 . #\A) (char_3 . #\G))

Or a full tag:

> (define data
   (bytes-append
    #"TAGCreative Commons Song         Improbulus                    N"
    #"/A                           2005Take on O Mio Babbino Caro!   g"))
> (define tree
    (id3 (open-input-bytes data)))

And inspect the resulting parse tree:

> (map car tree)

'(magic_1 title_1 artist_1 album_1 year_1 comment_1 genre_1)

> (define ref (compose1 cdr assq))
> (take (ref 'title_1 tree) 8)

'(67 114 101 97 116 105 118 101)

> (apply bytes (ref 'title_1 tree))

#"Creative Commons Song         "

Finally, parsing invalid or truncated data raises a parse error:

> (id3 (open-input-bytes #"TAG..."))

parse failed
  expected 'u8' but found EOF
  in: string
  position: 7

Every definition automatically gets a corresponding un-parser. An un-parser takes a parse tree and an output port, and serializes the tree's data to that port. Its name is the definition's name prefixed with un-, so the un-parser for id3 is un-id3:

> (define bs
    (call-with-output-bytes
     (lambda (out)
       (un-id3 tree out))))
> (for ([n (in-range 0 (bytes-length bs) 64)])
    (println (subbytes bs n (+ n 64))))

#"TAGCreative Commons Song         Improbulus                    N"

#"/A                           2005Take on O Mio Babbino Caro!   g"

2 Grammar and Operation

The grammar for binfmt is as follows:

 

  def     ::= alt {| alt}* ;
  alt     ::= expr+
  expr    ::= term | star | plus | repeat
  star    ::= term *
  plus    ::= term +
  repeat  ::= term { id | natural }
  term    ::= byte | char | id
  byte    ::= an integer between 0x00 and 0xFF
  char    ::= ' ascii character '
  id      ::= any identifier
  natural ::= any natural number

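Within an alt, each expr is assigned a unique name based on its id. The first time an id appears in an alt, _1 is appended to its name, the second time _2, and so on. Character literals are named char, so 'T' 'A' 'G' yields char_1, char_2 and char_3, as in the magic example above.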

Alternatives containing two or more exprs parse to an association list mapping expr names (as defined above) to parse results. Alternatives containing a single expr collapse to the result of the expr.
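The naming and collapsing rules can be sketched in plain Racket as follows. name-exprs and alt-result are hypothetical helpers written for illustration and are not part of binfmt's runtime. They take the base name of each expr (its id, or char for a character literal) and return what the corresponding alt would produce.

```racket
#lang racket/base
;; Hypothetical sketch of binfmt's expr naming and collapsing rules.
;; name-exprs and alt-result are illustrative helpers, not binfmt API.

;; Given each expr's base name in order, append _1, _2, ... counting
;; occurrences of each base name separately.
(define (name-exprs bases)
  (define counts (make-hasheq))
  (for/list ([base (in-list bases)])
    (define n (add1 (hash-ref counts base 0)))
    (hash-set! counts base n)
    (string->symbol (format "~a_~a" base n))))

;; An alt with one expr collapses to that expr's result; otherwise the
;; result is an association list from expr names to results.
(define (alt-result bases results)
  (if (= (length bases) 1)
      (car results)
      (map cons (name-exprs bases) results)))
```

For example, (alt-result '(char char char) '(#\T #\A #\G)) produces the same association list the magic example printed. A single-expr alt such as title = u8{30} collapses to the bare list of bytes.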

The repeat syntax can either repeat a parser an exact number of times, or repeat it as many times as the result of a previous expr in the same alt, referred to by that expr's name. For example, the following parser parses an i8 to determine the length of a string, then parses that many u8s after it:

#lang binfmt
string = strlen u8{strlen_1};
strlen = i8;

Negative lengths are allowed and are treated the same as 0. For example, the parser above parses #"\377" (the byte 0xFF, which is -1 as an i8) to an empty string.
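Here is a hand-written approximation of what the string definition does, in both directions. It assumes the result shape described earlier: an association list, because the alt has two exprs named strlen_1 and u8_1. parse-string and unparse-string are hypothetical names. Under the naming rules above, the generated functions would be string and un-string, and their error messages may differ from this sketch's.

```racket
#lang racket/base
;; Hypothetical, hand-written approximation of
;;   string = strlen u8{strlen_1};
;;   strlen = i8;
;; parse-string and unparse-string are illustrative names, not binfmt API.

;; Read an i8 length, then that many u8s. Negative lengths count as 0.
(define (parse-string in)
  (define b (read-byte in))
  (when (eof-object? b)
    (error 'parse-string "expected i8 but found EOF"))
  ;; Interpret the byte as a two's-complement signed 8-bit integer.
  (define strlen (if (> b 127) (- b 256) b))
  (define chars
    (for/list ([i (in-range (max strlen 0))])
      (define c (read-byte in))
      (when (eof-object? c)
        (error 'parse-string "expected u8 but found EOF"))
      c))
  ;; Two exprs in the alt, so the result is an association list.
  (list (cons 'strlen_1 strlen) (cons 'u8_1 chars)))

;; Write the fields back out in the same order (the un-parser direction).
(define (unparse-string tree out)
  (define strlen (cdr (assq 'strlen_1 tree)))
  (write-byte (if (< strlen 0) (+ strlen 256) strlen) out)
  (for ([c (in-list (cdr (assq 'u8_1 tree)))])
    (write-byte c out)))
```

Because unparse-string writes back the stored length rather than the number of bytes, parsing #"\377" and un-parsing the result reproduces #"\377" exactly.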

The following parsers are built-in:

  • u8, u16, u32, u64, u16le, u32le, u64le, u16be, u32be, u64be

  • i8, i16, i32, i64, i16le, i32le, i64le, i16be, i32be, i64be

  • f32, f64, f32le, f64le, f32be, f64be

  • uvarint32, uvarint64

  • varint32, varint64

  • nul, eof
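Several of these built-ins can be approximated with Racket's own byte-decoding functions. This sketch rests on assumptions not stated above. It takes the le and be suffixes to mean little- and big-endian byte order. It takes the varint parsers to use the unsigned LEB128 encoding popularized by Protocol Buffers. It does not cover the byte order of the unsuffixed variants or how the signed varints are encoded. read-fixed, read-float and read-uvarint are hypothetical helpers.

```racket
#lang racket/base
;; Hypothetical sketch of a few built-in parsers.
;; Assumptions: le/be mean little/big endian; uvarints are unsigned LEB128.
;; read-exactly, read-fixed, read-float, read-uvarint are illustrative names.

;; Read exactly n bytes or fail.
(define (read-exactly n in)
  (define bs (read-bytes n in))
  (unless (and (bytes? bs) (= (bytes-length bs) n))
    (error 'read-exactly "expected ~a bytes but found EOF" n))
  bs)

;; Fixed-width integer of n bytes (2, 4 or 8), e.g. u16le or i32be.
(define (read-fixed n signed? big-endian? in)
  (integer-bytes->integer (read-exactly n in) signed? big-endian?))

;; IEEE 754 float of 4 (f32) or 8 (f64) bytes.
(define (read-float n big-endian? in)
  (floating-point-bytes->real (read-exactly n in) big-endian?))

;; Unsigned varint: 7 data bits per byte, least significant group first,
;; high bit set on every byte except the last. Rejects encodings with too
;; many bytes for max-bits, but does not range-check the final value.
(define (read-uvarint max-bits in)
  (let loop ([shift 0] [acc 0])
    (when (>= shift max-bits)
      (error 'read-uvarint "varint longer than ~a bits" max-bits))
    (define b (read-byte in))
    (when (eof-object? b)
      (error 'read-uvarint "unexpected EOF"))
    (define acc* (bitwise-ior acc (arithmetic-shift (bitwise-and b #x7F) shift)))
    (if (zero? (bitwise-and b #x80))
        acc*
        (loop (+ shift 7) acc*))))
```

Under these assumptions, a u16le read of #"\1\2" gives 513, the same bytes read as u16be give 258, and the two-byte varint #"\254\2" decodes to 300.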

Parsers for alts may backtrack, but backtracking is only supported on file and string input ports. On all other kinds of ports (e.g., pipes and custom ports that don't support setting a file position), backtracking fails with a parse error.
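Backtracking of this kind can be sketched by recording the port's position before trying the alternatives, then rewinding with file-position before each attempt. This is why it needs ports whose position can be set. try-alts and expect-bytes are hypothetical helpers, and this is a simplified model rather than binfmt's actual implementation.

```racket
#lang racket/base
;; Hypothetical sketch of backtracking alternation over an input port.
;; try-alts and expect-bytes are illustrative, not binfmt's implementation.

;; A parser that returns the matched bytes if the next bytes equal bs,
;; or #f otherwise (possibly after consuming some input).
(define (expect-bytes bs)
  (lambda (in)
    (define got (read-bytes (bytes-length bs) in))
    (and (bytes? got) (bytes=? got bs) got)))

;; Try each parser in turn, rewinding the port to the starting position
;; before every attempt. Setting the position only works on ports that
;; support it, such as file and string ports.
(define (try-alts parsers in)
  (define start (file-position in))
  (let loop ([ps parsers])
    (cond
      [(null? ps)
       (error 'try-alts "no alternative matched at position ~a" start)]
      [else
       (file-position in start)
       (or ((car ps) in) (loop (cdr ps)))])))
```

On the input #"ID3v2", the first alternative #"TAG" consumes three bytes before failing. Without the rewind, the second alternative #"ID3" would start reading at the wrong offset.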

When parsing or un-parsing fails, an exception satisfying exn:fail:binfmt? is raised.

3 Reference

 (require binfmt/runtime) package: binfmt-lib

procedure

(exn:fail:binfmt? v) → boolean?
  v : any/c

Returns #t when v is an exception raised by a binfmt parser or un-parser.

procedure

(exn:fail:binfmt-id e) → symbol?
  e : exn:fail:binfmt?

Returns the id of the parser or un-parser that failed.