8.16.0.4
Extracting binary data from bytestrings using match
This module introduces a new match pattern for matching and destructuring binary data encoded in a bytestring.
The API should be considered very alpha and open to incompatible changes.
Similar packages include xenomorph and the #lang-based binfmt.
1 The binary match pattern
(binary byte-pattern ...+ maybe-rest)

  byte-pattern    = (bytes pat length)
                  | (zero-padded pat length)
                  | (until-byte pat byte)
                  | (until-byte* pat byte)
                  | (length-prefixed pat)
                  | (length-prefixed pat prefix-length endianness)
                  | (number-type pat)
                  | (number-type pat endianness)
                  | control-pattern

  maybe-rest      =
                  | (rest* pat)

  control-pattern = (get-offset pat)
                  | (set-offset! offset)

  number-type     = s8 | u8 | s16 | u16 | s32 | u32 | u64 | s64 | f32 | f64

  prefix-length   = u8 | u16 | u32 | u64

  endianness      = big-endian | little-endian | native-endian
                  | host-order | network-order

A match extender that, when matched against a bytestring, tries to destructure it according to the given spec and match extracted values against given match patterns.
An example:
  (match #"\17\240bc"
    ((binary (s16 num big-endian) (bytes rest 2))
     (list num rest))) ; (4000 #"bc")
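To make the result of that example concrete, the same two fields can be pulled out by hand using only racket/base. This is a sketch of what the pattern extracts, not how the library implements it; the names bs, num and rest-bytes are chosen here for illustration.

```racket
#lang racket/base

(define bs #"\17\240bc")

;; s16 big-endian: a signed 16-bit integer read from bytes 0 and 1.
;; The third argument #t selects signed, the fourth #t selects big-endian.
(define num (integer-bytes->integer bs #t #t 0 2))

;; (bytes rest 2): the next two bytes, taken as a fixed-width field.
(define rest-bytes (subbytes bs 2 4))

(list num rest-bytes) ; (4000 #"bc")
```

The octal escapes \17 and \240 are the bytes 15 and 160, and 15 * 256 + 160 = 4000.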
bytes extracts a fixed-width field. zero-padded extracts a fixed-width field and strips trailing 0 bytes. until-byte extracts bytes until the given delimiter byte is encountered; if the delimiter is not found, the match fails. until-byte* is the same, except that failing to find the delimiter is not a match failure. length-prefixed reads a length header and then that many bytes. If the prefix length and endianness are not explicitly specified, it defaults to a 2-byte little-endian length, as used by the 9P protocol.
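The behaviour of zero-padded and of the default length-prefixed form can be sketched in plain racket/base. The helpers strip-trailing-zeros and read-length-prefixed are hypothetical names used only for this illustration; they are not part of this module, and the module's own implementation may differ.

```racket
#lang racket/base

;; Roughly what zero-padded yields for a fixed-width field:
;; the field with any trailing 0 bytes removed.
(define (strip-trailing-zeros bs)
  (let loop ([end (bytes-length bs)])
    (if (and (> end 0) (zero? (bytes-ref bs (sub1 end))))
        (loop (sub1 end))
        (subbytes bs 0 end))))

;; Roughly what the default length-prefixed form reads:
;; an unsigned 2-byte little-endian length, then that many bytes.
(define (read-length-prefixed bs [start 0])
  (define len (integer-bytes->integer bs #f #f start (+ start 2)))
  (define body-start (+ start 2))
  (subbytes bs body-start (+ body-start len)))

(strip-trailing-zeros #"ab\0\0")       ; #"ab"
(read-length-prefixed #"\5\0hello!!")  ; #"hello"
```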
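The number patterns read a signed integer (s), unsigned integer (u), or floating-point number (f) of the given width in bits. If no endianness is given, the value of binary-match-default-endianness is used.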
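How signedness and byte order change the decoded value can be seen with the standard racket/base conversion functions. This sketch only illustrates the numeric interpretations; it makes no claim about which functions the module calls internally.

```racket
#lang racket/base

;; The same two bytes read as u16 in each byte order.
(define big    (integer-bytes->integer #"\1\2" #f #t)) ; 258 = 1 * 256 + 2
(define little (integer-bytes->integer #"\1\2" #f #f)) ; 513 = 2 * 256 + 1

;; The same two bytes (#xFFFE) read as u16 and as s16, both big-endian.
(define unsigned (integer-bytes->integer #"\377\376" #f #t)) ; 65534
(define signed   (integer-bytes->integer #"\377\376" #t #t)) ; -2

;; A 32-bit float written and read back in big-endian order.
(define f (floating-point-bytes->real
           (real->floating-point-bytes 1.5 4 #t) #t)) ; 1.5

(list big little unsigned signed f)
```

Network byte order is big-endian, so a network-order u16 corresponds to the first case.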
rest* takes any remaining bytes at the end of the bytestring after everything else is matched; if there are no extra bytes, its pattern is matched against an empty bytestring.
Normally, matching starts with the first byte in the bytestring. (set-offset! where) changes the current location, which facilitates matching bytestrings that contain multiple records, and (get-offset pat) matches pat against the current index at that point in the matching.
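The idea of walking a bytestring of several records by keeping track of an offset can be sketched without the match pattern at all. The record layout (two big-endian u16 fields per 4-byte record) and the helper read-records are assumptions made up for this illustration.

```racket
#lang racket/base

;; Hypothetical layout: each record is 4 bytes, a u16 id followed by
;; a u16 value, both big-endian. The loop advances the offset by hand.
(define (read-records bs)
  (let loop ([offset 0] [acc '()])
    (if (> (+ offset 4) (bytes-length bs))
        (reverse acc)
        (loop (+ offset 4)
              (cons (cons (integer-bytes->integer bs #f #t offset (+ offset 2))
                          (integer-bytes->integer bs #f #t (+ offset 2) (+ offset 4)))
                    acc)))))

(read-records #"\0\1\0\12\0\2\0\24") ; ((1 . 10) (2 . 20))
```

With the binary pattern, the equivalent bookkeeping would be a saved index from get-offset that is later passed to set-offset! to resume at the next record.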
A more complex example that matches an IPv4 header:
  (parameterize ([binary-match-default-endianness 'network-order])
    (match header
      ((binary
        (u8 (app byte->nybbles version header-length)) (u8 service-type) (u16 total-length)
        (u16 identification) (u16 flags+fragment)
        (u8 ttl) (u8 protocol) (u16 checksum)
        (bytes (app make-ip-address source-address) 4)
        (u32 (app (lambda (n) (make-ip-address n 4)) dest-address))
        (rest* options))
       (list version header-length service-type total-length ttl protocol
             (ip-address->string source-address) (ip-address->string dest-address)
             options))))
2 Additional functions
byte->nybbles splits a single byte into two 4-bit nybbles, returned as two values: the upper 4 bits are the first value, the lower 4 bits are the second.
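The split itself is a shift and a mask. The function split-nybbles below is a hypothetical stand-in written for illustration, assuming the behaviour described above; it is not the module's own definition.

```racket
#lang racket/base

;; Return the upper and lower 4 bits of a byte as two values.
(define (split-nybbles b)
  (values (arithmetic-shift b -4)   ; upper nybble
          (bitwise-and b #x0F)))    ; lower nybble

;; #x45 is a typical first byte of an IPv4 header:
;; version 4, header length 5 (in 32-bit words).
(define-values (version header-length) (split-nybbles #x45))

(list version header-length) ; (4 5)
```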
binary-match-default-endianness is a parameter that controls the endianness used by numeric patterns when one isn’t explicitly given.