
13 FFmpeg Decoder Definitions

Hans Dijkema <hans@dijkewijk.nl>

 (require racket-audio/ffmpeg-definitions)
  package: racket-audio

This module provides the direct FFmpeg-backed decoder layer used by the audio pipeline. It is deliberately small and stateful. A caller creates one decoder instance, opens one file on it, queries the selected audio stream, repeatedly asks for the next PCM block, and closes the instance again.

The module does not expose FFmpeg metadata. It only exposes the information needed for playback: stream count, sample rate, channel count, duration, bitrate, decoded PCM data, and sample positions. The output format is fixed: interleaved signed 32-bit PCM, four bytes per sample, using FFmpeg’s AV_SAMPLE_FMT_S32 sample format.

The FFmpeg libraries are loaded when the module is required. The module checks that the runtime FFmpeg major versions are in the supported range configured by the implementation. This binding targets the FFmpeg library major versions used by FFmpeg 6, 7, and 8: libavutil 58 to 60, libavcodec 60 to 62, libavformat 60 to 62, and libswresample 4 to 6. Unsupported runtime versions fail early, before a decoder instance is used.

On Windows, the private library loader may download the bundled sound-library set into Racket’s add-on directory before the FFI libraries are opened. On Unix-like systems, the FFmpeg libraries are expected to be installed by the operating system or platform package manager and to be reachable by Racket’s FFI library search path.

13.1 Layering

This module is the low-level Racket FFI layer. It is normally wrapped by "ffmpeg-ffi.rkt" and then by "ffmpeg-decoder.rkt". The first wrapper adapts this module to the command protocol used by the audio decoder frontend. The second wrapper exposes the callback-oriented decoder interface used by the rest of the playback pipeline.

The distinction matters for buffer lifetime. At this level, fmpg-buffer returns the current buffer owned by the decoder instance. The adapter in "ffmpeg-ffi.rkt" copies that buffer before passing it to "ffmpeg-decoder.rkt". Code that uses this module directly must copy the buffer itself when the bytes must survive the next decoder operation.

13.2 FFmpeg version information

procedure

(ffmpeg-version lib)  (list/c exact-nonnegative-integer? exact-nonnegative-integer? exact-nonnegative-integer?)

  lib : (or/c 'avutil 'avcodec 'avformat 'swr 'swresample)
Returns the runtime version of one FFmpeg library as a three-element list containing the major, minor, and micro version numbers. The symbols 'swr and 'swresample both refer to libswresample.

The version is read from FFmpeg’s packed integer value. For example, a runtime value corresponding to 62.28.100 is returned as '(62 28 100). The function raises an exception for an unknown library symbol.
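
The packed value follows FFmpeg's AV_VERSION_INT convention: major in the upper bits, then one byte each for minor and micro. The following is a minimal sketch of that unpacking, not the module's own code; the names unpack-ffmpeg-version and pack-ffmpeg-version are hypothetical helpers introduced for illustration.

```racket
#lang racket/base

;; Split a packed FFmpeg version integer (major<<16 | minor<<8 | micro)
;; into (list major minor micro). Hypothetical helper, for illustration.
(define (unpack-ffmpeg-version packed)
  (list (arithmetic-shift packed -16)
        (bitwise-and (arithmetic-shift packed -8) #xFF)
        (bitwise-and packed #xFF)))

;; Inverse operation, mirroring FFmpeg's AV_VERSION_INT macro.
(define (pack-ffmpeg-version major minor micro)
  (bitwise-ior (arithmetic-shift major 16)
               (arithmetic-shift minor 8)
               micro))

(unpack-ffmpeg-version (pack-ffmpeg-version 62 28 100)) ; => '(62 28 100)
```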

The runtime versions determine which partial FFmpeg struct layouts are safe to use. If a future FFmpeg major release inserts, removes, or resizes a member that precedes one of the fields read by this module, the offset of that field shifts. The supported range should therefore be extended only after the affected partial definitions have been checked.

13.3 Implementation strategy

This module talks directly to the FFmpeg shared libraries through Racket’s FFI. There is no C shim that hides FFmpeg’s structs or normalizes their layout. The price of that choice is that the Racket side must know enough of the relevant C struct layouts to read the fields used by the decoder. The benefit is that the binding remains a Racket module with direct access to the platform FFmpeg libraries.

13.3.1 C structs and offsets

Small and stable structures, such as AVRational and AVChannelLayout, are described with define-cstruct. A define-cstruct form describes the C fields to Racket’s FFI. Racket then calculates the correct field offsets for the current platform ABI and creates the corresponding pointer type, constructor, accessors and mutators.
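
As an illustration of what define-cstruct generates, the sketch below describes AVRational, which FFmpeg declares as two int fields, num and den. The exact definition inside this module is not shown in this documentation, so treat this as an assumption; ts->ms is a hypothetical helper added only to exercise the accessors.

```racket
#lang racket/base
(require ffi/unsafe)

;; Mirrors FFmpeg's `typedef struct AVRational { int num; int den; }`.
(define-cstruct _AVRational ([num _int] [den _int]))

;; define-cstruct generates make-AVRational, AVRational-num, AVRational-den,
;; the matching set-...! mutators, and the pointer type _AVRational-pointer.
(define tb (make-AVRational 1 48000))

;; Hypothetical helper: convert a timestamp counted in units of the
;; rational r (seconds per tick) to whole milliseconds.
(define (ts->ms ts r)
  (quotient (* ts 1000 (AVRational-num r)) (AVRational-den r)))

(ts->ms 96000 tb) ; 96000 ticks of 1/48000 s
```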

The larger FFmpeg structures are handled by def-cstruct from "private/cstruct-helper.rkt". Structures such as AVCodecParameters, AVStream, AVFormatContext, AVFrame and AVPacket are large and may differ between FFmpeg major versions. The decoder only needs a few fields from each one, but those fields must still be read from their exact native offsets.

The helper solves this by describing the complete field sequence up to the last field the backend needs. Unnamed entries are used only to advance the offset. Named entries become generated accessors. Repeated entries such as (6 _int) keep the definition compact while still allowing Racket’s FFI to compute alignment, padding and pointer size correctly. Tail fields after the last required member are not described.
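
The syntax of def-cstruct is not reproduced here. The sketch below only illustrates the underlying idea with standard ffi/unsafe procedures: list the field types up to the last needed member, let the FFI compute the offsets, and read the wanted field at its offset. The layout is made up for the example and is not a real FFmpeg struct.

```racket
#lang racket/base
(require ffi/unsafe)

;; A made-up native layout, NOT a real FFmpeg struct:
;;   struct demo { void *a; int skipped[6]; int64_t wanted; /* tail ignored */ };
(define demo-field-types
  (append (list _pointer)
          (build-list 6 (lambda (i) _int)) ; six ints that only advance the offset
          (list _int64)))

;; compute-offsets applies the platform's alignment and padding rules.
(define demo-offsets (compute-offsets demo-field-types))
(define wanted-offset (car (reverse demo-offsets))) ; offset of the last field

;; Read the wanted field from a raw pointer at its computed byte offset.
(define (demo-wanted ptr)
  (ptr-ref ptr _int64 'abs wanted-offset))

;; Exercise the accessor on locally allocated memory.
(define mem (malloc (+ wanted-offset (ctype-sizeof _int64))))
(ptr-set! mem _int64 'abs wanted-offset 12345)
(demo-wanted mem)
```

Because the offsets are computed at run time, the same description works for 32-bit and 64-bit pointer sizes without hard-coded numbers.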

The right layout is selected when the module is required, after the runtime FFmpeg major versions have been read from the libraries. For the supported range, _AVCodecParameters uses one layout for libavcodec major version 60 and another for major versions 61 and 62. Likewise, _AVFrame uses one layout for libavutil major version 58 and another for major versions 59 and 60. The other partial structs used by this module are defined with a single layout across the supported versions.

13.3.2 Defensive control flow

Most FFmpeg calls report ordinary failure through C-style return values or null pointers. The implementation treats those results as normal control flow. The let/assert form is used for setup paths where each native result must be checked before the next native call is made. It behaves like a sequential binding form: each binding can be checked immediately, and a failed check returns the specified failure value for the whole form.

That style is used for opening a file, selecting stream information, allocating the codec context, and initializing the resampler. Predicates such as a-!nullptr?, a-nullptr?, a-true?, and a->=? express the usual FFmpeg checks directly next to the binding that produced the value.
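
The exact syntax of let/assert is not shown in this documentation. The following hypothetical macro, let/check, is a stand-in that sketches the same idea: bind a value, check it immediately, and return a failure value for the whole form as soon as one check fails. The hash-table lookups stand in for native calls.

```racket
#lang racket/base

;; Hypothetical stand-in for the idea behind let/assert; the real form
;; in this module may use a different syntax.
(define-syntax let/check
  (syntax-rules ()
    [(_ fail-value () body ...)
     (let () body ...)]
    [(_ fail-value ([id expr ok?] more ...) body ...)
     (let ([id expr])
       (if (ok? id)
           (let/check fail-value (more ...) body ...)
           fail-value))]))

;; Each step is checked before the next one runs, so the second lookup
;; is never attempted on a missing entry.
(define (open-sketch table key)
  (let/check 0
    ([entry (hash-ref table key #f) values]       ; like a non-null check
     [rate  (hash-ref entry 'rate -1) positive?]) ; like a ">= 0" check
    1))

(open-sketch (hash "a" (hash 'rate 44100)) "a") ; => 1
(open-sketch (hash "a" (hash 'rate 44100)) "b") ; => 0
(open-sketch (hash "a" (hash)) "a")             ; => 0
```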

The decode and seek paths also use early-return where processing must stop immediately from a nested position. This keeps the normal FFmpeg outcomes away from exception-based control flow while still making cleanup actions local to the point where a failure can occur.

13.4 Decoder instances

A decoder instance is an opaque value returned by fmpg-init. Its structure type and predicate are not exported. Pass the value back to the functions in this module and do not inspect it directly. The contracts below therefore use any/c for the instance argument. Operationally, that argument must be a value returned by fmpg-init.

The instance owns native FFmpeg resources: a format context, a codec context, an audio frame, a resampler, and the Racket byte string used for the current PCM block. Finalizers are installed as a last line of defence, but callers should still call fmpg-close! explicitly when playback stops or when the file is no longer needed. Explicit close keeps the lifetime of native resources predictable.
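
One way to make the explicit close hard to forget is a small wrapper built on dynamic-wind. The sketch below is not part of this module; call-with-decoder is a hypothetical helper, and it takes the init, open and close procedures as arguments so that it can be tried without FFmpeg installed.

```racket
#lang racket/base

;; Hypothetical helper. With this module, the first three arguments would
;; be fmpg-init, fmpg-open-file! and fmpg-close!.
(define (call-with-decoder init open! close! path proc)
  (define dec (init))
  (cond
    [(not dec) #f]                       ; instance could not be created
    [else
     (dynamic-wind
       void
       (lambda ()
         ;; proc runs only when the open call reports success (1).
         (and (= (open! dec path) 1)
              (proc dec)))
       ;; Runs on normal return, on exceptions, and on other escapes.
       (lambda () (close! dec)))]))

;; Intended use with this module (not executed here):
;; (call-with-decoder fmpg-init fmpg-open-file! fmpg-close! "track.ogg"
;;                    (lambda (dec) (fmpg-duration-ms dec)))
```

The sketch relies on the documented behavior that closing an instance that was never opened is harmless.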

procedure

(fmpg-init)  any/c

Creates a new decoder instance. The result is an opaque instance value, or #f if the instance could not be created.

Creating the instance does not open a file. Use fmpg-open-file! before querying stream information or decoding audio.

procedure

(fmpg-open-file! instance filename)  (integer-in 0 1)

  instance : any/c
  filename : (or/c path? string?)
Opens filename on instance, reads the stream information, selects the best audio stream, initializes the codec context, and initializes the resampler.

The function returns 1 on success and 0 on failure. On failure, partially initialized native state is closed again. A non-string, non-path filename is treated as an open failure and returns 0.

An instance can only have one file open. Close it with fmpg-close! before opening another file on the same instance.

procedure

(fmpg-close! instance)  void?

  instance : any/c
Closes instance if it is open and releases the native FFmpeg resources owned by the instance. The codec context, frame and resampler are freed before the format context is closed. This order avoids keeping decoder pointers that refer to streams from an already closed container.

The stored audio information is reset. Calling this function with #f or with an already closed instance is harmless.

procedure

(fmpg-is-open instance)  (integer-in 0 1)

  instance : any/c
Returns 1 when instance is ready for decoding and 0 otherwise. An instance is ready only after a file has been opened, a usable audio stream has been selected, and the decoder and resampler have been initialized.

13.5 Audio stream information

The decoder selects one audio stream for playback using FFmpeg’s best-stream selection. The stream count reports how many audio streams were found in the container, but decoding is performed only for the selected stream.

The term sample in this module means a sample frame: one time step in the audio stream, across all channels. For stereo 32-bit output, one sample frame therefore occupies (* 2 4) bytes in the returned PCM buffer.
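
The arithmetic above generalizes to any channel count. A minimal sketch follows; the helper names are hypothetical, and the constant 4 is the fixed output width described in this module.

```racket
#lang racket/base

(define bytes-per-sample 4) ; fixed signed 32-bit output width

;; Bytes occupied by one sample frame (one time step across all channels).
(define (frame-bytes channels)
  (* channels bytes-per-sample))

;; Convert between sample-frame counts and byte counts.
(define (frames->bytes frames channels)
  (* frames (frame-bytes channels)))

(define (bytes->frames n channels)
  (quotient n (frame-bytes channels)))

(frame-bytes 2)         ; => 8
(frames->bytes 1024 2)  ; => 8192
(bytes->frames 8192 2)  ; => 1024
```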

procedure

(fmpg-audio-stream-count instance)  exact-nonnegative-integer?

  instance : any/c
Returns the number of audio streams in the open container. If the instance is not open, the result is 0. This count is informational; actual stream selection is performed during fmpg-open-file!.

procedure

(fmpg-audio-sample-rate instance)  exact-nonnegative-integer?

  instance : any/c

procedure

(fmpg-audio-channels instance)  exact-nonnegative-integer?

  instance : any/c
Return the sample rate and channel count of the selected audio stream. If the instance is not ready, both functions return 0.

procedure

(fmpg-audio-bits-per-sample instance)  exact-positive-integer?

  instance : any/c

procedure

(fmpg-audio-bytes-per-sample instance)  exact-positive-integer?

  instance : any/c
Return the fixed output sample width in bits and bytes. The current output format is 32-bit signed PCM, so fmpg-audio-bits-per-sample returns 32 and fmpg-audio-bytes-per-sample returns 4. The values are independent of the input file’s original sample format and do not depend on the instance state.

procedure

(fmpg-duration-ms instance)  exact-integer?

  instance : any/c

procedure

(fmpg-duration-samples instance)  exact-integer?

  instance : any/c
Return the duration of the selected audio stream in milliseconds and in sample frames. If the stream duration is not available, the container duration is used as a fallback. If no duration can be determined, or when the instance is not ready, the result is -1.
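
Callers usually need to handle the -1 case before displaying or converting a duration. The sketch below shows one way to do that; frames->ms and format-duration are hypothetical helpers, and the truncating division is an assumption that may differ from the rounding used inside the module.

```racket
#lang racket/base
(require racket/format)

;; Convert a sample-frame count to milliseconds, propagating "unknown" (-1).
(define (frames->ms frames sample-rate)
  (if (or (< frames 0) (<= sample-rate 0))
      -1
      (quotient (* frames 1000) sample-rate)))

;; Render a millisecond duration as m:ss, or "unknown" for negative input.
(define (format-duration ms)
  (cond
    [(< ms 0) "unknown"]
    [else
     (define total-seconds (quotient ms 1000))
     (format "~a:~a"
             (quotient total-seconds 60)
             (~r (remainder total-seconds 60) #:min-width 2 #:pad-string "0"))]))

(format-duration (frames->ms 8158500 44100)) ; => "3:05"
(format-duration -1)                         ; => "unknown"
```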

procedure

(fmpg-file-bitrate instance)  exact-integer?

  instance : any/c
Returns the container bitrate in bits per second. The value reported by FFmpeg is passed through only when it is positive. If the bitrate is unavailable, is not positive, or the instance is not open, the result is -1.

13.6 Output format

The decoder output format is intentionally fixed:

  • sample format: signed 32-bit PCM, AV_SAMPLE_FMT_S32

  • layout: interleaved

  • sample rate: the selected stream’s sample rate

  • channels: the selected stream’s channel count

This keeps the playback layer simple. The FFmpeg input format may be planar, floating point, compressed, or otherwise different; libswresample converts the decoded frames to the fixed output format before the bytes are exposed to Racket.
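
To make the layout concrete, the sketch below reads one sample out of an interleaved signed 32-bit buffer. pcm-sample-ref is a hypothetical helper, not part of this module. It assumes native byte order, which is how FFmpeg stores its sample formats.

```racket
#lang racket/base

;; Return the signed 32-bit sample for `channel` in sample frame `frame`
;; of an interleaved buffer with `channels` channels.
(define (pcm-sample-ref pcm frame channel channels)
  (define offset (* 4 (+ (* frame channels) channel)))
  (integer-bytes->integer pcm #t (system-big-endian?) offset (+ offset 4)))

;; Build a tiny stereo test buffer: frame 0 = (100, -200), frame 1 = (300, -400).
(define (s32 n) (integer->integer-bytes n 4 #t (system-big-endian?)))
(define demo-pcm (bytes-append (s32 100) (s32 -200) (s32 300) (s32 -400)))

(pcm-sample-ref demo-pcm 0 0 2) ; left sample of frame 0  => 100
(pcm-sample-ref demo-pcm 1 1 2) ; right sample of frame 1 => -400
```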

13.7 Decoding

Decoding is block oriented. Each call to fmpg-decode-next! clears the previous PCM block and attempts to produce the next decoded block for the selected audio stream. When the call returns 1, the block can be read with fmpg-buffer and described with the buffer query functions.

procedure

(fmpg-decode-next! instance)  exact-integer?

  instance : any/c
Decodes until a block of PCM output is available, end of stream is reached, or an error occurs. The return values are:

  • 1: a new PCM buffer is available through fmpg-buffer.

  • 0: decoding is complete and no more PCM is available.

  • A negative value: decoding failed or the instance was not ready.

Internally, the decoder first tries to receive frames that FFmpeg may already have buffered. If no frame is ready, it reads packets until it finds a packet for the selected audio stream. Packets from other streams are skipped and immediately unreferenced. Sent packets are unreferenced after avcodec_send_packet, because the codec has then taken what it needs.

At end of input, the function drains both the codec and the resampler. This is necessary because FFmpeg and libswresample may still hold delayed samples even after the demuxer has no more packets.

13.8 Decoded buffers

The PCM buffer belongs to the decoder instance. It is replaced by the next call to fmpg-decode-next!, fmpg-seek-ms!, or fmpg-close!. Treat the returned byte string as read-only. Copy it if it must outlive the next decoder operation or if another component may mutate it.
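
A copy can be taken with subbytes, which always allocates a fresh byte string. The sketch below uses a hypothetical helper, snapshot-pcm, and a locally created byte string in place of a real decoder buffer.

```racket
#lang racket/base

;; Copy the first `size` valid bytes of a decoder-owned buffer.
;; Returns #f when no buffer is available.
(define (snapshot-pcm pcm size)
  (and pcm (subbytes pcm 0 (min size (bytes-length pcm)))))

;; Stand-in for a decoder-owned buffer.
(define owned (bytes 1 2 3 4 5 6 7 8))
(define copy (snapshot-pcm owned 8))

;; Later changes to the owned buffer do not affect the snapshot.
(bytes-set! owned 0 99)
(bytes-ref copy 0) ; => 1
```

With this module, the call would look like (snapshot-pcm (fmpg-buffer dec) (fmpg-buffer-size dec)), made before the next decode, seek, or close.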

procedure

(fmpg-buffer instance)  (or/c bytes? #f)

  instance : any/c
Returns the current decoded PCM block as a byte string, or #f when no PCM block is available.

The byte string contains interleaved signed 32-bit samples. Its logical frame count is available as the difference between fmpg-buffer-end-sample and fmpg-buffer-start-sample. Its byte size is also available through fmpg-buffer-size.

procedure

(fmpg-buffer-size instance)  exact-nonnegative-integer?

  instance : any/c
Returns the number of valid bytes in the current PCM buffer. If no decoder state is available, or if the size would not fit in the internal integer range, the function returns 0.

procedure

(fmpg-buffer-start-sample instance)  exact-nonnegative-integer?

  instance : any/c

procedure

(fmpg-buffer-end-sample instance)  exact-nonnegative-integer?

  instance : any/c

procedure

(fmpg-sample-position instance)  exact-nonnegative-integer?

  instance : any/c
Return sample-frame positions for the current decoder state.

fmpg-buffer-start-sample returns the first sample frame represented by the current PCM buffer. fmpg-buffer-end-sample returns the half-open end position: the first sample frame after the current buffer. fmpg-sample-position returns the next sample position the decoder expects to produce.

These values count sample frames, not individual channel samples. For stereo audio, one sample frame contains one sample for the left channel and one sample for the right channel.
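
Half-open ranges make it easy to count frames and to check that consecutive buffers line up. The sketch below uses hypothetical helpers. Whether consecutive buffers are always gap-free is an assumption here; it would not hold across a seek, for example.

```racket
#lang racket/base

;; A buffer covers the half-open range [start, end) of sample frames.
(define (range-frames start end)
  (- end start))

;; ranges: list of (cons start end) pairs in decode order.
;; True when every buffer starts exactly where the previous one ended.
(define (ranges-contiguous? ranges)
  (or (null? ranges)
      (null? (cdr ranges))
      (and (= (cdr (car ranges)) (car (cadr ranges)))
           (ranges-contiguous? (cdr ranges)))))

(range-frames 1024 2048)                          ; => 1024
(ranges-contiguous? '((0 . 1024) (1024 . 2048)))  ; => #t
(ranges-contiguous? '((0 . 1024) (1000 . 2048)))  ; => #f
```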

13.9 Seeking

procedure

(fmpg-seek-ms! instance target-pos-ms)  (integer-in 0 1)

  instance : any/c
  target-pos-ms : exact-nonnegative-integer?
Seeks the selected audio stream to target-pos-ms milliseconds and resets the decoder and resampler state. The function returns 1 on success and 0 on failure. Seeking is allowed only when the instance is already ready for decoding and the target position is non-negative.

Seeking uses FFmpeg’s backward seek flag. FFmpeg may therefore seek to a packet position before the requested target. The decoder stores a discard target in sample frames. During the following decode calls, frames before the target are dropped, and frames that overlap the target are trimmed so the exposed PCM buffer starts at, or as close as FFmpeg can provide to, the requested position.

After a successful seek, the codec buffers are flushed, the resampler is closed and reinitialized, EOF state is cleared, and sample bookkeeping is reset to the target position.
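
The discard and trim step described above can be sketched as plain arithmetic on sample-frame positions. The helpers below are hypothetical and the truncating conversion from milliseconds is an assumption; the module's own rounding may differ.

```racket
#lang racket/base

;; Convert a millisecond target to a sample-frame target.
(define (ms->frames ms sample-rate)
  (quotient (* ms sample-rate) 1000))

;; For a decoded frame covering [start, start + count), return how many
;; leading sample frames must be dropped so output begins at `target`.
;; A result equal to `count` means the whole frame is discarded.
(define (frames-to-drop start count target)
  (max 0 (min count (- target start))))

(define target (ms->frames 30000 44100)) ; => 1323000

(frames-to-drop 1321472 1024 target) ; entirely before target => 1024
(frames-to-drop 1322496 1024 target) ; overlaps target        => 504
(frames-to-drop 1323520 1024 target) ; entirely after target  => 0
```

The byte offset of the first kept sample is the drop count multiplied by the channel count and by 4.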

13.10 Resource ownership

The decoder instance owns the native FFmpeg objects it allocates. The codec pointer returned by FFmpeg is not owned by the instance, but the codec context, frame, resampler and format context are. They are released by fmpg-close!. Finalizers are registered as a safety net, but callers should close decoder instances explicitly.

Temporary native buffers used during resampling are allocated only for the duration of a conversion step and are always freed before control returns to the caller. The public PCM buffer is a Racket byte string, so it can safely be passed to the Racket-side playback backend.

13.11 Use through the decoder frontend

The direct API above is normally wrapped by "ffmpeg-ffi.rkt" and by "ffmpeg-decoder.rkt". The frontend function ffmpeg-open returns a handle, or #f when the file does not exist. Its stream-info callback receives a mutable hash with at least these playback keys:

(list 'sample-rate
      'channels
      'bits-per-sample
      'bytes-per-sample
      'total-samples
      'duration)

The audio callback receives the same hash extended for the current buffer with these keys:

(list 'sample
      'current-time)

The hash is followed by a copied byte string and its valid byte count. The copy is made by "ffmpeg-ffi.rkt", not by the low-level buffer function itself.

The frontend’s seek function accepts a percentage of the stream and translates that percentage to a sample position. The adapter then translates the sample position to milliseconds and calls fmpg-seek-ms!. This is why the low-level module exposes millisecond seeking while the frontend exposes percentage seeking.
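
The two translations can be sketched as follows. The helper names are hypothetical, and the rounding and any clamping performed by the real frontend and adapter are not documented here, so the truncation below is an assumption.

```racket
#lang racket/base

;; Frontend side: percentage of the stream -> sample-frame position.
(define (percent->sample percent total-samples)
  (inexact->exact (floor (* (/ percent 100) total-samples))))

;; Adapter side: sample-frame position -> milliseconds for fmpg-seek-ms!.
(define (sample->ms sample sample-rate)
  (quotient (* sample 1000) sample-rate))

;; 50% of a 300-second stream at 44100 Hz.
(define sample (percent->sample 50 13230000)) ; => 6615000
(sample->ms sample 44100)                     ; => 150000
```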

13.12 Examples

The following example opens a file, decodes all PCM blocks, and reports their byte ranges and sample ranges. A real playback loop would pass each buffer to the audio output layer before requesting the next block.

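(require racket-audio/ffmpeg-definitions)
 
(define dec (fmpg-init))
 
(when (and dec (= (fmpg-open-file! dec "track.ogg") 1))
  (printf "~a Hz, ~a channels, ~a ms\n"
          (fmpg-audio-sample-rate dec)
          (fmpg-audio-channels dec)
          (fmpg-duration-ms dec))
 
  (let loop ()
    (define status (fmpg-decode-next! dec))
    (case status
      [(1)
       ;; pcm is owned by dec and is replaced by the next decoder call;
       ;; a real player would pass it to the output layer (or copy it) here.
       (define pcm (fmpg-buffer dec))
       (define size (fmpg-buffer-size dec))
       (define start (fmpg-buffer-start-sample dec))
       (define end (fmpg-buffer-end-sample dec))
       (printf "decoded ~a bytes, samples [~a, ~a)\n"
               size start end)
 
       (loop)]
      [(0)
       (printf "done\n")]
      [else
       ;; Release the native resources before reporting the failure.
       (fmpg-close! dec)
       (error 'decode "fmpg-decode-next! returned ~a" status)]))
 
  (fmpg-close! dec))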

After a successful seek, decoding continues exactly as before. The following code assumes that dec is still open, so it must run before fmpg-close! is called. It seeks to 30 seconds and then requests the next decoded buffer.

(when (= (fmpg-seek-ms! dec 30000) 1)
  (when (= (fmpg-decode-next! dec) 1)
    (define pcm (fmpg-buffer dec))
    (define start (fmpg-buffer-start-sample dec))
    (printf "first buffer after seek starts at sample ~a\n" start)))