
5 Audio Encoding

Hans Dijkema <hans@dijkewijk.nl>

 (require racket-audio/audio-encoder) package: racket-audio

The racket-audio/audio-encoder module provides the high-level file-to-file encoding pipeline. It reuses the existing decoder environment to read the input file and sends the decoded PCM stream to the selected encoder backend. The built-in backends are Opus, implemented with libopusenc, and FLAC, implemented with libFLAC.

This module is intended as the public encoding API. The concrete backend modules are small FFI backends; applications normally call audio-encode instead of using those modules directly.

5.1 Pipeline

Encoding is organised as a streaming pipeline:

input file
 
  -> PCM buffers
 
  -> encoder backend
  -> output file

The encoder is selected from #:encoder or, when that argument is not provided, from the output filename extension. The initial built-in encoders are 'opus for ".opus" and ".oga" files, and 'flac for ".flac" files.
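
The selection rule above can be pictured with a small self-contained sketch. The table and the helper name infer-encoder are hypothetical; they mirror the built-in mapping listed here and are not the module's actual implementation. Case-insensitive matching of the extension is an assumption.

```racket
#lang racket/base
(require racket/path)

;; Hypothetical lookup table mirroring the built-in encoders listed above.
(define extension-table
  (hash "opus" 'opus
        "oga"  'opus
        "flac" 'flac))

;; Returns the explicitly requested encoder when one is given, otherwise
;; the encoder inferred from the output file extension, or #f when the
;; extension is unknown or missing.
(define (infer-encoder output-file [encoder #f])
  (or encoder
      (let ([ext (path-get-extension output-file)]) ; e.g. #".opus" or #f
        (and ext
             (hash-ref extension-table
                       ;; drop the leading dot; lower-casing is an assumption
                       (string-downcase
                        (bytes->string/utf-8 (subbytes ext 1)))
                       #f)))))
```

With this sketch, (infer-encoder "out.oga") selects 'opus, while (infer-encoder "out.oga" 'flac) returns the forced 'flac backend.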

The PCM stream is not collected in memory. Each decoded buffer is forwarded to the selected backend. FLAC encoding may insert a PCM conversion step when the settings request a different sample rate, channel count, or bit depth. Opus encoding feeds floating-point PCM to libopusenc; sample-rate conversion for Opus is left to libopusenc.

5.2 Encoding a file

procedure

(audio-encode input-file
              output-file
              settings
              [#:encoder encoder
               #:copy-tags? copy-tags?
               #:progress-callback progress-callback]) → hash?
  input-file : path-string?
  output-file : path-string?
  settings : hash?
  encoder : (or/c symbol? #f) = #f
  copy-tags? : boolean? = #t
  progress-callback : (or/c procedure? #f) = #f
Encodes input-file to output-file and returns a result hash. The settings hash is interpreted by the selected backend.

When encoder is #f, the backend is inferred from the output file extension. Pass 'opus or 'flac to force a backend.

When copy-tags? is true, common textual tags and an embedded picture are copied from the source file to the destination file. Opus comments and cover art are written through libopusenc before encoding starts. FLAC metadata is copied after the encoded file has been written, using the read-write API from racket-audio/taglib.

When progress-callback is a procedure, it is called with a progress hash during encoding. Progress is based on the number of input frames read from the decoder, not on the number of frames written by the encoder. This matters for resampling, because output frame counts can differ from input frame counts.

(audio-encode "input.flac"
              "output.opus"
              (hash 'bitrate 224000
                    'vbr? #t
                    'complexity 10)
              #:encoder 'opus)
 
(audio-encode "input-96k.flac"
              "output-48k.flac"
              (hash 'sample-rate 48000
                    'bits-per-sample 24
                    'compression-level 8)
              #:encoder 'flac)
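
The keyword arguments can be combined as in the following sketch. The file names are hypothetical, and the callback only prints the 'phase key of each progress hash.

```racket
#lang racket/base
(require racket-audio/audio-encoder)

;; No #:encoder argument: the backend is inferred from the ".flac"
;; extension of the output file.
;; #:copy-tags? #f skips the metadata copy described above.
(define result
  (audio-encode "input.flac"        ; hypothetical input path
                "untagged.flac"     ; hypothetical output path
                (hash 'compression-level 5)
                #:copy-tags? #f
                #:progress-callback
                (lambda (h)
                  (printf "phase: ~a\n" (hash-ref h 'phase #f)))))

;; The returned value is the result hash.
(printf "encoder used: ~a\n" (hash-ref result 'encoder #f))
```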

5.3 Result hash

The result hash contains the following keys:

  • 'encoder, the selected backend symbol;

  • 'input and 'output, the source and destination paths;

  • 'input-format, the final decoded input format hash seen by the pipeline;

  • 'output-format, the resolved backend output format hash;

  • 'frames-read, the number of input frames consumed;

  • 'frames-written, the number of frames accepted by the backend;

  • 'tag-copy, a hash describing how metadata was handled.

The 'tag-copy hash contains a 'method key. For Opus the method is 'libopusenc-comments, because metadata must be supplied to libopusenc before the encoder writes the OpusTags packet. For FLAC the method is 'taglib-post-copy, because the encoded file is tagged after encoding.
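
As an illustration of reading these keys, the following sketch builds a one-line summary from a result hash. The helper summarize-result and the sample values are hypothetical; the sample stands in for a value returned by audio-encode.

```racket
#lang racket/base

;; Builds a one-line summary from a result hash shaped as described above.
;; The hash-ref defaults guard against keys that are absent.
(define (summarize-result r)
  (format "~a: ~a -> ~a, ~a frames read, ~a written, tags via ~a"
          (hash-ref r 'encoder #f)
          (hash-ref r 'input #f)
          (hash-ref r 'output #f)
          (hash-ref r 'frames-read 0)
          (hash-ref r 'frames-written 0)
          (hash-ref (hash-ref r 'tag-copy (hash)) 'method #f)))

;; Hand-written sample with made-up values, not real encoder output.
(define sample-result
  (hash 'encoder 'flac
        'input "input-96k.flac"
        'output "output-48k.flac"
        'frames-read 960000
        'frames-written 480000
        'tag-copy (hash 'method 'taglib-post-copy)))

(displayln (summarize-result sample-result))
```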

5.4 Progress callback

The progress callback receives a hash with at least these keys:

  • 'phase, such as 'format, 'audio, 'finished-encoding, or 'finished;

  • 'frames-read and 'frames-written;

  • 'total-frames, when the decoder reported a known input length;

  • 'progress, a number between 0.0 and 1.0 when 'total-frames is known, otherwise #f;

  • 'input-format and, after the backend has opened, 'output-format.

A simple command-line style progress callback can print a percentage on one line:

(define (show-progress h)
  (let ((p (hash-ref h 'progress #f)))
    (when (number? p)
      ;; convert to an exact integer so that 45 is printed rather than 45.0
      (printf "\rprogress: ~a%" (inexact->exact (round (* 100 p))))
      (flush-output))))
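
A slightly fuller variant can also report the phase and fall back to the frame count when the input length is unknown. This is a sketch; the names progress->string and show-status are hypothetical, and only keys listed above are read.

```racket
#lang racket/base

;; Formats a progress hash as a status string. Falls back to the raw
;; frame count when 'progress is #f (input length unknown).
(define (progress->string h)
  (define phase (hash-ref h 'phase #f))
  (define p (hash-ref h 'progress #f))
  (cond
    [(eq? phase 'finished) "done"]
    [(real? p)
     (format "~a: ~a%" phase (inexact->exact (round (* 100 p))))]
    [else
     (format "~a: ~a frames read" phase (hash-ref h 'frames-read 0))]))

;; Callback suitable for #:progress-callback: rewrites one terminal line.
(define (show-status h)
  (printf "\r~a" (progress->string h))
  (flush-output))
```

Keeping the formatting in a pure function makes it possible to check the output without running an encode.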

5.5 Opus settings

The Opus backend uses libopusenc. The input PCM is converted to interleaved floating-point samples in the range -1.0 to 1.0 and written with ope_encoder_write_float. The source sample rate is passed to libopusenc; libopusenc performs the required internal resampling for Opus output.

The following settings are recognised:

  • 'bitrate, bitrate in bits per second. The default is 160000.

  • 'vbr?, whether variable bitrate is enabled. The default is #t.

  • 'constrained-vbr?, whether constrained VBR is enabled. The default is #f.

  • 'complexity, encoder complexity. The default is 10.

  • 'comment-padding, Opus comment padding in bytes. The default is 512.

  • 'signal, optionally 'auto, 'voice, or 'music.

  • 'lsb-depth, optionally passed to the encoder as the source least significant bit depth.

  • 'comments, an optional hash of Opus comment strings. When #:copy-tags? is true, audio-encode fills this from the source tags.

  • 'picture, an optional picture value from racket-audio/taglib. When #:copy-tags? is true, audio-encode fills this from the source tags.

The first backend version supports mono and stereo input.
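
The documented defaults can be collected in a hash to see how a partial settings hash is completed. This is only an illustration: the helper with-opus-defaults is hypothetical, the backend resolves its own defaults, and the speech-oriented values are an arbitrary choice.

```racket
#lang racket/base

;; Defaults copied from the list above. 'signal, 'lsb-depth, 'comments
;; and 'picture are optional and therefore left out.
(define opus-defaults
  (hash 'bitrate 160000
        'vbr? #t
        'constrained-vbr? #f
        'complexity 10
        'comment-padding 512))

;; Hypothetical helper: overlays user settings on the documented defaults.
(define (with-opus-defaults settings)
  (for/fold ([acc opus-defaults])
            ([(k v) (in-hash settings)])
    (hash-set acc k v)))

;; Speech-oriented settings: lower bitrate and the 'voice signal hint.
(define speech-settings
  (with-opus-defaults (hash 'bitrate 32000 'signal 'voice)))
```

A hash such as speech-settings could then be passed as the settings argument of audio-encode.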

5.6 FLAC settings

The FLAC backend uses the libFLAC stream encoder and writes interleaved integer PCM samples through the FLAC encoder API. When the requested output sample rate differs from that of the decoded input, racket-audio/private/pcm-converter uses racket-audio/resampler, backed by libsoxr, to perform the sample-rate conversion. This SoXR path does not convert the channel count; keep the source channel count when requesting a different sample rate.

The following settings are recognised:

  • 'compression-level, FLAC compression level. The default is 5.

  • 'verify?, whether the FLAC encoder verifies encoded output. The default is #f.

  • 'blocksize, explicit FLAC block size. The default is 0, meaning the library default.

  • 'sample-rate or 'target-sample-rate, target sample rate in Hz. Use 'source or omit the key to keep the source rate.

  • 'channels or 'target-channels, target channel count. Use 'source or omit the key to keep the source channel count.

  • 'bits-per-sample or 'target-bits-per-sample, target bit depth. Use 'source or omit the key to keep the source bit depth.

For example, a 24-bit 96 kHz FLAC file can be transcoded to 24-bit 48 kHz FLAC with:

(audio-encode "input-96k.flac"
              "output-48k.flac"
              (hash 'sample-rate 48000
                    'bits-per-sample 24
                    'compression-level 8)
              #:encoder 'flac)
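
The meaning of 'source and of the target- aliases can be sketched as follows. The helper resolve-target is hypothetical, the format hash keys are assumptions, and so is the rule that the short key wins when both spellings are present; the backend's actual resolution may differ.

```racket
#lang racket/base

;; Hypothetical helper: looks up KEY, then its ALIAS, and treats a missing
;; key or the symbol 'source as "keep the source value".
;; Assumption: KEY wins over ALIAS when both are present.
(define (resolve-target settings key alias source-value)
  (define v
    (hash-ref settings key
              (lambda () (hash-ref settings alias 'source))))
  (if (eq? v 'source) source-value v))

;; Made-up source format for a 24-bit 96 kHz stereo file.
(define source (hash 'sample-rate 96000 'channels 2 'bits-per-sample 24))

;; Only the sample rate is changed; the channel count is kept explicitly
;; and the bit depth is kept by omission.
(define settings (hash 'sample-rate 48000 'target-channels 'source))

(define resolved
  (hash 'sample-rate
        (resolve-target settings 'sample-rate 'target-sample-rate
                        (hash-ref source 'sample-rate))
        'channels
        (resolve-target settings 'channels 'target-channels
                        (hash-ref source 'channels))
        'bits-per-sample
        (resolve-target settings 'bits-per-sample 'target-bits-per-sample
                        (hash-ref source 'bits-per-sample))))
```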

5.7 Encoder registration

The audio-supported-encoder-extensions procedure returns the extensions supported by the currently registered encoders. The initial list includes "flac", "opus", and "oga".

procedure

(make-audio-encoder exts
                    open
                    write
                    finish
                    settings) → audio-encoder?
  exts : (listof string?)
  open : procedure?
  write : procedure?
  finish : procedure?
  settings : procedure?
Creates an encoder descriptor. The descriptor is used by audio-register-encoder! to register a backend.

The open procedure receives the output file, settings hash, and input format hash. The write procedure receives the backend handle, buffer format hash, byte buffer, and byte length, and returns the number of frames accepted by the backend. The finish procedure finalises and releases the backend handle. The settings procedure resolves backend defaults against the input format and returns the output format hash.

procedure

(audio-encoder? v) → boolean?
  v : any/c
Returns #t when v is an encoder descriptor.

procedure

(audio-register-encoder! type encoder) → void?
  type : symbol?
  encoder : audio-encoder?
Registers encoder under type. The encoder’s extensions are used for extension-based selection in audio-encode.
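
As a minimal sketch of the registration flow, the following hypothetical backend dumps the decoded PCM bytes unchanged to a ".raw" file. The argument order follows the description of make-audio-encoder above. Several details are assumptions: that the value returned by open serves as the backend handle, that the settings procedure receives the settings hash and the input format hash, and that buffer format hashes carry 'channels and 'bits-per-sample keys.

```racket
#lang racket/base
(require racket-audio/audio-encoder)

;; open: the output port doubles as the backend handle (assumption).
(define (raw-open output-file settings input-format)
  (open-output-file output-file #:exists 'truncate))

;; write: copies len bytes and reports the number of frames accepted.
;; The 'channels and 'bits-per-sample keys are assumed format hash keys.
(define (raw-write out buffer-format buffer len)
  (write-bytes buffer out 0 len)
  (define frame-bytes
    (* (hash-ref buffer-format 'channels 2)
       (quotient (hash-ref buffer-format 'bits-per-sample 16) 8)))
  (quotient len frame-bytes))

;; finish: finalises and releases the handle.
(define (raw-finish out)
  (close-output-port out))

;; settings: no conversion, so the output format equals the input format.
;; The two-argument shape is an assumption.
(define (raw-settings settings input-format)
  input-format)

(define raw-encoder
  (make-audio-encoder '("raw") raw-open raw-write raw-finish raw-settings))

;; After registration, 'raw can be passed as #:encoder, and output files
;; ending in ".raw" select this backend by extension.
(audio-register-encoder! 'raw raw-encoder)
```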