Wavenet TTS API Interface
(require wavenet)    package: wavenet
A Racket interface for Google’s Wavenet text-to-speech engine.
The functions in this module make HTTP requests to a Google Cloud API (see endpoint). You will need a valid API key from Google in order to make use of this package.
The source code is on GitHub and is licensed under the Blue Oak Model License 1.0.0.
Here’s an example program:
#lang racket

(require wavenet racket/gui/base)

(api-key (file->string "api.rktd"))

;; One way to pick a voice. (Use voice-names to list available voices.)
(define eliza (select-voice "en-GB-Wavenet-F (FEMALE)"))

;; Another way to pick a voice. Statically defining a voice allows us to avoid
;; an extra API call to fetch voices every time the program is run. But you
;; can’t just make up your own values!
(define british-dude
  #hasheq((languageCodes . ("en-GB"))
          (name . "en-GB-Wavenet-B")
          (naturalSampleRateHertz . 24000)
          (ssmlGender . "MALE")))

;; Turn your sound up and call this function!
(define (say text)
  (synthesize text british-dude #:output-file "temp.mp3")
  (play-sound "temp.mp3" #t))
Store your API key in a separate file and exclude that file from Git (or whichever version control system you use). Then load it at runtime:
(api-key (file->string "api.rktd"))
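file->string returns the entire file, including any trailing newline your editor added when saving it. This documentation does not say whether wavenet strips such whitespace itself, so a cautious option is to trim the key before handing it to api-key. A minimal sketch; read-api-key is a hypothetical helper, not part of wavenet:

```racket
#lang racket/base
;; Sketch: read an API key from a file and drop surrounding whitespace.
;; read-api-key is a hypothetical helper, not exported by wavenet.
(require racket/file    ; file->string
         racket/string) ; string-trim

(define (read-api-key path)
  ;; With no separator argument, string-trim removes leading and trailing
  ;; whitespace, such as the newline many editors append on save.
  (string-trim (file->string path)))

;; Usage with wavenet (not run here):
;; (api-key (read-api-key "api.rktd"))
```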
procedure
(voice-names [prefix]) → (listof string?)
prefix : string? = ""
Returns a list of the names of the voices currently available from Google Cloud. If prefix is provided, only names that begin with prefix are included. Voice names follow a standard format, for example "en-AU-Wavenet-A (FEMALE)", so prefix is useful for narrowing the list to particular languages (such as "en-AU") or language/engine combinations (such as "en-AU-Wavenet").
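The prefix filter amounts to keeping the names for which string-prefix? holds. The sketch below reproduces that behavior on a hard-coded list; filter-voice-names and the sample names are illustrative assumptions, not wavenet's implementation or a statement of which voices Google currently offers.

```racket
#lang racket/base
;; Sketch of prefix filtering over voice names (illustrative data only).
(require racket/string) ; string-prefix?

;; Hypothetical sample data in the documented "<name> (<GENDER>)" format.
(define sample-names
  '("en-AU-Wavenet-A (FEMALE)"
    "en-GB-Standard-A (FEMALE)"
    "en-GB-Wavenet-B (MALE)"
    "en-GB-Wavenet-F (FEMALE)"))

;; Keep only names that start with prefix. The default "" keeps every name,
;; because every string starts with the empty string.
(define (filter-voice-names names [prefix ""])
  (filter (lambda (n) (string-prefix? n prefix)) names))

(filter-voice-names sample-names "en-GB")          ; the three en-GB names
(filter-voice-names sample-names "en-GB-Wavenet")  ; => '("en-GB-Wavenet-B (MALE)" "en-GB-Wavenet-F (FEMALE)")
```

With wavenet itself, the corresponding call would be (voice-names "en-GB-Wavenet").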
Each time your program is run, the first call to either voice-names or select-voice will generate an API call to endpoint to fetch information about voices currently available from Google Cloud. If api-key is not set, or if the HTTP response code is anything other than 200, an exn:fail:user exception is raised. Subsequent calls to these two functions will refer to a local cache of this information instead of making another API call.
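The fetch-once behavior described above is a lazy cache: nothing is fetched until the first request, and the result is reused afterwards. A minimal sketch of the pattern, assuming a hypothetical fetch-voices procedure as a stand-in for the HTTP request to endpoint (it only counts calls and returns canned names); it illustrates the idea and is not wavenet's actual code.

```racket
#lang racket/base
;; Sketch of fetch-once caching. fetch-voices is a hypothetical stand-in for
;; the real HTTP request; it counts calls so the caching can be observed.
(define fetch-count 0)
(define (fetch-voices)
  (set! fetch-count (add1 fetch-count))
  '("en-AU-Wavenet-A (FEMALE)" "en-GB-Wavenet-B (MALE)"))

;; #f means "not fetched yet"; afterwards it holds the cached list.
(define voice-cache #f)

(define (cached-voices)
  (unless voice-cache
    ;; Only the first successful call reaches this assignment. If
    ;; fetch-voices raised an exception, voice-cache would stay #f and
    ;; the next call would try again.
    (set! voice-cache (fetch-voices)))
  voice-cache)

(cached-voices) ; first call: fetches
(cached-voices) ; later calls: served from voice-cache
```

Whether wavenet retries after a failed fetch in the same way is not stated in this documentation.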
procedure
(select-voice voice-name) → voice?
voice-name : string?
Returns the voice named voice-name, using the same name format as voice-names (for example, "en-GB-Wavenet-F (FEMALE)"). The first call to either voice-names or select-voice in a program run fetches information about the voices currently available from Google Cloud via endpoint. If api-key is not set, or if the HTTP response code is anything other than 200, an exn:fail:user exception is raised. Later calls to either function use a local cache instead of making another API call.
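One plausible way to implement a lookup like select-voice is to compare each cached voice's label against voice-name. The sketch below does that over a hard-coded list; sample-voices, voice->display-name, and find-voice are hypothetical, the sample values are illustrative, and the "<name> (<GENDER>)" label format is inferred from this documentation's examples rather than taken from wavenet's source.

```racket
#lang racket/base
;; Sketch: look up a voice-shaped hasheq by its "<name> (<GENDER>)" label.
;; All names here are hypothetical; sample values are illustrative.
(define sample-voices
  '(#hasheq((languageCodes . ("en-GB"))
            (name . "en-GB-Wavenet-B")
            (naturalSampleRateHertz . 24000)
            (ssmlGender . "MALE"))
    #hasheq((languageCodes . ("en-GB"))
            (name . "en-GB-Wavenet-F")
            (naturalSampleRateHertz . 24000)
            (ssmlGender . "FEMALE"))))

;; Build the label from the name and ssmlGender fields.
(define (voice->display-name v)
  (format "~a (~a)" (hash-ref v 'name) (hash-ref v 'ssmlGender)))

;; Return the first voice whose label matches, or #f if none does.
;; (How wavenet itself handles an unknown name is not documented here.)
(define (find-voice voices label)
  (for/first ([v (in-list voices)]
              #:when (equal? (voice->display-name v) label))
    v))

(hash-ref (find-voice sample-voices "en-GB-Wavenet-F (FEMALE)") 'name)
; => "en-GB-Wavenet-F"
```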
procedure
(synthesize text voice-or-name [#:output-file filename]) → (or/c bytes? integer?)
text : string?
voice-or-name : (or/c voice? string?)
filename : (or/c #f path-string?) = #f
Converts text to speech using voice-or-name, which can be either a voice or a voice name string, and produces MP3 audio. If api-key is not set, or if the HTTP response code is anything other than 200, an exn:fail:user exception is raised.
If #:output-file is specified, the MP3 audio is written to filename, silently overwriting any existing file, and the return value is the number of bytes written. Otherwise, the MP3 audio bytes themselves are returned.
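The two return modes can be mimicked with plain Racket file I/O. In this sketch, save-or-return is a hypothetical helper and fake-mp3 is placeholder data standing in for the audio synthesize receives from the API; #:exists 'truncate gives the silent-overwrite behavior, and write-bytes reports how many bytes it wrote.

```racket
#lang racket/base
;; Sketch of synthesize's return convention, using placeholder bytes.
;; save-or-return is hypothetical; it is not exported by wavenet.
(require racket/file) ; make-temporary-file, file->bytes

(define (save-or-return audio #:output-file [filename #f])
  (cond
    [filename
     ;; 'truncate replaces an existing file without complaint;
     ;; write-bytes returns the number of bytes it wrote.
     (call-with-output-file filename
       (lambda (out) (write-bytes audio out))
       #:exists 'truncate)]
    ;; No file requested: hand the bytes back to the caller.
    [else audio]))

;; Usage with placeholder bytes (not real MP3 data):
(define fake-mp3 #"ID3 placeholder")
(define tmp (make-temporary-file))
(save-or-return fake-mp3)                   ; => the bytes themselves
(save-or-return fake-mp3 #:output-file tmp) ; => number of bytes written
(delete-file tmp)
```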
struct
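(struct voice (languageCodes
               name
               naturalSampleRateHertz
               ssmlGender))
languageCodes : (listof string?)
name : string?
naturalSampleRateHertz : integer?
ssmlGender : string?
As the example below shows, a voice value is represented as a hasheq with symbol keys, so it works with hash-ref as well as with accessors such as voice-name, and it can be passed directly to jsexpr->string.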
> (require wavenet json)
> (define british-dude (voice '("en-GB") "en-GB-Wavenet-B" 24000 "MALE"))
> british-dude
'#hasheq((languageCodes . ("en-GB"))
         (name . "en-GB-Wavenet-B")
         (naturalSampleRateHertz . 24000)
         (ssmlGender . "MALE"))
> (hash-ref british-dude 'name)
"en-GB-Wavenet-B"
> (voice-name british-dude)
"en-GB-Wavenet-B"
> (display (jsexpr->string british-dude))
{"languageCodes":["en-GB"],"name":"en-GB-Wavenet-B","naturalSampleRateHertz":24000,"ssmlGender":"MALE"}
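Because a voice is a valid jsexpr, one way to "statically define" voices without typing them by hand might be to save a real voice as JSON once and read it back in later runs. The sketch below round-trips the voice from the example through a temporary file using only Racket's json library; save-voice and load-voice are hypothetical helpers, and the hash is written as a literal so the block runs without wavenet or network access. That synthesize accepts a hash reloaded this way is an assumption, suggested by the example program's hand-written hasheq but not stated here.

```racket
#lang racket/base
;; Sketch: save a voice-shaped hasheq as JSON and read it back.
;; save-voice and load-voice are hypothetical; wavenet is not used.
(require json         ; write-json, read-json
         racket/file) ; make-temporary-file

(define british-dude
  '#hasheq((languageCodes . ("en-GB"))
           (name . "en-GB-Wavenet-B")
           (naturalSampleRateHertz . 24000)
           (ssmlGender . "MALE")))

(define (save-voice v path)
  (call-with-output-file path
    (lambda (out) (write-json v out))
    #:exists 'truncate))

(define (load-voice path)
  ;; read-json turns a JSON object into an immutable hasheq with symbol keys.
  (call-with-input-file path read-json))

(define tmp (make-temporary-file))
(save-voice british-dude tmp)
(equal? (load-voice tmp) british-dude) ; => #t
(delete-file tmp)
```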