mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-08 03:01:47 +00:00
feat(tts): add Gemini TTS provider
Add Google's Gemini speech-generation API as 8th TTS backend. Returns base64-encoded signed 16-bit PCM at 24 kHz mono, wrapped in WAV natively via stdlib wave module. Optional ffmpeg conversion to mp3/ogg for Telegram voice bubbles. Supports GEMINI_API_KEY and GOOGLE_API_KEY (fallback), 30 prebuilt voices, configurable model (flash/pro). Cherry-picked from #10922 by @zhonghui5207. Fixes #10918.
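The PCM-to-WAV step described in the commit message can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this commit: the function name `pcm_b64_to_wav` and its parameters are hypothetical, and the defaults (24 kHz, mono, 16-bit) are taken from the format stated above.

```python
import base64
import io
import wave


def pcm_b64_to_wav(pcm_b64: str, sample_rate: int = 24000,
                   channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap base64-encoded raw PCM in a WAV container (hypothetical helper)."""
    pcm = base64.b64decode(pcm_b64)
    buf = io.BytesIO()
    # The wave module writes the RIFF/WAVE header; the samples are copied as-is.
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)       # mono
        wav.setsampwidth(sample_width)   # 2 bytes = signed 16-bit samples
        wav.setframerate(sample_rate)    # 24 kHz
        wav.writeframes(pcm)
    return buf.getvalue()
```

The resulting bytes can be written to a `.wav` file directly, or handed to ffmpeg when mp3/ogg output is needed.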
parent f19ca50cd9
commit 0671201c05
7 changed files with 388 additions and 26 deletions
@@ -10,7 +10,7 @@ Hermes Agent supports both text-to-speech output and voice message transcription
 ## Text-to-Speech

-Convert text to speech with six providers:
+Convert text to speech with seven providers:

 | Provider | Quality | Cost | API Key |
 |----------|---------|------|---------|
@@ -20,6 +20,7 @@ Convert text to speech with six providers:
 | **MiniMax TTS** | Excellent | Paid | `MINIMAX_API_KEY` |
 | **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` |
 | **NeuTTS** | Good | Free | None needed |
+| **Gemini TTS** | Excellent | Paid (free tier) | `GEMINI_API_KEY` |

 ### Platform Delivery
@@ -62,6 +63,9 @@ tts:
     ref_text: ''
     model: neuphonic/neutts-air-q4-gguf
     device: cpu
+  gemini:
+    model: "gemini-2.5-flash-preview-tts"  # or gemini-2.5-pro-preview-tts
+    voice: "Kore"  # 30 prebuilt voices (Zephyr, Puck, Charon, ...)
```
**Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
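The speed precedence described above (provider-specific value first, then the global `tts.speed`, then `1.0`) can be sketched like this. It is an illustrative assumption about how such a lookup might be written, not the project's actual code: `resolve_speed` is a hypothetical name, and the config is assumed to be the parsed `tts:` mapping as a plain dict.

```python
def resolve_speed(tts_config: dict, provider: str) -> float:
    """Pick the effective speed for a provider (hypothetical helper)."""
    provider_cfg = tts_config.get(provider) or {}
    # 1. A provider-specific setting such as tts.openai.speed wins.
    if provider_cfg.get("speed") is not None:
        return float(provider_cfg["speed"])
    # 2. Otherwise fall back to the global tts.speed.
    if tts_config.get("speed") is not None:
        return float(tts_config["speed"])
    # 3. Neither is set: normal speed.
    return 1.0
```

For example, with `{"speed": 1.2, "openai": {"speed": 1.5}}`, the `openai` provider resolves to `1.5`, while any provider without its own `speed` key resolves to `1.2`.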
@@ -74,6 +78,7 @@ Telegram voice bubbles require Opus/OGG audio format:
 - **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
 - **MiniMax TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
 - **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
+- **Gemini TTS** returns raw PCM (wrapped in WAV natively) and needs **ffmpeg** to convert for Telegram voice bubbles

```bash
# Ubuntu/Debian