Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-25 00:51:20 +00:00)
feat(tts): complete KittenTTS integration (tools/setup/docs/tests)
Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the provider behaves like every other TTS backend end to end.

- tools/tts_tool.py: `_check_kittentts_available()` helper and wire into `check_tts_requirements()`; extend Opus-conversion list to include kittentts (WAV → Opus for Telegram voice bubbles); point the missing-package error at `hermes setup tts`.
- hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech" toolset picker, with a `kittentts` post_setup hook that auto-installs the wheel + soundfile via pip.
- hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install flow in `_setup_tts_provider()`, provider_labels entry, and status row in the `hermes setup` summary.
- website/docs/user-guide/features/tts.md: add KittenTTS to the provider table, config example, ffmpeg note, and the zero-config voice-bubble tip.
- tests/tools/test_tts_kittentts.py: 10 unit tests covering generation, model caching, config passthrough, ffmpeg conversion, availability detection, and the missing-package dispatcher branch.

E2E verified against the real `kittentts` wheel:

- WAV direct output (pcm_s16le, 24kHz mono)
- MP3 conversion via ffmpeg (from WAV)
- Telegram flow (provider in Opus-conversion list) produces `codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the `[[audio_as_voice]]` marker
- check_tts_requirements() returns True when kittentts is installed
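As an illustration of the availability detection described above, a check of this kind could be sketched as follows. This is not the code from the commit: it assumes the check only needs to know whether the `kittentts` and `soundfile` packages can be located, and the names `packages_available` and `check_kittentts_available` are hypothetical.

```python
import importlib.util


def packages_available(*names: str) -> bool:
    """Return True only if every named top-level package can be located.

    find_spec() looks a module up without importing it, so the check stays
    cheap even when the package has heavy dependencies.
    """
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                return False
        except (ImportError, ValueError):
            # ValueError is raised when a module is already in sys.modules
            # but has no usable __spec__; treat that as "not available".
            return False
    return True


def check_kittentts_available() -> bool:
    # Hypothetical stand-in for the helper named in the commit message.
    # Assumption: both the TTS wheel and soundfile must be present.
    return packages_available("kittentts", "soundfile")
```

A dispatcher could call such a check before generation and, when it returns `False`, raise an error that points the user at `hermes setup tts`, as the commit message describes.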
parent 1830ebfc52
commit 2d7ff9c5bd
5 changed files with 321 additions and 6 deletions
@@ -14,7 +14,7 @@ If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription,

 ## Text-to-Speech

-Convert text to speech with eight providers:
+Convert text to speech with nine providers:

 | Provider | Quality | Cost | API Key |
 |----------|---------|------|---------|
@@ -25,7 +25,8 @@ Convert text to speech with eight providers:
 | **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` |
 | **Google Gemini TTS** | Excellent | Free tier | `GEMINI_API_KEY` |
 | **xAI TTS** | Excellent | Paid | `XAI_API_KEY` |
-| **NeuTTS** | Good | Free | None needed |
+| **NeuTTS** | Good | Free (local) | None needed |
+| **KittenTTS** | Good | Free (local) | None needed |

 ### Platform Delivery
@@ -41,7 +42,7 @@ Convert text to speech with eight providers:
 ```yaml
 # In ~/.hermes/config.yaml
 tts:
-  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts"
+  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts"
   speed: 1.0 # Global speed multiplier (provider-specific settings override this)
   edge:
     voice: "en-US-AriaNeural" # 322 voices, 74 languages
@@ -77,6 +78,11 @@ tts:
     ref_text: ''
     model: neuphonic/neutts-air-q4-gguf
     device: cpu
+  kittentts:
+    model: KittenML/kitten-tts-nano-0.8-int8 # 25MB int8; also: kitten-tts-micro-0.8 (41MB), kitten-tts-mini-0.8 (80MB)
+    voice: Jasper # Jasper, Bella, Luna, Bruno, Rosie, Hugo, Kiki, Leo
+    speed: 1.0 # 0.5 - 2.0
+    clean_text: true # Expand numbers, currencies, units
 ```

 **Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
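The speed precedence described in the hunk above (provider-specific value first, then the global `tts.speed`, then the default of `1.0`) can be sketched as a small lookup. This is a minimal illustration, not the project's code: `resolve_speed` and `DEFAULT_SPEED` are hypothetical names, and the config is assumed to be the already-parsed `tts:` mapping from `~/.hermes/config.yaml`.

```python
from typing import Any, Mapping

DEFAULT_SPEED = 1.0  # documented default (normal speed)


def resolve_speed(tts_config: Mapping[str, Any], provider: str) -> float:
    """Pick the effective speed: provider override, then global, then 1.0."""
    provider_cfg = tts_config.get(provider) or {}
    for candidate in (provider_cfg.get("speed"), tts_config.get("speed")):
        if candidate is not None:
            return float(candidate)
    return DEFAULT_SPEED


# Example config: a global speed plus an override for one provider.
config = {
    "speed": 1.2,
    "openai": {"speed": 1.5},
    "kittentts": {"voice": "Jasper"},
}

print(resolve_speed(config, "openai"))     # 1.5 (provider override wins)
print(resolve_speed(config, "kittentts"))  # 1.2 (falls back to global)
print(resolve_speed({}, "edge"))           # 1.0 (default)
```

Checking for `None` rather than truthiness keeps an explicit provider value from being skipped by accident.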
@@ -91,6 +97,7 @@ Telegram voice bubbles require Opus/OGG audio format:
 - **Google Gemini TTS** outputs raw PCM and uses **ffmpeg** to encode Opus directly for Telegram voice bubbles
 - **xAI TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
 - **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
+- **KittenTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles

 ```bash
 # Ubuntu/Debian
@@ -103,7 +110,7 @@ brew install ffmpeg
 sudo dnf install ffmpeg
 ```

-Without ffmpeg, Edge TTS, MiniMax TTS, and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
+Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, and KittenTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).

 :::tip
 If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.
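The WAV to Opus step that the docs require for Telegram voice bubbles, together with the fallback when ffmpeg is missing, could look roughly like the sketch below. It is an assumption-laden illustration rather than the conversion code in `tools/tts_tool.py`: the function names, the 64k bitrate, and the `.ogg` output suffix are all chosen here for the example.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_opus_command(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg command that encodes `src` as mono Opus in an Ogg file."""
    return [
        ffmpeg,
        "-y",              # overwrite the output file if it exists
        "-i", str(src),    # input audio (e.g. the WAV written by the provider)
        "-c:a", "libopus", # Opus encoder
        "-b:a", "64k",     # assumed bitrate, not taken from the commit
        "-ac", "1",        # mono
        str(dst),
    ]


def wav_to_opus(src: Path) -> Optional[Path]:
    """Convert to Opus/OGG, or return None so the caller can send plain audio."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return None  # no ffmpeg: audio goes out as a regular file
    dst = src.with_suffix(".ogg")
    result = subprocess.run(build_opus_command(src, dst, ffmpeg), capture_output=True)
    if result.returncode != 0 or not dst.exists():
        return None
    return dst
```

Returning `None` instead of raising mirrors the documented behavior: without ffmpeg the audio is still delivered, only as a regular audio file rather than a voice bubble.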