feat(tts): complete KittenTTS integration (tools/setup/docs/tests)

Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the provider behaves like every other TTS backend end to end. - tools/tts_tool.py: `_check_kittentts_available()` helper and wire into `check_tts_requirements()`; extend Opus-conversion list to include kittentts (WAV → Opus for Telegram voice bubbles); point the missing-package error at `hermes setup tts`. - hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech" toolset picker, with a `kittentts` post_setup hook that auto-installs the wheel + soundfile via pip. - hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install flow in `_setup_tts_provider()`, provider_labels entry, and status row in the `hermes setup` summary. - website/docs/user-guide/features/tts.md: add KittenTTS to the provider table, config example, ffmpeg note, and the zero-config voice-bubble tip. - tests/tools/test_tts_kittentts.py: 10 unit tests covering generation, model caching, config passthrough, ffmpeg conversion, availability detection, and the missing-package dispatcher branch. E2E verified against the real `kittentts` wheel: - WAV direct output (pcm_s16le, 24kHz mono) - MP3 conversion via ffmpeg (from WAV) - Telegram flow (provider in Opus-conversion list) produces `codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the `[[audio_as_voice]]` marker - check_tts_requirements() returns True when kittentts is installed
2026-07-20 15:33:54 +00:00 · 2026-04-21 00:44:37 -07:00 · 2026-04-21 00:44:37 -07:00 · 2d7ff9c5bd
commit 2d7ff9c5bd
parent 1830ebfc52
5 changed files with 321 additions and 6 deletions
--- a/website/docs/user-guide/features/tts.md
+++ b/website/docs/user-guide/features/tts.md
@ -14,7 +14,7 @@ If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription,

 ## Text-to-Speech

-Convert text to speech with eight providers:
+Convert text to speech with nine providers:

 | Provider | Quality | Cost | API Key |
 |----------|---------|------|---------|
@ -25,7 +25,8 @@ Convert text to speech with eight providers:
 | **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` |
 | **Google Gemini TTS** | Excellent | Free tier | `GEMINI_API_KEY` |
 | **xAI TTS** | Excellent | Paid | `XAI_API_KEY` |
-| **NeuTTS** | Good | Free | None needed |
+| **NeuTTS** | Good | Free (local) | None needed |
+| **KittenTTS** | Good | Free (local) | None needed |

 ### Platform Delivery

@ -41,7 +42,7 @@ Convert text to speech with eight providers:
 ```yaml
 # In ~/.hermes/config.yaml
 tts:
-  provider: "edge"              # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts"
+  provider: "edge"              # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts"
  speed: 1.0                    # Global speed multiplier (provider-specific settings override this)
  edge:
    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
@ -77,6 +78,11 @@ tts:
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu
+  kittentts:
+    model: KittenML/kitten-tts-nano-0.8-int8   # 25MB int8; also: kitten-tts-micro-0.8 (41MB), kitten-tts-mini-0.8 (80MB)
+    voice: Jasper                               # Jasper, Bella, Luna, Bruno, Rosie, Hugo, Kiki, Leo
+    speed: 1.0                                  # 0.5 - 2.0
+    clean_text: true                            # Expand numbers, currencies, units
 ```

 **Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
@ -91,6 +97,7 @@ Telegram voice bubbles require Opus/OGG audio format:
 - **Google Gemini TTS** outputs raw PCM and uses **ffmpeg** to encode Opus directly for Telegram voice bubbles
 - **xAI TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
 - **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
+- **KittenTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles

 ```bash
 # Ubuntu/Debian
@ -103,7 +110,7 @@ brew install ffmpeg
 sudo dnf install ffmpeg
 ```

-Without ffmpeg, Edge TTS, MiniMax TTS, and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
+Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, and KittenTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).

 :::tip
 If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.