feat(tts): add Piper as a native local TTS provider (closes #8508) (#17885)

Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the Home Assistant project that supports 44 languages with zero API keys. Adds it as a native built-in provider alongside edge/neutts/kittentts, installable via 'hermes tools' with one keystroke. What ships: - New 'piper' built-in provider in tools/tts_tool.py - Lazy import via _import_piper() - Module-level voice cache keyed on (model_path, use_cuda) so switching voices doesn't invalidate older cached voices - _resolve_piper_voice_path() accepts either an absolute .onnx path or a voice name (auto-downloaded on first use via 'python -m piper.download_voices --download-dir <cache>') - Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via get_hermes_dir) - Optional SynthesisConfig knobs: length_scale, noise_scale, noise_w_scale, volume, normalize_audio, use_cuda — passed through only when configured, so older piper-tts versions aren't broken - WAV output then ffmpeg conversion path (same as neutts/kittentts) so Telegram voice bubbles work when ffmpeg is present - Piper added to BUILTIN_TTS_PROVIDERS so a user's tts.providers.piper.command cannot shadow the native provider (regression test included) - 'hermes tools' wizard entry - Piper appears under Voice and TTS as local free, with 'pip install piper-tts' auto-install via post_setup handler - Prints voice-catalog URL and default-voice info after install - config.yaml defaults - tts.piper.voice defaults to en_US-lessac-medium - Commented advanced knobs for discoverability - Docs - New 'Piper (local, 44 languages)' section in features/tts.md explaining install path, voice switching, pre-downloaded voices, and advanced knobs - Piper listed in the ten-provider table and ffmpeg table - Custom-command-providers section updated to drop the Piper example (now native) and add a piper-custom example for users with their own trained .onnx models - overview.md bumps provider count to ten - Tests (tests/tools/test_tts_piper.py, 16 tests) - Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH) - _resolve_piper_voice_path across every branch: direct .onnx path, cached voice name, fresh download with correct CLI args, download failure, successful-exit-but-missing-files, empty voice to default - _generate_piper_tts: loads voice once, reuses cache, voice-name download wiring, advanced knobs flow through SynthesisConfig - text_to_speech_tool end-to-end dispatch and missing-package error - check_tts_requirements: piper availability toggles the return value - Regression guard: piper cannot be shadowed by a command provider with the same name - Pre-existing test_tts_mistral test broadened to mock the new piper/kittentts/command-provider checks (otherwise it false-passes when piper is installed in the test venv) E2E verification (live): Actual pip install piper-tts, config piper + en_US-lessac-low, text_to_speech_tool call, voice auto-downloaded from HuggingFace, WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the cache (~60ms). Cache dir populated with .onnx and .onnx.json. This caught a real bug during development: the first pass used '-d' as the download-dir flag; the actual piper.download_voices CLI wants '--download-dir'. Fixed before PR opened.
2026-05-02 02:01:47 +00:00 · 2026-04-30 02:53:20 -07:00 · 2026-04-30 02:53:20 -07:00 · 8d302e37a8
commit 8d302e37a8
parent 2662bfb756
8 changed files with 634 additions and 34 deletions
--- a/hermes_cli/tools_config.py
+++ b/hermes_cli/tools_config.py
@ -227,6 +227,14 @@ TOOL_CATEGORIES = {
                "tts_provider": "kittentts",
                "post_setup": "kittentts",
            },
+            {
+                "name": "Piper",
+                "badge": "local · free",
+                "tag": "Local neural TTS, 44 languages (voices ~20-90MB)",
+                "env_vars": [],
+                "tts_provider": "piper",
+                "post_setup": "piper",
+            },
        ],
    },
    "web": {
@ -624,6 +632,33 @@ def _run_post_setup(post_setup_key: str):
            _print_warning("    kittentts install timed out (>5min)")
            _print_info(f"    Run manually: python -m pip install -U '{wheel_url}' soundfile")

+    elif post_setup_key == "piper":
+        try:
+            __import__("piper")
+            _print_success("    piper-tts is already installed")
+        except ImportError:
+            import subprocess
+            _print_info("    Installing piper-tts (~14MB wheel, voices downloaded on first use)...")
+            try:
+                result = subprocess.run(
+                    [sys.executable, "-m", "pip", "install", "-U", "piper-tts", "--quiet"],
+                    capture_output=True, text=True, timeout=300,
+                )
+                if result.returncode == 0:
+                    _print_success("    piper-tts installed")
+                else:
+                    _print_warning("    piper-tts install failed:")
+                    _print_info(f"      {result.stderr.strip()[:300]}")
+                    _print_info("    Run manually: python -m pip install -U piper-tts")
+                    return
+            except subprocess.TimeoutExpired:
+                _print_warning("    piper-tts install timed out (>5min)")
+                _print_info("    Run manually: python -m pip install -U piper-tts")
+                return
+        _print_info("    Default voice: en_US-lessac-medium (downloaded on first TTS call)")
+        _print_info("    Full voice list: https://github.com/OHF-Voice/piper1-gpl/blob/main/docs/VOICES.md")
+        _print_info("    Switch voices by setting tts.piper.voice in ~/.hermes/config.yaml")
+
    elif post_setup_key == "spotify":
        # Run the full `hermes auth spotify` flow — if the user has no
        # client_id yet, this drops them into the interactive wizard