feat(tts): add Piper as a native local TTS provider (closes #8508) (#17885)

Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the
Home Assistant project that supports 44 languages with zero API keys.
Adds it as a native built-in provider alongside edge/neutts/kittentts,
installable via 'hermes tools' with one keystroke.

What ships:

- New 'piper' built-in provider in tools/tts_tool.py
  - Lazy import via _import_piper()
  - Module-level voice cache keyed on (model_path, use_cuda) so switching
    voices doesn't invalidate older cached voices
  - _resolve_piper_voice_path() accepts either an absolute .onnx path or a
    voice name (auto-downloaded on first use via 'python -m
    piper.download_voices --download-dir <cache>')
  - Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via
    get_hermes_dir)
  - Optional SynthesisConfig knobs: length_scale, noise_scale,
    noise_w_scale, volume, normalize_audio, use_cuda — passed through
    only when configured, so older piper-tts versions aren't broken
  - WAV output then ffmpeg conversion path (same as neutts/kittentts) so
    Telegram voice bubbles work when ffmpeg is present
  - Piper added to BUILTIN_TTS_PROVIDERS so a user's
    tts.providers.piper.command cannot shadow the native provider
    (regression test included)

- 'hermes tools' wizard entry
  - Piper appears under Voice and TTS as local free, with
    'pip install piper-tts' auto-install via post_setup handler
  - Prints voice-catalog URL and default-voice info after install

- config.yaml defaults
  - tts.piper.voice defaults to en_US-lessac-medium
  - Commented advanced knobs for discoverability

- Docs
  - New 'Piper (local, 44 languages)' section in features/tts.md
    explaining install path, voice switching, pre-downloaded voices,
    and advanced knobs
  - Piper listed in the ten-provider table and ffmpeg table
  - Custom-command-providers section updated to drop the Piper example
    (now native) and add a piper-custom example for users with their own
    trained .onnx models
  - overview.md bumps provider count to ten

- Tests (tests/tools/test_tts_piper.py, 16 tests)
  - Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH)
  - _resolve_piper_voice_path across every branch: direct .onnx path,
    cached voice name, fresh download with correct CLI args, download
    failure, successful-exit-but-missing-files, empty voice to default
  - _generate_piper_tts: loads voice once, reuses cache, voice-name
    download wiring, advanced knobs flow through SynthesisConfig
  - text_to_speech_tool end-to-end dispatch and missing-package error
  - check_tts_requirements: piper availability toggles the return value
  - Regression guard: piper cannot be shadowed by a command provider
    with the same name
  - Pre-existing test_tts_mistral test broadened to mock the new
    piper/kittentts/command-provider checks (otherwise it false-passes
    when piper is installed in the test venv)

E2E verification (live):

Actual pip install piper-tts, config piper + en_US-lessac-low,
text_to_speech_tool call, voice auto-downloaded from HuggingFace,
WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the
cache (~60ms). Cache dir populated with .onnx and .onnx.json.

This caught a real bug during development: the first pass used '-d' as
the download-dir flag; the actual piper.download_voices CLI wants
'--download-dir'. Fixed before PR opened.
This commit is contained in:
Teknium 2026-04-30 02:53:20 -07:00 committed by GitHub
parent 2662bfb756
commit 8d302e37a8
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
8 changed files with 634 additions and 34 deletions

View file

@ -227,6 +227,14 @@ TOOL_CATEGORIES = {
"tts_provider": "kittentts",
"post_setup": "kittentts",
},
{
"name": "Piper",
"badge": "local · free",
"tag": "Local neural TTS, 44 languages (voices ~20-90MB)",
"env_vars": [],
"tts_provider": "piper",
"post_setup": "piper",
},
],
},
"web": {
@ -624,6 +632,33 @@ def _run_post_setup(post_setup_key: str):
_print_warning(" kittentts install timed out (>5min)")
_print_info(f" Run manually: python -m pip install -U '{wheel_url}' soundfile")
elif post_setup_key == "piper":
try:
__import__("piper")
_print_success(" piper-tts is already installed")
except ImportError:
import subprocess
_print_info(" Installing piper-tts (~14MB wheel, voices downloaded on first use)...")
try:
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "-U", "piper-tts", "--quiet"],
capture_output=True, text=True, timeout=300,
)
if result.returncode == 0:
_print_success(" piper-tts installed")
else:
_print_warning(" piper-tts install failed:")
_print_info(f" {result.stderr.strip()[:300]}")
_print_info(" Run manually: python -m pip install -U piper-tts")
return
except subprocess.TimeoutExpired:
_print_warning(" piper-tts install timed out (>5min)")
_print_info(" Run manually: python -m pip install -U piper-tts")
return
_print_info(" Default voice: en_US-lessac-medium (downloaded on first TTS call)")
_print_info(" Full voice list: https://github.com/OHF-Voice/piper1-gpl/blob/main/docs/VOICES.md")
_print_info(" Switch voices by setting tts.piper.voice in ~/.hermes/config.yaml")
elif post_setup_key == "spotify":
# Run the full `hermes auth spotify` flow — if the user has no
# client_id yet, this drops them into the interactive wizard