mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-30 06:41:51 +00:00
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive: the command-provider surface stays as the primary way to add a TTS backend.

The hook covers cases the shell-template grammar can't reasonably express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:

- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:

- The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant.
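The four-step resolution order can be sketched as a small pure function. This is an illustration under stated assumptions, not the dispatcher in `tools/tts_tool.py`: the name `resolve_backend`, its arguments, and the returned labels are hypothetical, and only the precedence itself is taken from the list above.

```python
from typing import Dict, Set

# Built-in names as listed in this commit message.
BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_backend(
    provider: str,
    command_configs: Dict[str, dict],
    plugin_names: Set[str],
) -> str:
    """Return the dispatch path for ``provider`` (hypothetical helper).

    ``command_configs`` stands in for the ``tts.providers`` mapping and
    ``plugin_names`` for the names held by the plugin registry.
    """
    key = provider.strip().lower()
    if key in BUILTIN_NAMES:
        return "builtin"  # step 1: built-ins always win
    if command_configs.get(key, {}).get("command"):
        return "command"  # step 2: command provider from config
    if key in plugin_names:
        return "plugin"  # step 3: plugin-registered provider
    return "edge_fallback"  # step 4: legacy default
```

For example, `resolve_backend("openai", {"openai": {"command": "say"}}, {"openai"})` returns `"builtin"` even though both a command entry and a plugin claim the same name.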
## New files

- `agent/tts_provider.py`: `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py`: `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py`: `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py`: `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py`: `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py`: 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py`: 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py`: 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py`: 3 end-to-end tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py`: 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario.
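The drift check behind `TestBuiltinSync` can be illustrated in isolation. The sketch below is an assumption about its general shape, not the test shipped in `tests/agent/test_tts_registry.py`: both name collections are stand-ins defined inline (the real test would import them from their modules), and `describe_drift` is a hypothetical helper.

```python
from typing import Iterable


def describe_drift(registry_names: Iterable[str], tool_names: Iterable[str]) -> str:
    """Return "" when both collections hold the same names, else a summary."""
    registry, tool = frozenset(registry_names), frozenset(tool_names)
    only_registry = sorted(registry - tool)
    only_tool = sorted(tool - registry)
    if not only_registry and not only_tool:
        return ""
    return f"only in registry: {only_registry}; only in tts_tool: {only_tool}"


# Stand-ins for agent.tts_registry._BUILTIN_NAMES and
# tools.tts_tool.BUILTIN_TTS_PROVIDERS (assumed to be plain name collections).
REGISTRY_BUILTINS = frozenset({
    "edge", "elevenlabs", "openai", "minimax", "xai",
    "mistral", "gemini", "neutts", "kittentts", "piper",
})
TOOL_BUILTINS = (
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
)


def test_builtin_names_in_sync() -> None:
    # Fails with a readable message as soon as one list gains or loses a name.
    drift = describe_drift(REGISTRY_BUILTINS, TOOL_BUILTINS)
    assert drift == "", drift
```

Comparing as sets keeps the check independent of ordering, so the two lists may be written in different orders without tripping the test.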
## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md`: new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md`: TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming).

Closes #30398
133 lines
4.3 KiB
Python
"""
TTS Provider Registry
=====================

Central map of registered TTS providers. Populated by plugins at
import-time via :meth:`PluginContext.register_tts_provider`; consumed
by :mod:`tools.tts_tool` to dispatch ``text_to_speech`` tool calls to
the active plugin backend **when** the configured ``tts.provider``
name is neither a built-in nor a command-type provider.

Built-ins-always-win
--------------------
Plugin names that collide with a built-in TTS provider (``edge``,
``openai``, ``elevenlabs``, ``minimax``, ``gemini``, ``mistral``,
``xai``, ``piper``, ``kittentts``, ``neutts``) are rejected at
registration with a warning. This invariant is also re-checked at
dispatch time in :func:`tools.tts_tool._dispatch_to_plugin_provider`.

Command-providers-win-over-plugins
----------------------------------
This registry doesn't enforce the command-vs-plugin precedence — that
lives in the dispatcher, which checks for a same-name
``tts.providers.<name>: type: command`` entry before consulting the
registry. The rationale is locality: a name declared in the user's
``config.yaml`` is more specific to their setup than a plugin that
happens to be installed.
"""

from __future__ import annotations

import logging
import threading
from typing import Dict, List, Optional

from agent.tts_provider import TTSProvider

logger = logging.getLogger(__name__)


# Names reserved for native built-in TTS handlers. Plugins cannot
# register a name in this set — the registration call is rejected with
# a warning. **Kept in sync with ``BUILTIN_TTS_PROVIDERS`` in
# :mod:`tools.tts_tool`** — a regression test in
# ``tests/agent/test_tts_registry.py::TestBuiltinSync`` fails if the
# two lists drift. Importing from ``tools.tts_tool`` directly would
# create a circular dependency (``tools.tts_tool`` imports
# ``agent.tts_registry`` for dispatch).
_BUILTIN_NAMES = frozenset({
    "edge",
    "elevenlabs",
    "openai",
    "minimax",
    "xai",
    "mistral",
    "gemini",
    "neutts",
    "kittentts",
    "piper",
})


_providers: Dict[str, TTSProvider] = {}
_lock = threading.Lock()


def register_provider(provider: TTSProvider) -> None:
    """Register a TTS provider.

    Rejects:

    - Non-:class:`TTSProvider` instances (raises :class:`TypeError`).
    - Empty/whitespace ``.name`` (raises :class:`ValueError`).
    - Names colliding with a built-in (logs a warning, silently
      ignores — built-ins-always-win invariant).

    Re-registration (same ``name``) overwrites the previous entry and
    logs a debug message — makes hot-reload scenarios (tests, dev
    loops) behave predictably.
    """
    if not isinstance(provider, TTSProvider):
        raise TypeError(
            f"register_provider() expects a TTSProvider instance, "
            f"got {type(provider).__name__}"
        )
    name = provider.name
    if not isinstance(name, str) or not name.strip():
        raise ValueError("TTS provider .name must be a non-empty string")
    key = name.strip().lower()
    if key in _BUILTIN_NAMES:
        logger.warning(
            "TTS provider '%s' shadows a built-in name; registration ignored. "
            "Built-in TTS providers (%s) always win — pick a different name.",
            key, ", ".join(sorted(_BUILTIN_NAMES)),
        )
        return
    with _lock:
        existing = _providers.get(key)
        _providers[key] = provider
    if existing is not None:
        logger.debug(
            "TTS provider '%s' re-registered (was %r)",
            key, type(existing).__name__,
        )
    else:
        logger.debug(
            "Registered TTS provider '%s' (%s)",
            key, type(provider).__name__,
        )


def list_providers() -> List[TTSProvider]:
    """Return all registered providers, sorted by name."""
    with _lock:
        items = list(_providers.values())
    return sorted(items, key=lambda p: p.name)


def get_provider(name: str) -> Optional[TTSProvider]:
    """Return the provider registered under *name*, or None.

    Name matching is case-insensitive and whitespace-tolerant — mirrors
    how ``tools.tts_tool._get_provider`` normalizes the configured
    ``tts.provider`` value.
    """
    if not isinstance(name, str):
        return None
    return _providers.get(name.strip().lower())


def _reset_for_tests() -> None:
    """Clear the registry. **Test-only.**"""
    with _lock:
        _providers.clear()