From 00ec0b617cf6013d05a00cbdd5776528f7f900e8 Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Fri, 22 May 2026 17:58:07 +0530 Subject: [PATCH] feat(tts): add register_tts_provider() plugin hook (closes #30398) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.: type: command` registry from PR #17843. This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express: - Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.) - Streaming synthesis (chunked Opus → voice-bubble delivery) - Voice metadata API for the `hermes tools` picker - OAuth-refreshing auth flows None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in. ## Resolution order The dispatcher's resolution order is the load-bearing invariant: 1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.** 2. `tts.provider` matches `tts.providers.` with `command:` set → command-provider dispatch (PR #17843). 3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new). 4. No match → falls through to Edge TTS default (legacy behavior). Built-ins-always-win is enforced at THREE layers: - Registry: `register_provider()` rejects shadowing names with a warning. - Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry. - Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively. Command-providers-win-over-plugins is enforced at TWO layers: - The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first. - `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant. ## New files - `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape. - `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape. - `plugins/tts/...` directory ready for community plugins (none shipped). ## Modified files - `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`. - `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched. - `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows. ## Tests - `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints). - `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper. - `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`. - `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`. - `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario. ## Verification - 95/95 new tests pass. - 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged. - Parity harness against `origin/main`: 8 OK + 1 expected DIFF. - E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned. - Ruff clean on all touched files. ## Docs - `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference. - `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming). Closes #30398 --- agent/tts_provider.py | 274 +++++++++++++++ agent/tts_registry.py | 133 +++++++ hermes_cli/plugins.py | 38 ++ hermes_cli/tools_config.py | 62 ++++ tests/agent/test_tts_registry.py | 312 +++++++++++++++++ .../test_plugins_tts_registration.py | 156 +++++++++ tests/hermes_cli/test_tts_picker.py | 187 ++++++++++ tests/plugins/tts/__init__.py | 0 tests/plugins/tts/check_parity_vs_main.py | 328 ++++++++++++++++++ tests/tools/test_tts_plugin_dispatch.py | 323 +++++++++++++++++ tools/tts_tool.py | 144 ++++++++ website/docs/user-guide/features/plugins.md | 2 +- website/docs/user-guide/features/tts.md | 79 +++++ 13 files changed, 2037 insertions(+), 1 deletion(-) create mode 100644 agent/tts_provider.py create mode 100644 agent/tts_registry.py create mode 100644 tests/agent/test_tts_registry.py create mode 100644 tests/hermes_cli/test_plugins_tts_registration.py create mode 100644 tests/hermes_cli/test_tts_picker.py create mode 100644 tests/plugins/tts/__init__.py create mode 100644 tests/plugins/tts/check_parity_vs_main.py create mode 100644 tests/tools/test_tts_plugin_dispatch.py diff --git a/agent/tts_provider.py b/agent/tts_provider.py new file mode 100644 index 00000000000..c19166a7024 --- /dev/null +++ b/agent/tts_provider.py @@ -0,0 +1,274 @@ +""" +Text-to-Speech Provider ABC +============================ + +Defines the pluggable-backend interface for text-to-speech synthesis. +Providers register instances via +``PluginContext.register_tts_provider()``; the active one (selected via +``tts.provider`` in ``config.yaml``) services every ``text_to_speech`` +tool call **only when the configured name is neither a built-in nor a +command-type provider declared under ``tts.providers.``**. + +Three coexisting TTS extension surfaces — in resolution order: + +1. **Built-in providers** (``BUILTIN_TTS_PROVIDERS`` in + :mod:`tools.tts_tool`) — native Python implementations (edge, openai, + elevenlabs, …). **Always win** — plugins cannot shadow them. +2. **Command-type providers** declared under ``tts.providers.: + type: command`` (PR #17843, commit ``2facea7f7``). Wire any local + CLI into Hermes with shell-template placeholders. **Wins over a + same-name plugin** — config is more local than plugin install. +3. **Plugin-registered providers** (this ABC). For backends that need a + Python SDK, streaming bytes, OAuth refresh, or voice-listing APIs + the shell-template grammar can't reasonably express. + +Built-ins-always-win is enforced at registration time +(:func:`agent.tts_registry.register_provider` rejects names in +``BUILTIN_TTS_PROVIDERS`` with a warning) AND at dispatch time +(:func:`tools.tts_tool._dispatch_to_plugin_provider` re-checks +defensively). The dispatcher also rejects plugin dispatch when a same- +name command provider is configured. + +Providers live in ``/plugins/tts//`` (built-in plugins, no +shipped today) or ``~/.hermes/plugins/tts//`` (user-installed). +None ship in-tree as of issue #30398 — the hook is additive +infrastructure waiting for a real consumer (Cartesia, Fish Audio, …). + +Response contract +----------------- +:meth:`TTSProvider.synthesize` writes the audio bytes to ``output_path`` +and returns the path as a string. Implementations should raise on +failure — the dispatcher converts exceptions into the standard +``{success: False, error: …}`` JSON envelope the rest of Hermes +expects. +""" + +from __future__ import annotations + +import abc +import logging +from typing import Any, Dict, Iterator, List, Optional + +logger = logging.getLogger(__name__) + + +DEFAULT_OUTPUT_FORMAT = "mp3" +VALID_OUTPUT_FORMATS = frozenset({"mp3", "wav", "ogg", "opus", "flac"}) + + +# --------------------------------------------------------------------------- +# ABC +# --------------------------------------------------------------------------- + + +class TTSProvider(abc.ABC): + """Abstract base class for a text-to-speech backend. + + Subclasses must implement :attr:`name` and :meth:`synthesize`. + Everything else has sane defaults — override only what your provider + needs. + """ + + @property + @abc.abstractmethod + def name(self) -> str: + """Stable short identifier used in ``tts.provider`` config. + + Lowercase, no spaces. Examples: ``cartesia``, ``fishaudio``, + ``deepgram``. Names that collide with a built-in TTS provider + (``edge``, ``openai``, ``elevenlabs``, ``minimax``, ``gemini``, + ``mistral``, ``xai``, ``piper``, ``kittentts``, ``neutts``) are + rejected at registration time. + """ + + @property + def display_name(self) -> str: + """Human-readable label shown in ``hermes tools``. + + Defaults to ``name.title()`` (e.g. ``Cartesia`` for ``cartesia``). + """ + return self.name.title() + + def is_available(self) -> bool: + """Return True when this provider can service calls. + + Typically checks for a required API key + that the SDK is + importable. Default: True (providers with no external + dependencies are always available). + + Must NOT raise — used by the picker and ``hermes setup`` for + availability displays and should fail gracefully. + """ + return True + + def list_voices(self) -> List[Dict[str, Any]]: + """Return voice catalog entries. + + Each entry:: + + { + "id": "voice-abc-123", # required + "display": "Aria — neutral female", # optional; defaults to id + "language": "en-US", # optional + "gender": "female", # optional + "preview_url": "https://...mp3", # optional + } + + Default: empty list (provider has no enumerable voices or + doesn't surface them via API). + """ + return [] + + def list_models(self) -> List[Dict[str, Any]]: + """Return model catalog entries. + + Each entry:: + + { + "id": "sonic-2", # required + "display": "Sonic 2", # optional + "languages": ["en", "es", "fr"], # optional + "max_text_length": 5000, # optional + } + + Default: empty list (provider has a single fixed model or + doesn't expose model selection). + """ + return [] + + def get_setup_schema(self) -> Dict[str, Any]: + """Return provider metadata for the ``hermes tools`` picker. + + Used by ``tools_config.py`` to inject this provider as a row in + the Text-to-Speech provider list. Shape:: + + { + "name": "Cartesia", # picker label + "badge": "paid", # optional short tag + "tag": "Ultra-low-latency streaming", # optional subtitle + "env_vars": [ # keys to prompt for + {"key": "CARTESIA_API_KEY", + "prompt": "Cartesia API key", + "url": "https://play.cartesia.ai/console"}, + ], + } + + Default: minimal entry derived from ``display_name`` with no + env vars. Override to expose API key prompts and custom badges. + """ + return { + "name": self.display_name, + "badge": "", + "tag": "", + "env_vars": [], + } + + def default_model(self) -> Optional[str]: + """Return the default model id, or None if not applicable.""" + models = self.list_models() + if models: + return models[0].get("id") + return None + + def default_voice(self) -> Optional[str]: + """Return the default voice id, or None if not applicable.""" + voices = self.list_voices() + if voices: + return voices[0].get("id") + return None + + @abc.abstractmethod + def synthesize( + self, + text: str, + output_path: str, + *, + voice: Optional[str] = None, + model: Optional[str] = None, + speed: Optional[float] = None, + format: str = DEFAULT_OUTPUT_FORMAT, + **extra: Any, + ) -> str: + """Synthesize ``text`` and write audio bytes to ``output_path``. + + Returns the absolute path to the written file as a string + (typically just echoes ``output_path``). Raises on failure — + the dispatcher converts exceptions to the standard + ``{success: False, error: ...}`` JSON envelope. + + Args: + text: The text to synthesize. Already truncated to the + provider's max length by the dispatcher. + output_path: Absolute path where the audio file should be + written. Parent directory is guaranteed to exist. + voice: Voice identifier from :meth:`list_voices`, or None + to use :meth:`default_voice`. + model: Model identifier from :meth:`list_models`, or None + to use :meth:`default_model`. + speed: Optional speech-rate multiplier (1.0 = normal). + Providers that don't support speed control should + ignore this argument. + format: Output audio format. Implementations should match + the requested format when possible; if unsupported, + pick the closest equivalent and ensure ``output_path`` + ends with the correct extension. + **extra: Forward-compat parameters future schema versions + may expose. Implementations should ignore unknown keys. + """ + + def stream( + self, + text: str, + *, + voice: Optional[str] = None, + model: Optional[str] = None, + format: str = "opus", + **extra: Any, + ) -> Iterator[bytes]: + """Stream synthesized audio bytes. + + Optional. Providers that don't support streaming raise + :class:`NotImplementedError` (the default) and the dispatcher + falls back to :meth:`synthesize` + read-whole-file. + + Args mirror :meth:`synthesize`. Default ``format`` is ``opus`` + because the primary streaming use case is voice-bubble + delivery (Telegram et al.) which requires Opus. + """ + raise NotImplementedError( + f"TTS provider {self.name!r} does not implement streaming " + "synthesis. Use synthesize() instead, or implement stream() " + "if your backend supports it." + ) + + @property + def voice_compatible(self) -> bool: + """Whether output is suitable for voice-bubble delivery. + + Mirrors the ``tts.providers..voice_compatible`` field + from PR #17843. When True, the gateway's voice-message + delivery pipeline runs ffmpeg conversion to Opus if needed. + When False, output is delivered as a regular audio attachment. + + Default: False (safe — providers opt in explicitly). + """ + return False + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def resolve_output_format(value: Optional[str]) -> str: + """Clamp an output_format value to the valid set. + + Invalid values are coerced to :data:`DEFAULT_OUTPUT_FORMAT` rather + than rejected so the tool surface is forgiving of agent mistakes. + """ + if not isinstance(value, str): + return DEFAULT_OUTPUT_FORMAT + v = value.strip().lower() + if v in VALID_OUTPUT_FORMATS: + return v + return DEFAULT_OUTPUT_FORMAT diff --git a/agent/tts_registry.py b/agent/tts_registry.py new file mode 100644 index 00000000000..7cf6e6cb00a --- /dev/null +++ b/agent/tts_registry.py @@ -0,0 +1,133 @@ +""" +TTS Provider Registry +===================== + +Central map of registered TTS providers. Populated by plugins at +import-time via :meth:`PluginContext.register_tts_provider`; consumed +by :mod:`tools.tts_tool` to dispatch ``text_to_speech`` tool calls to +the active plugin backend **when** the configured ``tts.provider`` +name is neither a built-in nor a command-type provider. + +Built-ins-always-win +-------------------- +Plugin names that collide with a built-in TTS provider (``edge``, +``openai``, ``elevenlabs``, ``minimax``, ``gemini``, ``mistral``, +``xai``, ``piper``, ``kittentts``, ``neutts``) are rejected at +registration with a warning. This invariant is also re-checked at +dispatch time in :func:`tools.tts_tool._dispatch_to_plugin_provider`. + +Command-providers-win-over-plugins +---------------------------------- +This registry doesn't enforce the command-vs-plugin precedence — that +lives in the dispatcher, which checks for a same-name +``tts.providers.: type: command`` entry before consulting the +registry. The rationale is locality: a name declared in the user's +``config.yaml`` is more specific to their setup than a plugin that +happens to be installed. +""" + +from __future__ import annotations + +import logging +import threading +from typing import Dict, List, Optional + +from agent.tts_provider import TTSProvider + +logger = logging.getLogger(__name__) + + +# Names reserved for native built-in TTS handlers. Plugins cannot +# register a name in this set — the registration call is rejected with +# a warning. **Kept in sync with ``BUILTIN_TTS_PROVIDERS`` in +# :mod:`tools.tts_tool`** — a regression test in +# ``tests/agent/test_tts_registry.py::TestBuiltinSync`` fails if the +# two lists drift. Importing from ``tools.tts_tool`` directly would +# create a circular dependency (``tools.tts_tool`` imports +# ``agent.tts_registry`` for dispatch). +_BUILTIN_NAMES = frozenset({ + "edge", + "elevenlabs", + "openai", + "minimax", + "xai", + "mistral", + "gemini", + "neutts", + "kittentts", + "piper", +}) + + +_providers: Dict[str, TTSProvider] = {} +_lock = threading.Lock() + + +def register_provider(provider: TTSProvider) -> None: + """Register a TTS provider. + + Rejects: + + - Non-:class:`TTSProvider` instances (raises :class:`TypeError`). + - Empty/whitespace ``.name`` (raises :class:`ValueError`). + - Names colliding with a built-in (logs a warning, silently + ignores — built-ins-always-win invariant). + + Re-registration (same ``name``) overwrites the previous entry and + logs a debug message — makes hot-reload scenarios (tests, dev + loops) behave predictably. + """ + if not isinstance(provider, TTSProvider): + raise TypeError( + f"register_provider() expects a TTSProvider instance, " + f"got {type(provider).__name__}" + ) + name = provider.name + if not isinstance(name, str) or not name.strip(): + raise ValueError("TTS provider .name must be a non-empty string") + key = name.strip().lower() + if key in _BUILTIN_NAMES: + logger.warning( + "TTS provider '%s' shadows a built-in name; registration ignored. " + "Built-in TTS providers (%s) always win — pick a different name.", + key, ", ".join(sorted(_BUILTIN_NAMES)), + ) + return + with _lock: + existing = _providers.get(key) + _providers[key] = provider + if existing is not None: + logger.debug( + "TTS provider '%s' re-registered (was %r)", + key, type(existing).__name__, + ) + else: + logger.debug( + "Registered TTS provider '%s' (%s)", + key, type(provider).__name__, + ) + + +def list_providers() -> List[TTSProvider]: + """Return all registered providers, sorted by name.""" + with _lock: + items = list(_providers.values()) + return sorted(items, key=lambda p: p.name) + + +def get_provider(name: str) -> Optional[TTSProvider]: + """Return the provider registered under *name*, or None. + + Name matching is case-insensitive and whitespace-tolerant — mirrors + how ``tools.tts_tool._get_provider`` normalizes the configured + ``tts.provider`` value. + """ + if not isinstance(name, str): + return None + return _providers.get(name.strip().lower()) + + +def _reset_for_tests() -> None: + """Clear the registry. **Test-only.**""" + with _lock: + _providers.clear() diff --git a/hermes_cli/plugins.py b/hermes_cli/plugins.py index 5b5bf2209de..2218172aa58 100644 --- a/hermes_cli/plugins.py +++ b/hermes_cli/plugins.py @@ -640,6 +640,44 @@ class PluginContext: self.manifest.name, provider.name, ) + # -- TTS provider registration ------------------------------------------- + + def register_tts_provider(self, provider) -> None: + """Register a text-to-speech backend. + + ``provider`` must be an instance of + :class:`agent.tts_provider.TTSProvider`. The ``provider.name`` + attribute is what ``tts.provider`` in ``config.yaml`` matches + against when routing ``text_to_speech`` tool calls — **but + only when**: + + 1. ``provider.name`` is NOT a built-in TTS provider name + (``edge``, ``openai``, ``elevenlabs``, …). Built-ins always + win — the registry rejects shadowing names with a warning. + 2. There is NO ``tts.providers.: type: command`` entry + with the same name. Command-providers (PR #17843) win on + name collision because config is more local than plugin + install. + + Coexists with the command-provider registry rather than + replacing it — see issue #30398 for the full design rationale. + """ + from agent.tts_provider import TTSProvider + from agent.tts_registry import register_provider as _register_tts_provider + + if not isinstance(provider, TTSProvider): + logger.warning( + "Plugin '%s' tried to register a TTS provider that does " + "not inherit from TTSProvider. Ignoring.", + self.manifest.name, + ) + return + _register_tts_provider(provider) + logger.info( + "Plugin '%s' registered TTS provider: %s", + self.manifest.name, provider.name, + ) + # -- platform adapter registration --------------------------------------- def register_platform( diff --git a/hermes_cli/tools_config.py b/hermes_cli/tools_config.py index 23cb8e685fd..29c92160590 100644 --- a/hermes_cli/tools_config.py +++ b/hermes_cli/tools_config.py @@ -1753,6 +1753,62 @@ def _plugin_browser_providers() -> list[dict]: return rows +def _plugin_tts_providers() -> list[dict]: + """Build picker-row dicts from plugin-registered TTS providers. + + Issue #30398 — the ``register_tts_provider()`` plugin hook + coexists alongside the 10 built-in TTS providers + (``edge``/``openai``/``elevenlabs``/…) and the + ``tts.providers.: type: command`` registry from PR #17843. + Built-in rows stay hardcoded in ``TOOL_CATEGORIES["tts"]``; this + function only injects PLUGIN-registered providers. + + Defensive: plugins whose name collides with a built-in TTS provider + are filtered out — even though the registry already rejects them + at registration time, a future code path that registers directly + via :func:`agent.tts_registry.register_provider` could slip + through. Filtering here keeps the picker invariant. + """ + try: + from agent.tts_registry import _BUILTIN_NAMES, list_providers + from hermes_cli.plugins import _ensure_plugins_discovered + + _ensure_plugins_discovered() + providers = list_providers() + except Exception: + return [] + + rows: list[dict] = [] + for provider in providers: + name = getattr(provider, "name", None) + if not name: + continue + # Defensive: reject built-in shadowing at the picker layer too. + if name.lower().strip() in _BUILTIN_NAMES: + continue + try: + schema = provider.get_setup_schema() + except Exception: + continue + if not isinstance(schema, dict): + continue + row = { + "name": schema.get("name", provider.display_name), + "badge": schema.get("badge", ""), + "tag": schema.get("tag", ""), + "env_vars": schema.get("env_vars", []), + # Selecting this row writes ``tts.provider: `` — the + # same write-path used by hardcoded rows. The plugin + # dispatcher picks it up automatically from there. + "tts_provider": name, + "tts_plugin_name": name, + } + if schema.get("post_setup"): + row["post_setup"] = schema["post_setup"] + rows.append(row) + return rows + + def _visible_providers(cat: dict, config: dict) -> list[dict]: """Return provider entries visible for the current auth/config state.""" features = get_nous_subscription_features(config) @@ -1790,6 +1846,12 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]: if cat.get("name") == "Browser Automation": visible.extend(_plugin_browser_providers()) + # Inject plugin-registered TTS backends (issue #30398). Plugin rows + # render BELOW the 10 hardcoded built-in rows. Built-in shadowing + # is filtered out by ``_plugin_tts_providers`` defensively. + if cat.get("name") == "Text-to-Speech": + visible.extend(_plugin_tts_providers()) + return visible diff --git a/tests/agent/test_tts_registry.py b/tests/agent/test_tts_registry.py new file mode 100644 index 00000000000..e3959e41a17 --- /dev/null +++ b/tests/agent/test_tts_registry.py @@ -0,0 +1,312 @@ +"""Tests for agent/tts_registry.py and agent/tts_provider.py. + +Covers: +- Registration happy path +- Registration rejection: non-TTSProvider type +- Registration rejection: empty/whitespace name +- Built-in name shadowing: warning + silent ignore (no exception) +- Re-registration: overwrites + logs at debug +- Case + whitespace insensitivity on lookup +- ABC contract: default implementations work +- ABC contract: synthesize() must be implemented +- ABC contract: stream() raises NotImplementedError by default +- resolve_output_format helper coerces invalid input +""" + +from __future__ import annotations + +import logging +from typing import Any, Optional + +import pytest + +from agent import tts_registry +from agent.tts_provider import ( + DEFAULT_OUTPUT_FORMAT, + VALID_OUTPUT_FORMATS, + TTSProvider, + resolve_output_format, +) + + +class _FakeProvider(TTSProvider): + def __init__( + self, + name: str = "fake", + display: Optional[str] = None, + voice_compat: bool = False, + synthesize_impl: Optional[Any] = None, + ): + self._name = name + self._display = display + self._voice_compat = voice_compat + self._synthesize_impl = synthesize_impl + + @property + def name(self) -> str: + return self._name + + @property + def display_name(self) -> str: + return self._display if self._display is not None else super().display_name + + @property + def voice_compatible(self) -> bool: + return self._voice_compat + + def synthesize(self, text: str, output_path: str, **kw): + if self._synthesize_impl is not None: + return self._synthesize_impl(text, output_path, **kw) + return output_path + + +@pytest.fixture(autouse=True) +def _reset_registry(): + tts_registry._reset_for_tests() + yield + tts_registry._reset_for_tests() + + +# --------------------------------------------------------------------------- +# Registration +# --------------------------------------------------------------------------- + + +class TestRegistration: + def test_happy_path(self): + p = _FakeProvider(name="cartesia") + tts_registry.register_provider(p) + assert tts_registry.get_provider("cartesia") is p + assert [r.name for r in tts_registry.list_providers()] == ["cartesia"] + + def test_rejects_non_provider_type(self): + with pytest.raises(TypeError, match="expects a TTSProvider instance"): + tts_registry.register_provider("not a provider") # type: ignore[arg-type] + assert tts_registry.list_providers() == [] + + def test_rejects_empty_name(self): + p = _FakeProvider(name="") + with pytest.raises(ValueError, match="non-empty string"): + tts_registry.register_provider(p) + assert tts_registry.list_providers() == [] + + def test_rejects_whitespace_name(self): + p = _FakeProvider(name=" ") + with pytest.raises(ValueError, match="non-empty string"): + tts_registry.register_provider(p) + assert tts_registry.list_providers() == [] + + @pytest.mark.parametrize( + "builtin", + ["edge", "openai", "elevenlabs", "minimax", "gemini", + "mistral", "xai", "piper", "kittentts", "neutts"], + ) + def test_rejects_builtin_shadow_with_warning(self, builtin, caplog): + """Built-in names always win — plugin registration is silently ignored + but a warning is logged so the operator can see what happened. + """ + p = _FakeProvider(name=builtin) + with caplog.at_level(logging.WARNING, logger="agent.tts_registry"): + tts_registry.register_provider(p) + assert "shadows a built-in name" in caplog.text + assert builtin in caplog.text + assert tts_registry.get_provider(builtin) is None + assert tts_registry.list_providers() == [] + + def test_builtin_shadow_case_insensitive(self, caplog): + """``EDGE``/``Edge``/`` edge `` all collide with the ``edge`` built-in.""" + for variant in ("EDGE", "Edge", " edge ", "eDgE"): + tts_registry._reset_for_tests() + with caplog.at_level(logging.WARNING, logger="agent.tts_registry"): + tts_registry.register_provider(_FakeProvider(name=variant)) + assert tts_registry.list_providers() == [], ( + f"variant {variant!r} should have been rejected as a built-in shadow" + ) + + def test_reregistration_overwrites(self, caplog): + p1 = _FakeProvider(name="cartesia") + p2 = _FakeProvider(name="cartesia") + tts_registry.register_provider(p1) + with caplog.at_level(logging.DEBUG, logger="agent.tts_registry"): + tts_registry.register_provider(p2) + assert tts_registry.get_provider("cartesia") is p2 + assert "re-registered" in caplog.text + + +# --------------------------------------------------------------------------- +# Lookup +# --------------------------------------------------------------------------- + + +class TestLookup: + def test_get_provider_missing_returns_none(self): + assert tts_registry.get_provider("nonexistent") is None + + def test_get_provider_non_string_returns_none(self): + assert tts_registry.get_provider(None) is None # type: ignore[arg-type] + assert tts_registry.get_provider(123) is None # type: ignore[arg-type] + + def test_get_provider_case_insensitive(self): + p = _FakeProvider(name="cartesia") + tts_registry.register_provider(p) + assert tts_registry.get_provider("CARTESIA") is p + assert tts_registry.get_provider("Cartesia") is p + + def test_get_provider_whitespace_tolerant(self): + p = _FakeProvider(name="cartesia") + tts_registry.register_provider(p) + assert tts_registry.get_provider(" cartesia ") is p + + def test_list_providers_sorted(self): + tts_registry.register_provider(_FakeProvider(name="zylo")) + tts_registry.register_provider(_FakeProvider(name="alpha")) + tts_registry.register_provider(_FakeProvider(name="middle")) + names = [p.name for p in tts_registry.list_providers()] + assert names == ["alpha", "middle", "zylo"] + + +# --------------------------------------------------------------------------- +# ABC contract +# --------------------------------------------------------------------------- + + +class TestABCContract: + def test_must_implement_synthesize(self): + class Incomplete(TTSProvider): + @property + def name(self) -> str: + return "incomplete" + # synthesize NOT implemented + + with pytest.raises(TypeError, match="abstract"): + Incomplete() # type: ignore[abstract] + + def test_must_implement_name(self): + class Incomplete(TTSProvider): + def synthesize(self, text, output_path, **kw): + return output_path + # name NOT implemented + + with pytest.raises(TypeError, match="abstract"): + Incomplete() # type: ignore[abstract] + + def test_display_name_defaults_to_title(self): + p = _FakeProvider(name="cartesia") + assert p.display_name == "Cartesia" + + def test_display_name_override_respected(self): + p = _FakeProvider(name="cartesia", display="Cartesia AI") + assert p.display_name == "Cartesia AI" + + def test_is_available_default_true(self): + p = _FakeProvider(name="cartesia") + assert p.is_available() is True + + def test_list_voices_default_empty(self): + p = _FakeProvider(name="cartesia") + assert p.list_voices() == [] + + def test_list_models_default_empty(self): + p = _FakeProvider(name="cartesia") + assert p.list_models() == [] + + def test_default_model_none_when_no_models(self): + p = _FakeProvider(name="cartesia") + assert p.default_model() is None + + def test_default_voice_none_when_no_voices(self): + p = _FakeProvider(name="cartesia") + assert p.default_voice() is None + + def test_default_model_first_listed(self): + class WithModels(_FakeProvider): + def list_models(self): + return [{"id": "sonic-2"}, {"id": "sonic-1"}] + + p = WithModels(name="cartesia") + assert p.default_model() == "sonic-2" + + def test_default_voice_first_listed(self): + class WithVoices(_FakeProvider): + def list_voices(self): + return [{"id": "voice-aria"}, {"id": "voice-jasper"}] + + p = WithVoices(name="cartesia") + assert p.default_voice() == "voice-aria" + + def test_get_setup_schema_default_minimal(self): + p = _FakeProvider(name="cartesia") + schema = p.get_setup_schema() + assert schema["name"] == "Cartesia" + assert schema["env_vars"] == [] + + def test_stream_raises_not_implemented_by_default(self): + p = _FakeProvider(name="cartesia") + with pytest.raises(NotImplementedError, match="does not implement streaming"): + next(p.stream("hello")) + + def test_voice_compatible_default_false(self): + p = _FakeProvider(name="cartesia") + assert p.voice_compatible is False + + def test_voice_compatible_override(self): + p = _FakeProvider(name="cartesia", voice_compat=True) + assert p.voice_compatible is True + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +class TestResolveOutputFormat: + @pytest.mark.parametrize("valid", sorted(VALID_OUTPUT_FORMATS)) + def test_valid_passes_through(self, valid): + assert resolve_output_format(valid) == valid + + def test_uppercase_normalized(self): + assert resolve_output_format("MP3") == "mp3" + assert resolve_output_format("Opus") == "opus" + + def test_whitespace_stripped(self): + assert resolve_output_format(" wav ") == "wav" + + def test_invalid_returns_default(self): + assert resolve_output_format("aiff") == DEFAULT_OUTPUT_FORMAT + assert resolve_output_format("") == DEFAULT_OUTPUT_FORMAT + + def test_none_returns_default(self): + assert resolve_output_format(None) == DEFAULT_OUTPUT_FORMAT + + def test_non_string_returns_default(self): + assert resolve_output_format(123) == DEFAULT_OUTPUT_FORMAT # type: ignore[arg-type] + assert resolve_output_format([]) == DEFAULT_OUTPUT_FORMAT # type: ignore[arg-type] + + +# --------------------------------------------------------------------------- +# Sync invariant: registry's built-in list vs dispatcher's built-in list +# --------------------------------------------------------------------------- + + +class TestBuiltinSync: + """``_BUILTIN_NAMES`` in agent/tts_registry.py is duplicated from + ``BUILTIN_TTS_PROVIDERS`` in tools/tts_tool.py (importing directly + would create a circular dependency). This test fails loudly if the + two lists drift — a new built-in added to tts_tool.py MUST also be + added to tts_registry.py's _BUILTIN_NAMES or the registry will + accept a name the dispatcher will silently route to the wrong + handler. + """ + + def test_registry_builtins_match_dispatcher_builtins(self): + from tools.tts_tool import BUILTIN_TTS_PROVIDERS + + assert tts_registry._BUILTIN_NAMES == BUILTIN_TTS_PROVIDERS, ( + "agent.tts_registry._BUILTIN_NAMES and " + "tools.tts_tool.BUILTIN_TTS_PROVIDERS have drifted!\n" + f" Registry only: {sorted(tts_registry._BUILTIN_NAMES - BUILTIN_TTS_PROVIDERS)}\n" + f" Dispatcher only: {sorted(BUILTIN_TTS_PROVIDERS - tts_registry._BUILTIN_NAMES)}\n" + "Add the missing names to whichever list is incomplete. " + "These two lists exist as a circular-import workaround and " + "MUST be kept in sync manually." + ) diff --git a/tests/hermes_cli/test_plugins_tts_registration.py b/tests/hermes_cli/test_plugins_tts_registration.py new file mode 100644 index 00000000000..81a6b6a0bd8 --- /dev/null +++ b/tests/hermes_cli/test_plugins_tts_registration.py @@ -0,0 +1,156 @@ +"""Tests for PluginContext.register_tts_provider() (issue #30398). + +Exercises the plugin context hook end-to-end: drops a fake plugin into +``$HERMES_HOME/plugins/``, runs ``PluginManager().discover_and_load()``, +and asserts the registration result. + +Mirrors the structure of +``tests/hermes_cli/test_plugin_scanner_recursion.py::TestRegisterImageGenProvider``. +""" + +from __future__ import annotations + +import os +from pathlib import Path +from typing import Any, Dict + +import yaml + + +def _write_plugin( + root: Path, + name: str, + *, + manifest_extra: Dict[str, Any] | None = None, + register_body: str = "pass", +) -> Path: + plugin_dir = root / name + plugin_dir.mkdir(parents=True, exist_ok=True) + manifest = { + "name": name, + "version": "0.1.0", + "description": f"Test plugin {name}", + } + if manifest_extra: + manifest.update(manifest_extra) + (plugin_dir / "plugin.yaml").write_text(yaml.dump(manifest)) + (plugin_dir / "__init__.py").write_text( + f"def register(ctx):\n {register_body}\n" + ) + return plugin_dir + + +def _enable(hermes_home: Path, name: str) -> None: + cfg_path = hermes_home / "config.yaml" + cfg: dict = {} + if cfg_path.exists(): + try: + cfg = yaml.safe_load(cfg_path.read_text()) or {} + except Exception: + cfg = {} + plugins_cfg = cfg.setdefault("plugins", {}) + enabled = plugins_cfg.setdefault("enabled", []) + if isinstance(enabled, list) and name not in enabled: + enabled.append(name) + cfg_path.write_text(yaml.safe_dump(cfg)) + + +class TestRegisterTTSProvider: + """End-to-end: a fake plugin registers via the hook, ends up in the registry.""" + + def test_accepts_valid_provider(self): + from hermes_cli.plugins import PluginManager + + from agent import tts_registry + tts_registry._reset_for_tests() + + hermes_home = Path(os.environ["HERMES_HOME"]) + _write_plugin( + hermes_home / "plugins", + "my-tts-plugin", + register_body=( + "from agent.tts_provider import TTSProvider\n" + " class P(TTSProvider):\n" + " @property\n" + " def name(self): return 'fake-tts'\n" + " def synthesize(self, text, output_path, **kw):\n" + " return output_path\n" + " ctx.register_tts_provider(P())" + ), + ) + _enable(hermes_home, "my-tts-plugin") + + mgr = PluginManager() + mgr.discover_and_load() + + assert mgr._plugins["my-tts-plugin"].enabled is True, ( + f"Plugin failed to load: {mgr._plugins['my-tts-plugin'].error}" + ) + assert tts_registry.get_provider("fake-tts") is not None + + tts_registry._reset_for_tests() + + def test_rejects_non_provider(self, caplog): + """A plugin that passes a non-TTSProvider gets a warning, no exception.""" + from hermes_cli.plugins import PluginManager + + from agent import tts_registry + tts_registry._reset_for_tests() + + hermes_home = Path(os.environ["HERMES_HOME"]) + _write_plugin( + hermes_home / "plugins", + "bad-tts-plugin", + register_body="ctx.register_tts_provider('not a provider')", + ) + _enable(hermes_home, "bad-tts-plugin") + + with caplog.at_level("WARNING"): + mgr = PluginManager() + mgr.discover_and_load() + + # Plugin loaded (register returned normally), but registry empty. + assert mgr._plugins["bad-tts-plugin"].enabled is True + assert tts_registry.get_provider("not a provider") is None + assert tts_registry.list_providers() == [] + assert "does not inherit from TTSProvider" in caplog.text + + tts_registry._reset_for_tests() + + def test_rejects_builtin_shadow(self, caplog): + """A plugin trying to register a name colliding with a built-in is silently + rejected by the underlying registry — both with a registry-level warning + AND with the registry remaining empty (plugin still loads OK). + """ + from hermes_cli.plugins import PluginManager + + from agent import tts_registry + tts_registry._reset_for_tests() + + hermes_home = Path(os.environ["HERMES_HOME"]) + _write_plugin( + hermes_home / "plugins", + "shadow-tts-plugin", + register_body=( + "from agent.tts_provider import TTSProvider\n" + " class P(TTSProvider):\n" + " @property\n" + " def name(self): return 'edge'\n" + " def synthesize(self, text, output_path, **kw):\n" + " return output_path\n" + " ctx.register_tts_provider(P())" + ), + ) + _enable(hermes_home, "shadow-tts-plugin") + + with caplog.at_level("WARNING"): + mgr = PluginManager() + mgr.discover_and_load() + + # Plugin still loaded normally — built-in shadowing is a warning, + # not an exception. The registry rejects the entry though. + assert mgr._plugins["shadow-tts-plugin"].enabled is True + assert tts_registry.get_provider("edge") is None + assert "shadows a built-in name" in caplog.text + + tts_registry._reset_for_tests() diff --git a/tests/hermes_cli/test_tts_picker.py b/tests/hermes_cli/test_tts_picker.py new file mode 100644 index 00000000000..53751c7edc3 --- /dev/null +++ b/tests/hermes_cli/test_tts_picker.py @@ -0,0 +1,187 @@ +"""Tests for the TTS plugin picker surface in hermes_cli/tools_config.py (issue #30398). + +Covers ``_plugin_tts_providers()`` and the ``_visible_providers()`` +integration that injects plugin rows into the Text-to-Speech category. + +Mirrors the structure of existing image_gen / browser picker tests. +""" + +from __future__ import annotations + +import pytest + +from agent import tts_registry +from agent.tts_provider import TTSProvider +from hermes_cli import tools_config + + +class _FakeTTSProvider(TTSProvider): + def __init__(self, name: str, schema: dict | None = None): + self._name = name + self._schema = schema + + @property + def name(self) -> str: + return self._name + + def synthesize(self, text, output_path, **kw): + return output_path + + def get_setup_schema(self): + if self._schema is not None: + return self._schema + return super().get_setup_schema() + + +@pytest.fixture(autouse=True) +def _reset_registry(): + tts_registry._reset_for_tests() + yield + tts_registry._reset_for_tests() + + +class TestPluginTTSProviders: + """``_plugin_tts_providers()`` returns picker-row dicts.""" + + def test_empty_when_no_plugins(self): + assert tools_config._plugin_tts_providers() == [] + + def test_returns_row_for_registered_plugin(self): + tts_registry.register_provider( + _FakeTTSProvider( + name="cartesia", + schema={ + "name": "Cartesia", + "badge": "paid", + "tag": "Ultra-low-latency streaming", + "env_vars": [ + {"key": "CARTESIA_API_KEY", "prompt": "Cartesia API key", + "url": "https://play.cartesia.ai/console"}, + ], + }, + ) + ) + rows = tools_config._plugin_tts_providers() + assert len(rows) == 1 + row = rows[0] + assert row["name"] == "Cartesia" + assert row["badge"] == "paid" + assert row["tag"] == "Ultra-low-latency streaming" + assert row["env_vars"][0]["key"] == "CARTESIA_API_KEY" + # Selecting this row writes ``tts.provider: cartesia`` — same + # write path as a hardcoded row. + assert row["tts_provider"] == "cartesia" + assert row["tts_plugin_name"] == "cartesia" + + def test_filters_builtin_shadow_defensively(self): + """Even if a plugin slipped past the registry's built-in check + (e.g. via direct ``agent.tts_registry.register_provider`` rather + than the ``ctx.register_tts_provider`` hook), the picker layer + filters it out so the picker invariant holds.""" + # Use lower-level call to bypass the warning + skip in + # register_provider (the registry's built-in guard). + # Note: this is intentionally pathological — production code + # paths go through the hook which catches this first. + provider = _FakeTTSProvider(name="edge") + tts_registry._providers["edge"] = provider # type: ignore[index] + try: + rows = tools_config._plugin_tts_providers() + assert rows == [], ( + "Picker must filter built-in name shadows even when the " + "registry has been bypassed." + ) + finally: + tts_registry._providers.pop("edge", None) # type: ignore[arg-type] + + def test_skips_providers_with_no_name(self): + """Defense in depth: a provider with no .name attribute is skipped + rather than crashing the picker.""" + + class _NoName: + display_name = "Bogus" + def get_setup_schema(self): + return {"name": "Bogus"} + + tts_registry._providers["bogus"] = _NoName() # type: ignore[assignment] + try: + rows = tools_config._plugin_tts_providers() + # Provider has no .name so the picker filters it out + assert all(r.get("tts_plugin_name") != "bogus" for r in rows) + finally: + tts_registry._providers.pop("bogus", None) # type: ignore[arg-type] + + def test_skips_providers_whose_schema_raises(self): + class _ExplodingSchema(_FakeTTSProvider): + def get_setup_schema(self): + raise RuntimeError("boom") + + tts_registry.register_provider(_ExplodingSchema(name="exploding")) + tts_registry.register_provider(_FakeTTSProvider(name="working")) + rows = tools_config._plugin_tts_providers() + assert [r["tts_plugin_name"] for r in rows] == ["working"] + + def test_minimal_schema_uses_display_name(self): + """A provider with no setup_schema override gets a row built from + ``display_name`` and ``name`` only.""" + tts_registry.register_provider(_FakeTTSProvider(name="minimal")) + rows = tools_config._plugin_tts_providers() + assert len(rows) == 1 + assert rows[0]["name"] == "Minimal" # display_name default + assert rows[0]["tts_provider"] == "minimal" + assert rows[0]["env_vars"] == [] + + def test_post_setup_passthrough(self): + tts_registry.register_provider( + _FakeTTSProvider( + name="my-tts", + schema={ + "name": "My TTS", + "post_setup": "my_post_install_hook", + "env_vars": [], + }, + ) + ) + rows = tools_config._plugin_tts_providers() + assert rows[0].get("post_setup") == "my_post_install_hook" + + +class TestVisibleProvidersInjectsTTSPlugins: + """``_visible_providers()`` injects plugin rows into the Text-to-Speech + category alongside the hardcoded built-in rows.""" + + def test_tts_category_includes_plugin_rows(self): + tts_registry.register_provider(_FakeTTSProvider(name="cartesia")) + + tts_cat = tools_config.TOOL_CATEGORIES["tts"] + visible = tools_config._visible_providers(tts_cat, config={}) + + names = [row.get("name") for row in visible] + # Hardcoded rows (sample — check at least one is present) + assert "Microsoft Edge TTS" in names + # Plugin row injected at the end + assert "Cartesia" in names + + # Plugin row has tts_provider key for write-path compat + plugin_rows = [r for r in visible if r.get("tts_plugin_name")] + assert len(plugin_rows) == 1 + assert plugin_rows[0]["tts_provider"] == "cartesia" + + def test_other_categories_unaffected_by_tts_plugins(self): + """Registering a TTS plugin must not leak into the Image Generation + or Browser pickers.""" + tts_registry.register_provider(_FakeTTSProvider(name="cartesia")) + + img_cat = tools_config.TOOL_CATEGORIES["image_gen"] + visible = tools_config._visible_providers(img_cat, config={}) + names = [row.get("name") for row in visible] + assert "Cartesia" not in names + + def test_tts_category_without_plugins_only_hardcoded(self): + """No plugins → picker shows exactly the hardcoded rows.""" + tts_cat = tools_config.TOOL_CATEGORIES["tts"] + visible = tools_config._visible_providers(tts_cat, config={}) + names = [row.get("name") for row in visible] + # No row has the plugin marker + assert all(not row.get("tts_plugin_name") for row in visible) + # Hardcoded rows still present (sample one of the always-visible ones) + assert "Microsoft Edge TTS" in names diff --git a/tests/plugins/tts/__init__.py b/tests/plugins/tts/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/plugins/tts/check_parity_vs_main.py b/tests/plugins/tts/check_parity_vs_main.py new file mode 100644 index 00000000000..b3dcf87cecc --- /dev/null +++ b/tests/plugins/tts/check_parity_vs_main.py @@ -0,0 +1,328 @@ +"""Behavior-parity check for the TTS plugin hook (issue #30398). + +Spawns one subprocess per (version, scenario) cell — pinned to either +``origin/main`` (no plugin hook; ``tts.provider: cartesia`` falls +through to the Edge TTS default branch) or this PR's worktree (plugin +hook present; same config routes through the plugin registry when a +plugin is registered). + +Each subprocess clears all TTS-related env vars + writes a +``config.yaml``, then resolves how the dispatcher would route a +``text_to_speech`` call. The emitted shape tuple is:: + + {dispatch_kind, provider_name, voice_compat} + +Where ``dispatch_kind`` ∈ +``{"builtin_edge", "builtin_openai", "builtin_elevenlabs", ..., +"command", "plugin", "fallback_edge", "error"}``: + +* ``builtin_`` — config selects a built-in handler that exists + on both main and PR (no diff expected) +* ``command`` — config selects a ``tts.providers.: type: command`` + entry (PR #17843; no diff expected) +* ``plugin`` — config selects a plugin-registered provider (PR only) +* ``fallback_edge`` — config selects an unknown name with no matching + plugin or command entry → Edge TTS default fallback +* ``error`` — explicit fatal error (e.g. mistral quarantine) + +The parent process diffs the reduced shape per scenario. The only +acceptable diff is ``fallback_edge → plugin`` for the +``unknown-name-with-plugin-installed`` scenario — everything else is +a regression. + +Run from the PR worktree (it auto-resolves ``MAIN_DIR`` from the parent +of the worktree directory, or falls back to a sibling +``hermes-agent-main`` checkout):: + + python tests/plugins/tts/check_parity_vs_main.py +""" + +from __future__ import annotations + +import json +import subprocess +import sys +from pathlib import Path + + +REPO_ROOT = Path(__file__).resolve().parents[3] + + +def _resolve_main_dir() -> Path: + candidate = REPO_ROOT.parent.parent + if (candidate / "tools" / "tts_tool.py").exists() and candidate != REPO_ROOT: + return candidate + sibling = REPO_ROOT.parent / "hermes-agent-main" + if (sibling / "tools" / "tts_tool.py").exists(): + return sibling + return REPO_ROOT + + +MAIN_DIR = _resolve_main_dir() +PR_DIR = REPO_ROOT +assert (PR_DIR / "tools" / "tts_tool.py").exists(), ( + f"PR_DIR={PR_DIR} doesn't look like a hermes-agent checkout" +) + + +# The subprocess script — runs INSIDE either the main checkout or PR +# checkout, so the import paths resolve to the version of the code +# under test. We never call the real ``text_to_speech_tool`` because +# that would require audio synthesis; instead we ask the resolution +# layer what it WOULD do. +SUBPROCESS_SCRIPT = r""" +import json, os, sys, tempfile +sys.path.insert(0, sys.argv[1]) + +# Isolated HERMES_HOME so the config write is hermetic. +home = tempfile.mkdtemp() +os.environ["HERMES_HOME"] = home + +# Clear TTS-related env so dispatch decisions are config-driven. +for k in ( + "ELEVENLABS_API_KEY", "OPENAI_API_KEY", "VOICE_TOOLS_OPENAI_KEY", + "MINIMAX_API_KEY", "XAI_API_KEY", "GEMINI_API_KEY", +): + os.environ.pop(k, None) + +scenario_env = json.loads(sys.argv[2]) +os.environ.update(scenario_env) + +config_yaml = sys.argv[3] +plugin_register = sys.argv[4] # "yes" to register a fake plugin + +config_path = os.path.join(home, "config.yaml") +with open(config_path, "w") as f: + f.write(config_yaml) + +# Fresh import — must not have anything cached from prior runs. +for name in list(sys.modules): + if (name.startswith("tools.") + or name.startswith("agent.") + or name.startswith("plugins.") + or name.startswith("hermes_cli.")): + sys.modules.pop(name, None) + +# Try importing tts_registry — only exists on PR side. +have_plugin_hook = False +try: + from agent import tts_registry + from agent.tts_provider import TTSProvider + have_plugin_hook = True + + if plugin_register == "yes": + class _FakeProvider(TTSProvider): + @property + def name(self): return "cartesia" + def synthesize(self, text, output_path, **kw): + return output_path + + tts_registry._reset_for_tests() + tts_registry.register_provider(_FakeProvider()) +except ImportError: + pass + +import tools.tts_tool as tts_tool + +# Read the config the same way text_to_speech_tool() does. +tts_config = tts_tool._load_tts_config() +provider = tts_tool._get_provider(tts_config) + +dispatch_kind = None +provider_name = provider +voice_compat = False +error_text = None + +try: + # Mistral is the one branch that returns a fatal error. + if provider == "mistral": + dispatch_kind = "error" + error_text = "mistral quarantine" + elif tts_tool._resolve_command_provider_config(provider, tts_config) is not None: + dispatch_kind = "command" + elif have_plugin_hook and provider not in tts_tool.BUILTIN_TTS_PROVIDERS: + # On PR side: check plugin dispatch. + plugin_path = tts_tool._dispatch_to_plugin_provider( + "test", os.path.join(home, "out.mp3"), provider, tts_config, + ) + if plugin_path is not None: + dispatch_kind = "plugin" + voice_compat = tts_tool._plugin_provider_is_voice_compatible(provider) + else: + # Falls through to Edge TTS default on the PR side too. + dispatch_kind = "fallback_edge" + elif provider in tts_tool.BUILTIN_TTS_PROVIDERS: + dispatch_kind = "builtin_" + provider + else: + # On main side: unknown names fall through to Edge default. + dispatch_kind = "fallback_edge" +except Exception as exc: + dispatch_kind = "exception" + error_text = repr(exc) + +shape = { + "dispatch_kind": dispatch_kind, + "provider_name": provider_name, + "voice_compat": bool(voice_compat), + "error_present": error_text is not None, +} +print(json.dumps(shape)) +""" + + +SCENARIOS: list[tuple[str, str, dict[str, str], str]] = [ + # (label, config.yaml body, scenario_env, plugin_register) + + # Scenario 1: unset tts.provider → both: Edge default + ("unset-defaults-to-edge", "", {}, "no"), + + # Scenario 2: built-in name → both: that built-in + ("explicit-edge", "tts:\n provider: edge\n", {}, "no"), + ("explicit-openai", "tts:\n provider: openai\n", {}, "no"), + ("explicit-elevenlabs", "tts:\n provider: elevenlabs\n", {}, "no"), + + # Scenario 3: command-type provider → both: command dispatch + ( + "command-provider", + "tts:\n provider: my-piper\n providers:\n my-piper:\n type: command\n command: 'piper -m model.onnx -f {output_path} < {input_path}'\n", + {}, + "no", + ), + + # Scenario 4: unknown name with NO plugin installed → both: fallback to Edge + ("unknown-no-plugin", "tts:\n provider: cartesia\n", {}, "no"), + + # Scenario 5: unknown name WITH plugin installed + # main: fallback_edge (no plugin hook exists) + # PR: plugin (cartesia) + # This is the ONLY acceptable diff in the harness. + ("plugin-installed", "tts:\n provider: cartesia\n", {}, "yes"), + + # Scenario 6: built-in name + plugin tries to shadow → both: built-in + # The plugin registers under name "cartesia", not "edge", so this is + # effectively the same as scenario 2 — but we exercise the with-plugin + # path to ensure the built-in branch still takes priority. + ("explicit-edge-with-plugin-registered", "tts:\n provider: edge\n", {}, "yes"), + + # Scenario 7: mistral quarantine — both surface the explicit error + ("mistral-quarantine", "tts:\n provider: mistral\n", {}, "no"), +] + + +def _run_scenario(repo_path: Path, label: str, config_yaml: str, env: dict, plugin_register: str) -> dict: + venv_python = repo_path / ".venv" / "bin" / "python" + if not venv_python.exists(): + venv_python = MAIN_DIR / ".venv" / "bin" / "python" + if not venv_python.exists(): + venv_python = MAIN_DIR / "venv" / "bin" / "python" + if not venv_python.exists(): + venv_python = Path("python3") + + out = subprocess.run( + [ + str(venv_python), + "-c", + SUBPROCESS_SCRIPT, + str(repo_path), + json.dumps(env), + config_yaml, + plugin_register, + ], + capture_output=True, + text=True, + timeout=60, + ) + if out.returncode != 0: + return { + "error": "subprocess failed", + "stdout": out.stdout[-500:], + "stderr": out.stderr[-500:], + } + try: + return json.loads(out.stdout.strip().splitlines()[-1]) + except Exception as exc: + return {"error": f"could not parse output: {exc}", "stdout": out.stdout} + + +def _reduce(shape: dict) -> dict: + """Reduce to the parts that matter for user-visible parity.""" + return { + "dispatch_kind": shape.get("dispatch_kind"), + "provider_name": shape.get("provider_name"), + "error_present": shape.get("error_present"), + } + + +def main() -> int: + print(f"main: {MAIN_DIR}") + print(f"pr: {PR_DIR}") + print() + + if MAIN_DIR == PR_DIR: + print( + "WARN: MAIN_DIR == PR_DIR — diffs will be trivially identical.\n" + " Set up a sibling 'hermes-agent-main' checkout pinned to " + "origin/main to get real parity coverage." + ) + print() + + failures: list[str] = [] + errors: list[str] = [] + intentional_diffs: list[tuple[str, dict, dict]] = [] + for label, config_yaml, env, plugin_register in SCENARIOS: + main_shape = _run_scenario(MAIN_DIR, label, config_yaml, env, plugin_register) + pr_shape = _run_scenario(PR_DIR, label, config_yaml, env, plugin_register) + + if "error" in main_shape or "error" in pr_shape: + print(f" [ERR ] {label}: subprocess failed") + print(f" main: {main_shape}") + print(f" pr: {pr_shape}") + errors.append(label) + continue + + main_reduced = _reduce(main_shape) + pr_reduced = _reduce(pr_shape) + + if main_reduced == pr_reduced: + print(f" [OK] {label}: {main_reduced}") + continue + + # On main, "plugin-installed" scenario returns fallback_edge + # (no plugin hook); on PR, it routes to the plugin. That's the + # only acceptable diff. + fallback_to_plugin = ( + main_reduced.get("dispatch_kind") == "fallback_edge" + and pr_reduced.get("dispatch_kind") == "plugin" + and label == "plugin-installed" + ) + if fallback_to_plugin: + print(f" [DIFF] {label}: fallback_edge → plugin — expected") + intentional_diffs.append((label, main_reduced, pr_reduced)) + else: + print(f" [FAIL] {label}") + print(f" main: {main_reduced}") + print(f" pr: {pr_reduced}") + failures.append(label) + + print() + if errors: + print(f"SUBPROCESS ERRORS in {len(errors)} scenario(s):") + for e in errors: + print(f" - {e}") + if failures: + print(f"BEHAVIOUR REGRESSION in {len(failures)} scenario(s):") + for f in failures: + print(f" - {f}") + if intentional_diffs: + print( + f"INTENTIONAL DIFFS ({len(intentional_diffs)}): " + f"fallback_edge → plugin dispatch when a plugin is registered." + ) + if failures or errors: + return 1 + print(f"PARITY OK across {len(SCENARIOS)} scenarios.") + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/tests/tools/test_tts_plugin_dispatch.py b/tests/tools/test_tts_plugin_dispatch.py new file mode 100644 index 00000000000..d8ead912e71 --- /dev/null +++ b/tests/tools/test_tts_plugin_dispatch.py @@ -0,0 +1,323 @@ +"""Tests for TTS plugin dispatch in tools/tts_tool.py (issue #30398). + +Covers the three core invariants of the plugin dispatcher: + +1. Built-in provider names short-circuit — plugins NEVER win over a + built-in. Even if a plugin somehow ended up in the registry with a + built-in name (which the registry already blocks), the dispatcher + re-checks defensively. +2. Command-type providers declared under ``tts.providers.: type: + command`` (PR #17843) win over a plugin with the same name. Config + is more local than plugin install. +3. Plugin dispatch fires only when the configured provider is neither + a built-in nor a command-type entry, AND a plugin is registered + under that name. Unknown names fall through. + +Also exercises: +- Plugin exceptions surface to the outer error envelope (don't crash) +- Plugin returning a different path is honored +- voice_compatible: True triggers ffmpeg opus conversion path +- voice_compatible: False keeps the file as-is + +The dispatcher is exercised in isolation — we don't actually call +``text_to_speech_tool`` because that would require real audio file +writes. Each test directly calls +``tools.tts_tool._dispatch_to_plugin_provider`` / the predicate +helpers. +""" + +from __future__ import annotations + +from typing import Optional + +import pytest + +from agent import tts_registry +from agent.tts_provider import TTSProvider +from tools import tts_tool + + +class _FakeTTSProvider(TTSProvider): + def __init__( + self, + name: str, + voice_compat: bool = False, + raise_exc: Optional[BaseException] = None, + return_path: Optional[str] = None, + ): + self._name = name + self._voice_compat = voice_compat + self._raise_exc = raise_exc + self._return_path = return_path + # Recorded for assertions + self.last_call: Optional[dict] = None + + @property + def name(self) -> str: + return self._name + + @property + def voice_compatible(self) -> bool: + return self._voice_compat + + def synthesize(self, text, output_path, **kw): + self.last_call = { + "text": text, + "output_path": output_path, + "kwargs": dict(kw), + } + if self._raise_exc is not None: + raise self._raise_exc + return self._return_path if self._return_path is not None else output_path + + +@pytest.fixture(autouse=True) +def _reset_registry(): + tts_registry._reset_for_tests() + yield + tts_registry._reset_for_tests() + + +# --------------------------------------------------------------------------- +# Resolution invariants +# --------------------------------------------------------------------------- + + +class TestBuiltinAlwaysWins: + """Built-in TTS provider names short-circuit the dispatcher. + + Even with a plugin registered (which the registry would reject — + but the dispatcher is defensive), built-in names return None so + the caller's elif chain handles them natively. + """ + + @pytest.mark.parametrize( + "builtin", + ["edge", "openai", "elevenlabs", "minimax", "gemini", + "mistral", "xai", "piper", "kittentts", "neutts"], + ) + def test_dispatcher_short_circuits_builtin(self, builtin): + result = tts_tool._dispatch_to_plugin_provider( + text="hello", + output_path="/tmp/out.mp3", + provider=builtin, + tts_config={}, + ) + assert result is None, ( + f"Built-in {builtin!r} must short-circuit plugin dispatch. " + "If this test fails, the dispatcher would silently let a " + "plugin with a built-in name shadow the native handler — " + "violating the precedence rule from PR #17843." + ) + + def test_dispatcher_short_circuits_builtin_case_insensitive(self): + for variant in ("EDGE", "Edge", " edge ", "eDgE"): + assert ( + tts_tool._dispatch_to_plugin_provider( + text="hello", output_path="/tmp/x.mp3", + provider=variant, tts_config={}, + ) is None + ) + + +class TestCommandProviderWins: + """A same-name ``tts.providers.: type: command`` config beats a plugin. + + Locality: a user's command-provider config is more specific than + whichever plugin happens to be installed. + """ + + def test_command_config_beats_plugin(self): + tts_registry.register_provider(_FakeTTSProvider(name="my-tts")) + + result = tts_tool._dispatch_to_plugin_provider( + text="hello", + output_path="/tmp/out.mp3", + provider="my-tts", + tts_config={ + "providers": { + "my-tts": { + "type": "command", + "command": "echo 'hi' > {output_path}", + }, + }, + }, + ) + # Plugin path returns None → caller falls back to command + # provider dispatch (handled by the outer text_to_speech_tool + # via _resolve_command_provider_config). + assert result is None + + +class TestPluginDispatch: + """Happy path: configured name matches a registered plugin, dispatcher fires.""" + + def test_registered_plugin_called(self): + provider = _FakeTTSProvider(name="cartesia") + tts_registry.register_provider(provider) + + result = tts_tool._dispatch_to_plugin_provider( + text="hello world", + output_path="/tmp/out.mp3", + provider="cartesia", + tts_config={}, + ) + assert result == "/tmp/out.mp3" + assert provider.last_call is not None + assert provider.last_call["text"] == "hello world" + assert provider.last_call["output_path"] == "/tmp/out.mp3" + + def test_unregistered_name_returns_none(self): + result = tts_tool._dispatch_to_plugin_provider( + text="hello", + output_path="/tmp/out.mp3", + provider="unknown-tts", + tts_config={}, + ) + assert result is None + + def test_voice_model_speed_format_forwarded(self): + provider = _FakeTTSProvider(name="cartesia") + tts_registry.register_provider(provider) + + result = tts_tool._dispatch_to_plugin_provider( + text="hello", + output_path="/tmp/out.opus", + provider="cartesia", + tts_config={ + "voice": "voice-aria", + "model": "sonic-2", + "speed": 1.2, + "output_format": "opus", + }, + ) + assert result == "/tmp/out.opus" + kwargs = provider.last_call["kwargs"] + assert kwargs["voice"] == "voice-aria" + assert kwargs["model"] == "sonic-2" + assert kwargs["speed"] == 1.2 + assert kwargs["format"] == "opus" + + def test_empty_string_voice_passed_as_none(self): + """Empty-string config values are normalized to None so providers can + fall back to their own defaults (matches the ABC contract).""" + provider = _FakeTTSProvider(name="cartesia") + tts_registry.register_provider(provider) + + tts_tool._dispatch_to_plugin_provider( + text="hello", + output_path="/tmp/out.mp3", + provider="cartesia", + tts_config={"voice": "", "model": ""}, + ) + kwargs = provider.last_call["kwargs"] + assert kwargs["voice"] is None + assert kwargs["model"] is None + + def test_provider_returning_different_path_honored(self): + """If a provider rewrites the output path (e.g. format-driven extension + change), the dispatcher returns the new path.""" + provider = _FakeTTSProvider(name="cartesia", return_path="/tmp/rewritten.opus") + tts_registry.register_provider(provider) + + result = tts_tool._dispatch_to_plugin_provider( + text="hi", + output_path="/tmp/out.mp3", + provider="cartesia", + tts_config={}, + ) + assert result == "/tmp/rewritten.opus" + + def test_provider_returning_none_falls_back_to_output_path(self): + """Defensive: a provider returning None means the dispatcher should + report the caller-supplied output_path (matches the ABC contract — the + provider is supposed to write to output_path).""" + provider = _FakeTTSProvider(name="cartesia", return_path=None) + # Override the default-output-path behavior to return None explicitly + provider._return_path = None + + class _ReturnsNone(_FakeTTSProvider): + def synthesize(self, text, output_path, **kw): + return None # type: ignore[return-value] + + provider2 = _ReturnsNone(name="weird") + tts_registry.register_provider(provider2) + + result = tts_tool._dispatch_to_plugin_provider( + text="hi", + output_path="/tmp/out.mp3", + provider="weird", + tts_config={}, + ) + assert result == "/tmp/out.mp3" + + def test_provider_exception_bubbles_up(self): + """Plugin exceptions are NOT swallowed by the dispatcher — they bubble + up so the outer ``text_to_speech_tool`` try/except converts them to + the standard error envelope. Matches command-provider failure + behavior.""" + provider = _FakeTTSProvider( + name="cartesia", + raise_exc=RuntimeError("network down"), + ) + tts_registry.register_provider(provider) + + with pytest.raises(RuntimeError, match="network down"): + tts_tool._dispatch_to_plugin_provider( + text="hi", + output_path="/tmp/out.mp3", + provider="cartesia", + tts_config={}, + ) + + +# --------------------------------------------------------------------------- +# voice_compatible flag +# --------------------------------------------------------------------------- + + +class TestVoiceCompatibleHelper: + def test_voice_compatible_true(self): + tts_registry.register_provider( + _FakeTTSProvider(name="cartesia", voice_compat=True) + ) + assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is True + + def test_voice_compatible_false_by_default(self): + tts_registry.register_provider(_FakeTTSProvider(name="cartesia")) + assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is False + + def test_unregistered_provider_returns_false(self): + assert tts_tool._plugin_provider_is_voice_compatible("unknown") is False + + def test_empty_provider_name_returns_false(self): + assert tts_tool._plugin_provider_is_voice_compatible("") is False + + @pytest.mark.parametrize( + "builtin", + ["edge", "openai", "elevenlabs", "minimax", "gemini", + "mistral", "xai", "piper", "kittentts", "neutts"], + ) + def test_builtin_names_return_false(self, builtin): + """voice_compatible helper short-circuits built-ins so they go + through the legacy code path that handles their format quirks.""" + assert tts_tool._plugin_provider_is_voice_compatible(builtin) is False + + def test_voice_compatible_case_insensitive(self): + tts_registry.register_provider( + _FakeTTSProvider(name="cartesia", voice_compat=True) + ) + assert tts_tool._plugin_provider_is_voice_compatible("CARTESIA") is True + assert tts_tool._plugin_provider_is_voice_compatible(" cartesia ") is True + + def test_provider_property_exception_returns_false(self): + """A buggy ``voice_compatible`` property raising must not crash the + TTS pipeline.""" + + class _ExplodingProvider(_FakeTTSProvider): + @property + def voice_compatible(self) -> bool: + raise RuntimeError("boom") + + tts_registry.register_provider(_ExplodingProvider(name="cartesia")) + assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is False diff --git a/tools/tts_tool.py b/tools/tts_tool.py index 71535aed827..b1dc14f85aa 100644 --- a/tools/tts_tool.py +++ b/tools/tts_tool.py @@ -419,6 +419,123 @@ def _resolve_command_provider_config( return None +def _dispatch_to_plugin_provider( + text: str, + output_path: str, + provider: str, + tts_config: Dict[str, Any], +) -> Optional[str]: + """Route the call to a plugin-registered TTS provider, or return None. + + Returns the path to the written audio file on dispatch, or ``None`` + to fall through to the next resolution layer (built-in dispatch or + Edge TTS default). + + Resolution invariants enforced here (matches issue #30398): + + 1. Built-in provider names short-circuit — never reach the plugin + registry. The caller is responsible for the elif chain that + handles ``edge``/``openai``/etc.; this function explicitly + rejects those names defensively. + 2. Command-type providers declared under + ``tts.providers.: type: command`` (PR #17843) win over a + plugin with the same name. The caller passes us only when its + own command-provider check returned None — we re-verify here so + a refactor of the caller can't silently break the invariant. + 3. Plugin dispatch fires only when ``provider`` matches a registered + :class:`TTSProvider` whose ``name`` equals the configured value. + Unknown names return None (caller falls through to Edge default). + + Plugin exceptions are caught and re-raised — the outer + ``text_to_speech_tool`` try/except converts them to the standard + error envelope, matching how command-provider failures surface. + """ + if not provider: + return None + key = provider.lower().strip() + if key in BUILTIN_TTS_PROVIDERS: + return None + # Defense in depth: command-provider check should already have + # short-circuited the caller. If a same-name command config exists, + # bail so the command path wins. + if _is_command_provider_config(_get_named_provider_config(tts_config, key)): + return None + try: + from agent.tts_registry import get_provider + from hermes_cli.plugins import _ensure_plugins_discovered + + _ensure_plugins_discovered() + plugin_provider = get_provider(key) + if plugin_provider is None: + # Long-lived sessions may have discovered plugins before the + # bundled backend was patched in or before config changed. + # Retry once with a forced refresh before surfacing fall- + # through. Mirrors the image_gen / browser dispatcher + # recovery pattern. + _ensure_plugins_discovered(force=True) + plugin_provider = get_provider(key) + except Exception as exc: # noqa: BLE001 — discovery failure is non-fatal + logger.debug("tts plugin dispatch skipped (discovery failed): %s", exc) + return None + if plugin_provider is None: + return None + + # Resolve voice / model / format from tts_config — providers should + # treat all of these as optional and fall back to their own defaults + # when None is passed (matches the ABC contract documented on + # ``TTSProvider.synthesize``). + voice = tts_config.get("voice") if isinstance(tts_config, dict) else None + model = tts_config.get("model") if isinstance(tts_config, dict) else None + speed = tts_config.get("speed") if isinstance(tts_config, dict) else None + fmt = ( + tts_config.get("output_format", DEFAULT_COMMAND_TTS_OUTPUT_FORMAT) + if isinstance(tts_config, dict) + else DEFAULT_COMMAND_TTS_OUTPUT_FORMAT + ) + + logger.info( + "Generating speech with plugin TTS provider '%s'...", key, + ) + written = plugin_provider.synthesize( + text, + output_path, + voice=voice if isinstance(voice, str) and voice else None, + model=model if isinstance(model, str) and model else None, + speed=float(speed) if isinstance(speed, (int, float)) else None, + format=str(fmt).lower() if fmt else "mp3", + ) + # Provider contract: returns the (possibly rewritten) output path. + # Defensive against a provider returning None or a non-string — + # fall back to the caller's expected output_path. + return written if isinstance(written, str) and written else output_path + + +def _plugin_provider_is_voice_compatible(provider: str) -> bool: + """Return True when the registered plugin provider opts into voice + bubble delivery via its ``voice_compatible`` property. + + Defensive: any registry or property access failure means False + (matches the safe default for the command-provider path). + """ + if not provider: + return False + key = provider.lower().strip() + if key in BUILTIN_TTS_PROVIDERS: + return False + try: + from agent.tts_registry import get_provider + + plugin_provider = get_provider(key) + if plugin_provider is None: + return False + return bool(plugin_provider.voice_compatible) + except Exception as exc: # noqa: BLE001 + logger.debug( + "tts plugin voice_compatible check failed for '%s': %s", key, exc, + ) + return False + + def _iter_command_providers(tts_config: Dict[str, Any]): """Yield (name, config) pairs for every declared command-type provider.""" if not isinstance(tts_config, dict): @@ -1787,6 +1904,21 @@ def text_to_speech_tool( text, file_str, provider, command_provider_config, tts_config, ) + # Plugin-registered TTS backend (issue #30398). Fires when the + # configured provider is neither a built-in nor a command-type + # entry, AND a plugin is registered under that name. The walrus + # binds `_plugin_path` only when the dispatcher returns a path + # (i.e. a plugin was actually found); a None return falls + # through to the built-in elif chain so unknown names hit the + # Edge TTS default at the bottom. The dispatcher itself enforces + # built-ins-always-win + command-wins-over-plugin defensively. + elif provider not in BUILTIN_TTS_PROVIDERS and ( + _plugin_path := _dispatch_to_plugin_provider( + text, file_str, provider, tts_config, + ) + ) is not None: + file_str = _plugin_path + elif provider == "elevenlabs": try: _import_elevenlabs() @@ -1925,6 +2057,18 @@ def text_to_speech_tool( if opus_path: file_str = opus_path voice_compatible = file_str.endswith(".ogg") + elif provider not in BUILTIN_TTS_PROVIDERS: + # Plugin-registered provider (issue #30398). Voice-bubble + # delivery opts in via ``TTSProvider.voice_compatible`` + # (mirrors the command-provider opt-in). Plugins that + # already write Opus skip the ffmpeg conversion. + plugin_voice_compatible = _plugin_provider_is_voice_compatible(provider) + if plugin_voice_compatible: + if not file_str.endswith(".ogg"): + opus_path = _convert_to_opus(file_str) + if opus_path: + file_str = opus_path + voice_compatible = file_str.endswith(".ogg") elif ( want_opus and provider in {"edge", "neutts", "minimax", "xai", "kittentts", "piper"} diff --git a/website/docs/user-guide/features/plugins.md b/website/docs/user-guide/features/plugins.md index 9572f3538a6..260ab5de0b7 100644 --- a/website/docs/user-guide/features/plugins.md +++ b/website/docs/user-guide/features/plugins.md @@ -234,7 +234,7 @@ The table above shows the four plugin categories, but within "General plugins" t | A **context-compression strategy** | Context-engine plugin — `ctx.register_context_engine()` | [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) | | An **image-generation backend** (DALL·E, SDXL, …) | Backend plugin — `ctx.register_image_gen_provider()` | [Image Generation Provider Plugins](/docs/developer-guide/image-gen-provider-plugin) | | A **video-generation backend** (Veo, Kling, Pixverse, Grok-Imagine, Runway, …) | Backend plugin — `ctx.register_video_gen_provider()` | [Video Generation Provider Plugins](/docs/developer-guide/video-gen-provider-plugin) | -| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven — declare under `tts.providers.` with `type: command` in `config.yaml` | [TTS setup](/docs/user-guide/features/tts#custom-command-providers) | +| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven (recommended) — declare under `tts.providers.` with `type: command` in `config.yaml`. OR Python backend plugin — `ctx.register_tts_provider()` for Python-SDK / streaming engines that need more than a shell template. | [TTS Setup](/docs/user-guide/features/tts#custom-command-providers) · [Python plugin guide](/docs/user-guide/features/tts#python-plugin-providers) | | An **STT backend** (custom whisper binary, local ASR CLI) | Config-driven — set `HERMES_LOCAL_STT_COMMAND` env var to a shell template | [Voice Message Transcription (STT)](/docs/user-guide/features/tts#voice-message-transcription-stt) | | **External tools via MCP** (filesystem, GitHub, Linear, Notion, any MCP server) | Config-driven — declare `mcp_servers.` with `command:` / `url:` in `config.yaml`. Hermes auto-discovers the server's tools and registers them alongside built-ins. | [MCP](/docs/user-guide/features/mcp) | | **Additional skill sources** (custom GitHub repos, private skill indexes) | CLI — `hermes skills tap add ` | [Skills Hub](/docs/user-guide/features/skills#skills-hub) · [Publishing a custom tap](/docs/user-guide/features/skills#publishing-a-custom-skill-tap) | diff --git a/website/docs/user-guide/features/tts.md b/website/docs/user-guide/features/tts.md index d65a9f4031b..588a5119e51 100644 --- a/website/docs/user-guide/features/tts.md +++ b/website/docs/user-guide/features/tts.md @@ -297,6 +297,85 @@ Use `{{` and `}}` for literal braces. Command-type providers run whatever shell command you configure, with your user's permissions. Hermes quotes placeholder values and enforces the configured timeout, but the command template itself is trusted local input — treat it the same way you would a shell script on your PATH. +### Python plugin providers + +For TTS engines that can't be expressed as a single shell command — Python SDKs without a CLI, streaming engines, voice-listing APIs, OAuth-refreshing auth — register a Python plugin via `ctx.register_tts_provider()`. The plugin **coexists with** (does not replace) the [Custom command providers](#custom-command-providers) registry; pick the surface that fits your engine. + +#### When to pick which + +| Your backend has… | Use | +|---|---| +| A single CLI reading text from a file/stdin and writing audio to a file/stdout | **Command provider** (no Python needed) | +| Two or three CLIs chained with shell pipes | **Command provider** | +| A Python SDK only — no CLI | **Plugin** | +| Streaming bytes you want to deliver chunked (mid-generation voice bubbles) | **Plugin** (override `stream()`) | +| A voice-listing API used by `hermes setup` | **Plugin** (override `list_voices()`) | +| OAuth refresh flow (not a static bearer token) | **Plugin** | + +Built-ins always win, and command providers win over a same-name plugin — so plugins are safe to register against any non-built-in name without worrying about shadowing your existing config. + +#### Minimal plugin + +Drop this in `~/.hermes/plugins/my-tts/`: + +`plugin.yaml`: +```yaml +name: my-tts +version: 0.1.0 +description: "My custom Python TTS backend" +``` + +`__init__.py`: +```python +from agent.tts_provider import TTSProvider + + +class MyTTSProvider(TTSProvider): + @property + def name(self) -> str: + return "my-tts" # what tts.provider matches against + + @property + def display_name(self) -> str: + return "My Custom TTS" + + def is_available(self) -> bool: + # Return False when credentials/deps are missing — picker skips + # this row but the dispatcher still routes here on explicit config. + import os + return bool(os.environ.get("MY_TTS_API_KEY")) + + def synthesize(self, text, output_path, *, voice=None, model=None, + speed=None, format="mp3", **extra) -> str: + # Write audio bytes to output_path, return the path. + # Raise on failure — the dispatcher converts exceptions to a + # standard error envelope. + import my_tts_sdk + client = my_tts_sdk.Client() + audio_bytes = client.synthesize(text=text, voice=voice or "default") + with open(output_path, "wb") as f: + f.write(audio_bytes) + return output_path + + +def register(ctx): + ctx.register_tts_provider(MyTTSProvider()) +``` + +Enable it (`hermes plugins enable my-tts`), point `tts.provider` at it (`tts.provider: my-tts` in `config.yaml`), and the `text_to_speech` tool will route through your plugin. + +#### Optional hooks + +Override these on your provider class for richer integration: + +- `list_voices()` → list of `{id, display, language, gender, preview_url}` dicts shown in `hermes tools`. +- `list_models()` → list of `{id, display, languages, max_text_length}` dicts. +- `get_setup_schema()` → return `{name, badge, tag, env_vars: [{key, prompt, url}]}` to power the picker row in `hermes tools` / `hermes setup`. Without this, the plugin still works but its row in the picker is minimal. +- `stream(text, *, voice, model, format, **extra)` → iterator yielding audio bytes for streaming delivery (default raises `NotImplementedError`). +- `voice_compatible` property → set `True` if your output is Opus-compatible and the gateway should deliver it as a voice bubble (default `False` = regular audio attachment). + +See `agent/tts_provider.py` for the full ABC including docstrings. + ## Voice Message Transcription (STT) Voice messages sent on Telegram, Discord, WhatsApp, Slack, or Signal are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.