feat(tts): add register_tts_provider() plugin hook (closes #30398)

Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express: - Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.) - Streaming synthesis (chunked Opus → voice-bubble delivery) - Voice metadata API for the `hermes tools` picker - OAuth-refreshing auth flows None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in. ## Resolution order The dispatcher's resolution order is the load-bearing invariant: 1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.** 2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843). 3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new). 4. No match → falls through to Edge TTS default (legacy behavior). Built-ins-always-win is enforced at THREE layers: - Registry: `register_provider()` rejects shadowing names with a warning. - Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry. - Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively. Command-providers-win-over-plugins is enforced at TWO layers: - The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first. - `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant. ## New files - `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape. - `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape. - `plugins/tts/...` directory ready for community plugins (none shipped). ## Modified files - `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`. - `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched. - `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows. ## Tests - `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints). - `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper. - `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`. - `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`. - `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario. ## Verification - 95/95 new tests pass. - 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged. - Parity harness against `origin/main`: 8 OK + 1 expected DIFF. - E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned. - Ruff clean on all touched files. ## Docs - `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference. - `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming). Closes #30398
2026-06-05 07:41:39 +00:00 · 2026-05-22 17:58:07 +05:30 · 2026-05-22 17:58:07 +05:30 · 00ec0b617c
commit 00ec0b617c
parent 782681f904
13 changed files with 2037 additions and 1 deletions
--- a/tools/tts_tool.py
+++ b/tools/tts_tool.py
@ -419,6 +419,123 @@ def _resolve_command_provider_config(
    return None


+def _dispatch_to_plugin_provider(
+    text: str,
+    output_path: str,
+    provider: str,
+    tts_config: Dict[str, Any],
+) -> Optional[str]:
+    """Route the call to a plugin-registered TTS provider, or return None.
+
+    Returns the path to the written audio file on dispatch, or ``None``
+    to fall through to the next resolution layer (built-in dispatch or
+    Edge TTS default).
+
+    Resolution invariants enforced here (matches issue #30398):
+
+    1. Built-in provider names short-circuit — never reach the plugin
+       registry. The caller is responsible for the elif chain that
+       handles ``edge``/``openai``/etc.; this function explicitly
+       rejects those names defensively.
+    2. Command-type providers declared under
+       ``tts.providers.<name>: type: command`` (PR #17843) win over a
+       plugin with the same name. The caller passes us only when its
+       own command-provider check returned None — we re-verify here so
+       a refactor of the caller can't silently break the invariant.
+    3. Plugin dispatch fires only when ``provider`` matches a registered
+       :class:`TTSProvider` whose ``name`` equals the configured value.
+       Unknown names return None (caller falls through to Edge default).
+
+    Plugin exceptions are caught and re-raised — the outer
+    ``text_to_speech_tool`` try/except converts them to the standard
+    error envelope, matching how command-provider failures surface.
+    """
+    if not provider:
+        return None
+    key = provider.lower().strip()
+    if key in BUILTIN_TTS_PROVIDERS:
+        return None
+    # Defense in depth: command-provider check should already have
+    # short-circuited the caller. If a same-name command config exists,
+    # bail so the command path wins.
+    if _is_command_provider_config(_get_named_provider_config(tts_config, key)):
+        return None
+    try:
+        from agent.tts_registry import get_provider
+        from hermes_cli.plugins import _ensure_plugins_discovered
+
+        _ensure_plugins_discovered()
+        plugin_provider = get_provider(key)
+        if plugin_provider is None:
+            # Long-lived sessions may have discovered plugins before the
+            # bundled backend was patched in or before config changed.
+            # Retry once with a forced refresh before surfacing fall-
+            # through. Mirrors the image_gen / browser dispatcher
+            # recovery pattern.
+            _ensure_plugins_discovered(force=True)
+            plugin_provider = get_provider(key)
+    except Exception as exc:  # noqa: BLE001 — discovery failure is non-fatal
+        logger.debug("tts plugin dispatch skipped (discovery failed): %s", exc)
+        return None
+    if plugin_provider is None:
+        return None
+
+    # Resolve voice / model / format from tts_config — providers should
+    # treat all of these as optional and fall back to their own defaults
+    # when None is passed (matches the ABC contract documented on
+    # ``TTSProvider.synthesize``).
+    voice = tts_config.get("voice") if isinstance(tts_config, dict) else None
+    model = tts_config.get("model") if isinstance(tts_config, dict) else None
+    speed = tts_config.get("speed") if isinstance(tts_config, dict) else None
+    fmt = (
+        tts_config.get("output_format", DEFAULT_COMMAND_TTS_OUTPUT_FORMAT)
+        if isinstance(tts_config, dict)
+        else DEFAULT_COMMAND_TTS_OUTPUT_FORMAT
+    )
+
+    logger.info(
+        "Generating speech with plugin TTS provider '%s'...", key,
+    )
+    written = plugin_provider.synthesize(
+        text,
+        output_path,
+        voice=voice if isinstance(voice, str) and voice else None,
+        model=model if isinstance(model, str) and model else None,
+        speed=float(speed) if isinstance(speed, (int, float)) else None,
+        format=str(fmt).lower() if fmt else "mp3",
+    )
+    # Provider contract: returns the (possibly rewritten) output path.
+    # Defensive against a provider returning None or a non-string —
+    # fall back to the caller's expected output_path.
+    return written if isinstance(written, str) and written else output_path
+
+
+def _plugin_provider_is_voice_compatible(provider: str) -> bool:
+    """Return True when the registered plugin provider opts into voice
+    bubble delivery via its ``voice_compatible`` property.
+
+    Defensive: any registry or property access failure means False
+    (matches the safe default for the command-provider path).
+    """
+    if not provider:
+        return False
+    key = provider.lower().strip()
+    if key in BUILTIN_TTS_PROVIDERS:
+        return False
+    try:
+        from agent.tts_registry import get_provider
+
+        plugin_provider = get_provider(key)
+        if plugin_provider is None:
+            return False
+        return bool(plugin_provider.voice_compatible)
+    except Exception as exc:  # noqa: BLE001
+        logger.debug(
+            "tts plugin voice_compatible check failed for '%s': %s", key, exc,
+        )
+        return False
+
+
 def _iter_command_providers(tts_config: Dict[str, Any]):
    """Yield (name, config) pairs for every declared command-type provider."""
    if not isinstance(tts_config, dict):
@ -1787,6 +1904,21 @@ def text_to_speech_tool(
                text, file_str, provider, command_provider_config, tts_config,
            )

+        # Plugin-registered TTS backend (issue #30398). Fires when the
+        # configured provider is neither a built-in nor a command-type
+        # entry, AND a plugin is registered under that name. The walrus
+        # binds `_plugin_path` only when the dispatcher returns a path
+        # (i.e. a plugin was actually found); a None return falls
+        # through to the built-in elif chain so unknown names hit the
+        # Edge TTS default at the bottom. The dispatcher itself enforces
+        # built-ins-always-win + command-wins-over-plugin defensively.
+        elif provider not in BUILTIN_TTS_PROVIDERS and (
+            _plugin_path := _dispatch_to_plugin_provider(
+                text, file_str, provider, tts_config,
+            )
+        ) is not None:
+            file_str = _plugin_path
+
        elif provider == "elevenlabs":
            try:
                _import_elevenlabs()
@ -1925,6 +2057,18 @@ def text_to_speech_tool(
                    if opus_path:
                        file_str = opus_path
                voice_compatible = file_str.endswith(".ogg")
+        elif provider not in BUILTIN_TTS_PROVIDERS:
+            # Plugin-registered provider (issue #30398). Voice-bubble
+            # delivery opts in via ``TTSProvider.voice_compatible``
+            # (mirrors the command-provider opt-in). Plugins that
+            # already write Opus skip the ffmpeg conversion.
+            plugin_voice_compatible = _plugin_provider_is_voice_compatible(provider)
+            if plugin_voice_compatible:
+                if not file_str.endswith(".ogg"):
+                    opus_path = _convert_to_opus(file_str)
+                    if opus_path:
+                        file_str = opus_path
+                voice_compatible = file_str.endswith(".ogg")
        elif (
            want_opus
            and provider in {"edge", "neutts", "minimax", "xai", "kittentts", "piper"}