mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-06-05 07:41:39 +00:00)
feat(tts): add register_tts_provider() plugin hook (closes #30398)
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive: the command-provider surface stays as the primary way to add a TTS backend.

The hook covers cases the shell-template grammar can't reasonably express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:

- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:

- The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant.
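The four-step resolution order can be sketched as a small pure function. This is an illustrative sketch only, not the code in `tools/tts_tool.py`: `resolve_tts_backend`, its arguments, and the returned labels are hypothetical names, and the real dispatcher calls the selected backend instead of returning a label.

```python
from __future__ import annotations

# The ten built-in provider names listed in this commit message.
BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_tts_backend(tts_config: dict, plugin_names: set[str]) -> str:
    """Return which dispatch path would handle ``tts.provider`` (hypothetical helper).

    ``tts_config`` stands in for the ``tts:`` section of config.yaml and
    ``plugin_names`` for the names held by the plugin registry.
    """
    name = str(tts_config.get("provider") or "").lower().strip()

    # 1. Built-in names always win, even if a plugin uses the same name.
    if name in BUILTIN_NAMES:
        return "builtin"

    # 2. A same-name tts.providers.<name> entry with `command:` set comes next.
    entry = (tts_config.get("providers") or {}).get(name)
    if isinstance(entry, dict) and entry.get("command"):
        return "command"

    # 3. Only then is the plugin registry consulted.
    if name in plugin_names:
        return "plugin"

    # 4. No match: fall through to the Edge TTS default.
    return "fallback_edge"
```

Ordering the checks this way means a name collision is resolved by position alone, so a plugin named `openai` or one sharing a name with a command entry never receives the call.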
## New files

- `agent/tts_provider.py`: `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py`: `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py`: `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py`: `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py`: `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py`: 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py`: 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py`: 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py`: 3 end-to-end tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py`: 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario.
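The reject-shadowing invariant described for `agent/tts_registry.py` might look roughly like the sketch below. It is written under assumptions: the module-level dict, the `bool` return value, and the exact log wording are illustrative and not taken from the real module, which is not shown in this diff.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

# Built-in names from the commit message; the real module keeps its own copy.
_BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})

# Assumed storage: normalized provider name -> provider instance.
_providers: Dict[str, object] = {}


def register_provider(provider) -> bool:
    """Register a provider unless it is unnamed or shadows a built-in.

    Returning a bool is an assumption made here for easy testing.
    """
    name = str(getattr(provider, "name", "") or "").lower().strip()
    if not name:
        logger.warning("TTS provider without a name; ignoring.")
        return False
    if name in _BUILTIN_NAMES:
        logger.warning("TTS provider %r shadows a built-in name; ignoring.", name)
        return False
    _providers[name] = provider
    return True


def get_provider(name: str) -> Optional[object]:
    """Look up a provider by name, case-insensitively."""
    return _providers.get(name.lower().strip())


def list_providers() -> List[object]:
    """Return registered providers in registration order."""
    return list(_providers.values())
```

Rejecting at registration time keeps the registry itself free of shadowing entries; the dispatcher and picker checks then act only as backstops.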
## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md`: new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md`: TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming).

Closes #30398
parent 782681f904
commit 00ec0b617c
13 changed files with 2037 additions and 1 deletions
@@ -640,6 +640,44 @@ class PluginContext:
             self.manifest.name, provider.name,
         )
 
+    # -- TTS provider registration -------------------------------------------
+
+    def register_tts_provider(self, provider) -> None:
+        """Register a text-to-speech backend.
+
+        ``provider`` must be an instance of
+        :class:`agent.tts_provider.TTSProvider`. The ``provider.name``
+        attribute is what ``tts.provider`` in ``config.yaml`` matches
+        against when routing ``text_to_speech`` tool calls — **but
+        only when**:
+
+        1. ``provider.name`` is NOT a built-in TTS provider name
+           (``edge``, ``openai``, ``elevenlabs``, …). Built-ins always
+           win — the registry rejects shadowing names with a warning.
+        2. There is NO ``tts.providers.<name>: type: command`` entry
+           with the same name. Command-providers (PR #17843) win on
+           name collision because config is more local than plugin
+           install.
+
+        Coexists with the command-provider registry rather than
+        replacing it — see issue #30398 for the full design rationale.
+        """
+        from agent.tts_provider import TTSProvider
+        from agent.tts_registry import register_provider as _register_tts_provider
+
+        if not isinstance(provider, TTSProvider):
+            logger.warning(
+                "Plugin '%s' tried to register a TTS provider that does "
+                "not inherit from TTSProvider. Ignoring.",
+                self.manifest.name,
+            )
+            return
+        _register_tts_provider(provider)
+        logger.info(
+            "Plugin '%s' registered TTS provider: %s",
+            self.manifest.name, provider.name,
+        )
+
     # -- platform adapter registration ---------------------------------------
 
     def register_platform(
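To make the `isinstance` gate in `register_tts_provider()` concrete, here is a hedged sketch of what a plugin might pass in. Everything in it is a stand-in: the local `TTSProvider` class only imitates `agent.tts_provider.TTSProvider`, the `synthesize(text, output_path)` signature is assumed (the real signature is not shown in this diff), and `FakeContext` and the `register(ctx)` entry point are hypothetical names used so the example runs on its own.

```python
import wave
from abc import ABC, abstractmethod


class TTSProvider(ABC):
    """Stand-in for agent.tts_provider.TTSProvider (assumed shape)."""

    name: str = ""

    @abstractmethod
    def synthesize(self, text: str, output_path: str) -> str:
        """Write audio for ``text`` to ``output_path`` and return the path."""


class SilenceProvider(TTSProvider):
    """Toy backend: writes 0.1 s of silence regardless of the text."""

    name = "silence"

    def synthesize(self, text: str, output_path: str) -> str:
        with wave.open(output_path, "wb") as wav:
            wav.setnchannels(1)       # mono
            wav.setsampwidth(2)       # 16-bit samples
            wav.setframerate(16000)   # 16 kHz
            wav.writeframes(b"\x00\x00" * 1600)
        return output_path


class FakeContext:
    """Hypothetical stand-in for PluginContext with the same type gate."""

    def __init__(self) -> None:
        self.registered = []

    def register_tts_provider(self, provider) -> None:
        if not isinstance(provider, TTSProvider):
            return  # the real method logs a warning and ignores the object
        self.registered.append(provider)


def register(ctx) -> None:
    """Assumed plugin entry point: hand one provider instance to the context."""
    ctx.register_tts_provider(SilenceProvider())
```

With these stand-ins, calling `register(FakeContext())` records one provider, while passing an arbitrary object that does not subclass `TTSProvider` is silently dropped, matching the early `return` in the hunk above.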
@@ -1753,6 +1753,62 @@ def _plugin_browser_providers() -> list[dict]:
     return rows
 
 
+def _plugin_tts_providers() -> list[dict]:
+    """Build picker-row dicts from plugin-registered TTS providers.
+
+    Issue #30398 — the ``register_tts_provider()`` plugin hook
+    coexists alongside the 10 built-in TTS providers
+    (``edge``/``openai``/``elevenlabs``/…) and the
+    ``tts.providers.<name>: type: command`` registry from PR #17843.
+    Built-in rows stay hardcoded in ``TOOL_CATEGORIES["tts"]``; this
+    function only injects PLUGIN-registered providers.
+
+    Defensive: plugins whose name collides with a built-in TTS provider
+    are filtered out — even though the registry already rejects them
+    at registration time, a future code path that registers directly
+    via :func:`agent.tts_registry.register_provider` could slip
+    through. Filtering here keeps the picker invariant.
+    """
+    try:
+        from agent.tts_registry import _BUILTIN_NAMES, list_providers
+        from hermes_cli.plugins import _ensure_plugins_discovered
+
+        _ensure_plugins_discovered()
+        providers = list_providers()
+    except Exception:
+        return []
+
+    rows: list[dict] = []
+    for provider in providers:
+        name = getattr(provider, "name", None)
+        if not name:
+            continue
+        # Defensive: reject built-in shadowing at the picker layer too.
+        if name.lower().strip() in _BUILTIN_NAMES:
+            continue
+        try:
+            schema = provider.get_setup_schema()
+        except Exception:
+            continue
+        if not isinstance(schema, dict):
+            continue
+        row = {
+            "name": schema.get("name", provider.display_name),
+            "badge": schema.get("badge", ""),
+            "tag": schema.get("tag", ""),
+            "env_vars": schema.get("env_vars", []),
+            # Selecting this row writes ``tts.provider: <name>`` — the
+            # same write-path used by hardcoded rows. The plugin
+            # dispatcher picks it up automatically from there.
+            "tts_provider": name,
+            "tts_plugin_name": name,
+        }
+        if schema.get("post_setup"):
+            row["post_setup"] = schema["post_setup"]
+        rows.append(row)
+    return rows
+
+
 def _visible_providers(cat: dict, config: dict) -> list[dict]:
     """Return provider entries visible for the current auth/config state."""
     features = get_nous_subscription_features(config)
@@ -1790,6 +1846,12 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
     if cat.get("name") == "Browser Automation":
         visible.extend(_plugin_browser_providers())
 
+    # Inject plugin-registered TTS backends (issue #30398). Plugin rows
+    # render BELOW the 10 hardcoded built-in rows. Built-in shadowing
+    # is filtered out by ``_plugin_tts_providers`` defensively.
+    if cat.get("name") == "Text-to-Speech":
+        visible.extend(_plugin_tts_providers())
+
     return visible
 
 