mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-08 08:11:38 +00:00
feat(tts): add register_tts_provider() plugin hook (closes #30398)
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express: - Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.) - Streaming synthesis (chunked Opus → voice-bubble delivery) - Voice metadata API for the `hermes tools` picker - OAuth-refreshing auth flows None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in. ## Resolution order The dispatcher's resolution order is the load-bearing invariant: 1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.** 2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843). 3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new). 4. No match → falls through to Edge TTS default (legacy behavior). Built-ins-always-win is enforced at THREE layers: - Registry: `register_provider()` rejects shadowing names with a warning. - Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry. - Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively. Command-providers-win-over-plugins is enforced at TWO layers: - The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first. - `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant. ## New files - `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape. - `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape. - `plugins/tts/...` directory ready for community plugins (none shipped). ## Modified files - `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`. - `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched. - `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows. ## Tests - `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints). - `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper. - `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`. - `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`. - `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario. ## Verification - 95/95 new tests pass. - 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged. - Parity harness against `origin/main`: 8 OK + 1 expected DIFF. - E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned. - Ruff clean on all touched files. ## Docs - `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference. - `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming). Closes #30398
This commit is contained in:
parent
782681f904
commit
00ec0b617c
13 changed files with 2037 additions and 1 deletions
|
|
@ -1753,6 +1753,62 @@ def _plugin_browser_providers() -> list[dict]:
|
|||
return rows
|
||||
|
||||
|
||||
def _plugin_tts_providers() -> list[dict]:
|
||||
"""Build picker-row dicts from plugin-registered TTS providers.
|
||||
|
||||
Issue #30398 — the ``register_tts_provider()`` plugin hook
|
||||
coexists alongside the 10 built-in TTS providers
|
||||
(``edge``/``openai``/``elevenlabs``/…) and the
|
||||
``tts.providers.<name>: type: command`` registry from PR #17843.
|
||||
Built-in rows stay hardcoded in ``TOOL_CATEGORIES["tts"]``; this
|
||||
function only injects PLUGIN-registered providers.
|
||||
|
||||
Defensive: plugins whose name collides with a built-in TTS provider
|
||||
are filtered out — even though the registry already rejects them
|
||||
at registration time, a future code path that registers directly
|
||||
via :func:`agent.tts_registry.register_provider` could slip
|
||||
through. Filtering here keeps the picker invariant.
|
||||
"""
|
||||
try:
|
||||
from agent.tts_registry import _BUILTIN_NAMES, list_providers
|
||||
from hermes_cli.plugins import _ensure_plugins_discovered
|
||||
|
||||
_ensure_plugins_discovered()
|
||||
providers = list_providers()
|
||||
except Exception:
|
||||
return []
|
||||
|
||||
rows: list[dict] = []
|
||||
for provider in providers:
|
||||
name = getattr(provider, "name", None)
|
||||
if not name:
|
||||
continue
|
||||
# Defensive: reject built-in shadowing at the picker layer too.
|
||||
if name.lower().strip() in _BUILTIN_NAMES:
|
||||
continue
|
||||
try:
|
||||
schema = provider.get_setup_schema()
|
||||
except Exception:
|
||||
continue
|
||||
if not isinstance(schema, dict):
|
||||
continue
|
||||
row = {
|
||||
"name": schema.get("name", provider.display_name),
|
||||
"badge": schema.get("badge", ""),
|
||||
"tag": schema.get("tag", ""),
|
||||
"env_vars": schema.get("env_vars", []),
|
||||
# Selecting this row writes ``tts.provider: <name>`` — the
|
||||
# same write-path used by hardcoded rows. The plugin
|
||||
# dispatcher picks it up automatically from there.
|
||||
"tts_provider": name,
|
||||
"tts_plugin_name": name,
|
||||
}
|
||||
if schema.get("post_setup"):
|
||||
row["post_setup"] = schema["post_setup"]
|
||||
rows.append(row)
|
||||
return rows
|
||||
|
||||
|
||||
def _visible_providers(cat: dict, config: dict) -> list[dict]:
|
||||
"""Return provider entries visible for the current auth/config state."""
|
||||
features = get_nous_subscription_features(config)
|
||||
|
|
@ -1790,6 +1846,12 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
|
|||
if cat.get("name") == "Browser Automation":
|
||||
visible.extend(_plugin_browser_providers())
|
||||
|
||||
# Inject plugin-registered TTS backends (issue #30398). Plugin rows
|
||||
# render BELOW the 10 hardcoded built-in rows. Built-in shadowing
|
||||
# is filtered out by ``_plugin_tts_providers`` defensively.
|
||||
if cat.get("name") == "Text-to-Speech":
|
||||
visible.extend(_plugin_tts_providers())
|
||||
|
||||
return visible
|
||||
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue