Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-06-05 07:41:39 +00:00
feat(tts): add register_tts_provider() plugin hook (closes #30398)
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive: the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new).
4. No match → falls through to the Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:

- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:

- The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively, so a refactor of the caller can't silently break the invariant.
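The four-step resolution order can be expressed as a pure decision function. This is a minimal sketch, not the dispatcher in `tools/tts_tool.py`: `resolve_route`, the `BUILTINS` copy, the `"builtin"`/`"command"`/`"plugin"`/`"edge_default"` labels, and the assumption that a command entry is recognized by a non-empty `command:` key are all illustrative.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical copy of the ten built-in names listed above.
BUILTINS = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_route(
    provider: Optional[str],
    tts_config: Dict[str, Any],
    plugin_names: Set[str],
) -> str:
    """Return the layer that handles `provider` under the four-step order."""
    key = (provider or "").lower().strip()
    # 1. Built-in names always win, even over a same-name command or plugin.
    if key in BUILTINS:
        return "builtin"
    # 2. tts.providers.<name> with `command:` set wins over plugins (assumed check).
    named = (tts_config.get("providers") or {}).get(key)
    if isinstance(named, dict) and named.get("command"):
        return "command"
    # 3. A plugin registered under the same name.
    if key in plugin_names:
        return "plugin"
    # 4. No match: legacy Edge TTS default.
    return "edge_default"
```

Because each check returns early, the order of the `if` statements is the invariant: moving step 3 above step 2 would let a plugin shadow a command provider, which is exactly what the defensive re-check in `_dispatch_to_plugin_provider` guards against.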
## New files

- `agent/tts_provider.py`: `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors the `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py`: `register_provider`/`get_provider`/`list_providers` with the `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors the `agent/image_gen_registry.py` shape.
- `plugins/tts/...`: directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py`: `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py`: `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py`: `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py`: 47 tests covering registration, lookup, the ABC contract, and helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py`: 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, and the `voice_compatible` helper.
- `tests/hermes_cli/test_tts_picker.py`: 10 tests covering the picker surface, builtin shadowing defense, and integration with `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py`: 3 end-to-end tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py`: 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario.
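The provider ABC and the reject-shadowing registry can be sketched as follows, assuming the keyword arguments that `_dispatch_to_plugin_provider` passes to `synthesize()` in the diff. This is not the shipped `agent/tts_provider.py` or `agent/tts_registry.py`: the method bodies, the empty-list/empty-dict defaults, and the boolean return of `register_provider` are assumptions, and `stream()` is left out because its signature isn't shown here.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

logger = logging.getLogger("tts_registry_sketch")

# Hypothetical duplicated copy of the built-in names (the PR guards its real
# copy against drift with the TestBuiltinSync regression test).
_BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


class TTSProvider(ABC):
    """Sketch of the provider contract: only synthesize() is abstract."""

    name: str = ""

    @abstractmethod
    def synthesize(
        self,
        text: str,
        output_path: str,
        *,
        voice: Optional[str] = None,
        model: Optional[str] = None,
        speed: Optional[float] = None,
        format: str = "mp3",
    ) -> str:
        """Write audio for `text` and return the (possibly rewritten) path."""

    def list_voices(self) -> List[Dict[str, Any]]:
        return []  # assumed default: no voice metadata

    def list_models(self) -> List[Dict[str, Any]]:
        return []  # assumed default: no model metadata

    def get_setup_schema(self) -> Dict[str, Any]:
        return {}  # assumed default: nothing to configure

    @property
    def voice_compatible(self) -> bool:
        return False  # voice-bubble delivery is opt-in


_REGISTRY: Dict[str, TTSProvider] = {}


def register_provider(provider: TTSProvider) -> bool:
    """Register `provider`; refuse empty or built-in names with a warning."""
    key = (provider.name or "").lower().strip()
    if not key or key in _BUILTIN_NAMES:
        logger.warning("refusing to register TTS provider %r", provider.name)
        return False
    _REGISTRY[key] = provider
    return True


def get_provider(name: str) -> Optional[TTSProvider]:
    return _REGISTRY.get((name or "").lower().strip())


def list_providers() -> List[str]:
    return sorted(_REGISTRY)
```

Under this sketch, a plugin subclasses `TTSProvider`, sets `name`, implements only `synthesize()`, and hands an instance to the registry (in the real plugin API, through `register_tts_provider()` on `PluginContext`). A plugin named after a built-in is rejected at registration, which is the first of the three built-ins-always-win layers.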
## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool`, and the standard JSON envelope is returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md`: new "Python plugin providers" section with a decision table (command-provider vs plugin), a minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md`: TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming).

Closes #30398
parent 782681f904 · commit 00ec0b617c · 13 changed files with 2037 additions and 1 deletion
@@ -419,6 +419,123 @@ def _resolve_command_provider_config(

```python
    return None


def _dispatch_to_plugin_provider(
    text: str,
    output_path: str,
    provider: str,
    tts_config: Dict[str, Any],
) -> Optional[str]:
    """Route the call to a plugin-registered TTS provider, or return None.

    Returns the path to the written audio file on dispatch, or ``None``
    to fall through to the next resolution layer (built-in dispatch or
    Edge TTS default).

    Resolution invariants enforced here (matches issue #30398):

    1. Built-in provider names short-circuit and never reach the plugin
       registry. The caller is responsible for the elif chain that
       handles ``edge``/``openai``/etc.; this function explicitly
       rejects those names defensively.
    2. Command-type providers declared under
       ``tts.providers.<name>: type: command`` (PR #17843) win over a
       plugin with the same name. The caller only calls this function
       when its own command-provider check returned None; we re-verify
       here so a refactor of the caller can't silently break the invariant.
    3. Plugin dispatch fires only when ``provider`` matches a registered
       :class:`TTSProvider` whose ``name`` equals the configured value.
       Unknown names return None (caller falls through to Edge default).

    Plugin exceptions are not caught here; they propagate to the outer
    ``text_to_speech_tool`` try/except, which converts them to the standard
    error envelope, matching how command-provider failures surface.
    """
    if not provider:
        return None
    key = provider.lower().strip()
    if key in BUILTIN_TTS_PROVIDERS:
        return None
    # Defense in depth: the command-provider check should already have
    # short-circuited the caller. If a same-name command config exists,
    # bail so the command path wins.
    if _is_command_provider_config(_get_named_provider_config(tts_config, key)):
        return None
    try:
        from agent.tts_registry import get_provider
        from hermes_cli.plugins import _ensure_plugins_discovered

        _ensure_plugins_discovered()
        plugin_provider = get_provider(key)
        if plugin_provider is None:
            # Long-lived sessions may have discovered plugins before the
            # bundled backend was patched in or before config changed.
            # Retry once with a forced refresh before surfacing
            # fall-through. Mirrors the image_gen / browser dispatcher
            # recovery pattern.
            _ensure_plugins_discovered(force=True)
            plugin_provider = get_provider(key)
    except Exception as exc:  # noqa: BLE001 (discovery failure is non-fatal)
        logger.debug("tts plugin dispatch skipped (discovery failed): %s", exc)
        return None
    if plugin_provider is None:
        return None

    # Resolve voice / model / format from tts_config. Providers should
    # treat all of these as optional and fall back to their own defaults
    # when None is passed (matches the ABC contract documented on
    # ``TTSProvider.synthesize``).
    voice = tts_config.get("voice") if isinstance(tts_config, dict) else None
    model = tts_config.get("model") if isinstance(tts_config, dict) else None
    speed = tts_config.get("speed") if isinstance(tts_config, dict) else None
    fmt = (
        tts_config.get("output_format", DEFAULT_COMMAND_TTS_OUTPUT_FORMAT)
        if isinstance(tts_config, dict)
        else DEFAULT_COMMAND_TTS_OUTPUT_FORMAT
    )

    logger.info(
        "Generating speech with plugin TTS provider '%s'...", key,
    )
    written = plugin_provider.synthesize(
        text,
        output_path,
        voice=voice if isinstance(voice, str) and voice else None,
        model=model if isinstance(model, str) and model else None,
        speed=float(speed) if isinstance(speed, (int, float)) else None,
        format=str(fmt).lower() if fmt else "mp3",
    )
    # Provider contract: returns the (possibly rewritten) output path.
    # Defensive against a provider returning None or a non-string:
    # fall back to the caller's expected output_path.
    return written if isinstance(written, str) and written else output_path


def _plugin_provider_is_voice_compatible(provider: str) -> bool:
    """Return True when the registered plugin provider opts into voice
    bubble delivery via its ``voice_compatible`` property.

    Defensive: any registry or property access failure means False
    (matches the safe default for the command-provider path).
    """
    if not provider:
        return False
    key = provider.lower().strip()
    if key in BUILTIN_TTS_PROVIDERS:
        return False
    try:
        from agent.tts_registry import get_provider

        plugin_provider = get_provider(key)
        if plugin_provider is None:
            return False
        return bool(plugin_provider.voice_compatible)
    except Exception as exc:  # noqa: BLE001
        logger.debug(
            "tts plugin voice_compatible check failed for '%s': %s", key, exc,
        )
        return False


def _iter_command_providers(tts_config: Dict[str, Any]):
    """Yield (name, config) pairs for every declared command-type provider."""
    if not isinstance(tts_config, dict):
```
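The config normalization at the end of `_dispatch_to_plugin_provider` can be pulled out into pure helpers to make its edge cases easy to test. The sketch restates that logic outside the module; `plugin_synth_kwargs`, `resolve_written_path`, and `DEFAULT_FORMAT` are hypothetical names, and `DEFAULT_FORMAT = "mp3"` is an assumed stand-in for `DEFAULT_COMMAND_TTS_OUTPUT_FORMAT`, whose value isn't shown.

```python
from typing import Any, Dict

# Assumed stand-in for DEFAULT_COMMAND_TTS_OUTPUT_FORMAT (value not shown).
DEFAULT_FORMAT = "mp3"


def plugin_synth_kwargs(tts_config: Any) -> Dict[str, Any]:
    """Build the keyword arguments passed to TTSProvider.synthesize()."""
    cfg = tts_config if isinstance(tts_config, dict) else {}
    voice = cfg.get("voice")
    model = cfg.get("model")
    speed = cfg.get("speed")
    fmt = cfg.get("output_format", DEFAULT_FORMAT)
    return {
        # Empty or non-string values become None so the provider uses its default.
        "voice": voice if isinstance(voice, str) and voice else None,
        "model": model if isinstance(model, str) and model else None,
        "speed": float(speed) if isinstance(speed, (int, float)) else None,
        # An explicit null/empty output_format still falls back to "mp3".
        "format": str(fmt).lower() if fmt else "mp3",
    }


def resolve_written_path(written: Any, output_path: str) -> str:
    """Trust the provider's returned path only if it is a non-empty string."""
    return written if isinstance(written, str) and written else output_path
```

One consequence of the `isinstance(speed, (int, float))` check, both here and in the diff: a config value of `speed: true` passes through as `1.0`, because `bool` is a subclass of `int` in Python.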
|
@@ -1787,6 +1904,21 @@ def text_to_speech_tool(

```python
                text, file_str, provider, command_provider_config, tts_config,
            )

        # Plugin-registered TTS backend (issue #30398). Fires when the
        # configured provider is neither a built-in nor a command-type
        # entry, AND a plugin is registered under that name. The branch
        # is taken only when the walrus-bound `_plugin_path` is a path
        # (i.e. a plugin was actually found); a None return falls
        # through to the built-in elif chain, so unknown names hit the
        # Edge TTS default at the bottom. The dispatcher itself enforces
        # built-ins-always-win + command-wins-over-plugin defensively.
        elif provider not in BUILTIN_TTS_PROVIDERS and (
            _plugin_path := _dispatch_to_plugin_provider(
                text, file_str, provider, tts_config,
            )
        ) is not None:
            file_str = _plugin_path

        elif provider == "elevenlabs":
            try:
                _import_elevenlabs()
```
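The walrus-elif wiring can be checked in isolation with a stripped-down chain of the same shape. In this sketch, `route`, the trimmed `BUILTINS` set, and the string return values are hypothetical; what it demonstrates is that the plugin dispatcher is never called for built-in names or when a command config is present, and that a `None` return falls through to the Edge default.

```python
from typing import Callable, Optional

# Trimmed, hypothetical stand-in for BUILTIN_TTS_PROVIDERS.
BUILTINS = frozenset({"edge", "openai", "elevenlabs"})


def route(
    provider: str,
    command_cfg: Optional[dict],
    dispatch_plugin: Callable[[str], Optional[str]],
) -> str:
    """Mimic the dispatcher's elif chain and report which branch ran."""
    if command_cfg is not None:
        return "command"
    # `and` short-circuits, so dispatch_plugin never runs for built-ins.
    elif provider not in BUILTINS and (
        plugin_path := dispatch_plugin(provider)
    ) is not None:
        return plugin_path
    elif provider in BUILTINS:
        # Stands in for the built-in elif chain (elevenlabs, openai, ...).
        return provider
    # Unknown names whose plugin lookup returned None hit the Edge default.
    return "edge"
```

A consequence of this shape: synthesis runs inside the `elif` condition itself, so an exception raised by the plugin surfaces from the condition expression rather than from a branch body, and then propagates to whatever try/except encloses the chain.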
@@ -1925,6 +2057,18 @@ def text_to_speech_tool(

```python
                if opus_path:
                    file_str = opus_path
            voice_compatible = file_str.endswith(".ogg")
    elif provider not in BUILTIN_TTS_PROVIDERS:
        # Plugin-registered provider (issue #30398). Voice-bubble
        # delivery opts in via ``TTSProvider.voice_compatible``
        # (mirrors the command-provider opt-in). Plugins that
        # already write Opus skip the ffmpeg conversion.
        plugin_voice_compatible = _plugin_provider_is_voice_compatible(provider)
        if plugin_voice_compatible:
            if not file_str.endswith(".ogg"):
                opus_path = _convert_to_opus(file_str)
                if opus_path:
                    file_str = opus_path
            voice_compatible = file_str.endswith(".ogg")
    elif (
        want_opus
        and provider in {"edge", "neutts", "minimax", "xai", "kittentts", "piper"}
```
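The voice-bubble post-processing for plugin providers reduces to a small decision: convert to Opus only when the provider opts in and the file isn't already `.ogg`, then mark the result voice-compatible only if it actually ended up as `.ogg`. This is a sketch, assuming `convert_to_opus` behaves like `_convert_to_opus` (returns a new path, or a falsy value on failure) and that `voice_compatible` starts out `False`; `finalize_voice_output` is a hypothetical name.

```python
from typing import Callable, Optional, Tuple


def finalize_voice_output(
    file_str: str,
    plugin_voice_compatible: bool,
    convert_to_opus: Callable[[str], Optional[str]],
) -> Tuple[str, bool]:
    """Return (final_path, voice_compatible) for a plugin-produced file."""
    voice_compatible = False  # assumed initial value
    if plugin_voice_compatible:
        # Files that are already Opus-in-Ogg skip the conversion step.
        if not file_str.endswith(".ogg"):
            opus_path = convert_to_opus(file_str)
            if opus_path:
                file_str = opus_path
        # A failed conversion leaves the original file and no voice bubble.
        voice_compatible = file_str.endswith(".ogg")
    return file_str, voice_compatible
```

Deriving the flag from the final extension, rather than from the opt-in alone, means a failed conversion degrades to a regular audio attachment instead of a broken voice bubble.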