mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-30 06:41:51 +00:00
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express: - Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.) - Streaming synthesis (chunked Opus → voice-bubble delivery) - Voice metadata API for the `hermes tools` picker - OAuth-refreshing auth flows None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in. ## Resolution order The dispatcher's resolution order is the load-bearing invariant: 1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.** 2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843). 3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new). 4. No match → falls through to Edge TTS default (legacy behavior). Built-ins-always-win is enforced at THREE layers: - Registry: `register_provider()` rejects shadowing names with a warning. - Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry. - Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively. Command-providers-win-over-plugins is enforced at TWO layers: - The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first. - `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant. ## New files - `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape. - `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape. - `plugins/tts/...` directory ready for community plugins (none shipped). ## Modified files - `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`. - `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched. - `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows. ## Tests - `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints). - `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper. - `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`. - `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`. - `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario. ## Verification - 95/95 new tests pass. - 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged. - Parity harness against `origin/main`: 8 OK + 1 expected DIFF. - E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned. - Ruff clean on all touched files. ## Docs - `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference. - `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming). Closes #30398
323 lines
12 KiB
Python
323 lines
12 KiB
Python
"""Tests for TTS plugin dispatch in tools/tts_tool.py (issue #30398).
|
|
|
|
Covers the three core invariants of the plugin dispatcher:
|
|
|
|
1. Built-in provider names short-circuit — plugins NEVER win over a
|
|
built-in. Even if a plugin somehow ended up in the registry with a
|
|
built-in name (which the registry already blocks), the dispatcher
|
|
re-checks defensively.
|
|
2. Command-type providers declared under ``tts.providers.<name>: type:
|
|
command`` (PR #17843) win over a plugin with the same name. Config
|
|
is more local than plugin install.
|
|
3. Plugin dispatch fires only when the configured provider is neither
|
|
a built-in nor a command-type entry, AND a plugin is registered
|
|
under that name. Unknown names fall through.
|
|
|
|
Also exercises:
|
|
- Plugin exceptions surface to the outer error envelope (don't crash)
|
|
- Plugin returning a different path is honored
|
|
- voice_compatible: True triggers ffmpeg opus conversion path
|
|
- voice_compatible: False keeps the file as-is
|
|
|
|
The dispatcher is exercised in isolation — we don't actually call
|
|
``text_to_speech_tool`` because that would require real audio file
|
|
writes. Each test directly calls
|
|
``tools.tts_tool._dispatch_to_plugin_provider`` / the predicate
|
|
helpers.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
from typing import Optional
|
|
|
|
import pytest
|
|
|
|
from agent import tts_registry
|
|
from agent.tts_provider import TTSProvider
|
|
from tools import tts_tool
|
|
|
|
|
|
class _FakeTTSProvider(TTSProvider):
|
|
def __init__(
|
|
self,
|
|
name: str,
|
|
voice_compat: bool = False,
|
|
raise_exc: Optional[BaseException] = None,
|
|
return_path: Optional[str] = None,
|
|
):
|
|
self._name = name
|
|
self._voice_compat = voice_compat
|
|
self._raise_exc = raise_exc
|
|
self._return_path = return_path
|
|
# Recorded for assertions
|
|
self.last_call: Optional[dict] = None
|
|
|
|
@property
|
|
def name(self) -> str:
|
|
return self._name
|
|
|
|
@property
|
|
def voice_compatible(self) -> bool:
|
|
return self._voice_compat
|
|
|
|
def synthesize(self, text, output_path, **kw):
|
|
self.last_call = {
|
|
"text": text,
|
|
"output_path": output_path,
|
|
"kwargs": dict(kw),
|
|
}
|
|
if self._raise_exc is not None:
|
|
raise self._raise_exc
|
|
return self._return_path if self._return_path is not None else output_path
|
|
|
|
|
|
@pytest.fixture(autouse=True)
|
|
def _reset_registry():
|
|
tts_registry._reset_for_tests()
|
|
yield
|
|
tts_registry._reset_for_tests()
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Resolution invariants
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
class TestBuiltinAlwaysWins:
|
|
"""Built-in TTS provider names short-circuit the dispatcher.
|
|
|
|
Even with a plugin registered (which the registry would reject —
|
|
but the dispatcher is defensive), built-in names return None so
|
|
the caller's elif chain handles them natively.
|
|
"""
|
|
|
|
@pytest.mark.parametrize(
|
|
"builtin",
|
|
["edge", "openai", "elevenlabs", "minimax", "gemini",
|
|
"mistral", "xai", "piper", "kittentts", "neutts"],
|
|
)
|
|
def test_dispatcher_short_circuits_builtin(self, builtin):
|
|
result = tts_tool._dispatch_to_plugin_provider(
|
|
text="hello",
|
|
output_path="/tmp/out.mp3",
|
|
provider=builtin,
|
|
tts_config={},
|
|
)
|
|
assert result is None, (
|
|
f"Built-in {builtin!r} must short-circuit plugin dispatch. "
|
|
"If this test fails, the dispatcher would silently let a "
|
|
"plugin with a built-in name shadow the native handler — "
|
|
"violating the precedence rule from PR #17843."
|
|
)
|
|
|
|
def test_dispatcher_short_circuits_builtin_case_insensitive(self):
|
|
for variant in ("EDGE", "Edge", " edge ", "eDgE"):
|
|
assert (
|
|
tts_tool._dispatch_to_plugin_provider(
|
|
text="hello", output_path="/tmp/x.mp3",
|
|
provider=variant, tts_config={},
|
|
) is None
|
|
)
|
|
|
|
|
|
class TestCommandProviderWins:
|
|
"""A same-name ``tts.providers.<name>: type: command`` config beats a plugin.
|
|
|
|
Locality: a user's command-provider config is more specific than
|
|
whichever plugin happens to be installed.
|
|
"""
|
|
|
|
def test_command_config_beats_plugin(self):
|
|
tts_registry.register_provider(_FakeTTSProvider(name="my-tts"))
|
|
|
|
result = tts_tool._dispatch_to_plugin_provider(
|
|
text="hello",
|
|
output_path="/tmp/out.mp3",
|
|
provider="my-tts",
|
|
tts_config={
|
|
"providers": {
|
|
"my-tts": {
|
|
"type": "command",
|
|
"command": "echo 'hi' > {output_path}",
|
|
},
|
|
},
|
|
},
|
|
)
|
|
# Plugin path returns None → caller falls back to command
|
|
# provider dispatch (handled by the outer text_to_speech_tool
|
|
# via _resolve_command_provider_config).
|
|
assert result is None
|
|
|
|
|
|
class TestPluginDispatch:
|
|
"""Happy path: configured name matches a registered plugin, dispatcher fires."""
|
|
|
|
def test_registered_plugin_called(self):
|
|
provider = _FakeTTSProvider(name="cartesia")
|
|
tts_registry.register_provider(provider)
|
|
|
|
result = tts_tool._dispatch_to_plugin_provider(
|
|
text="hello world",
|
|
output_path="/tmp/out.mp3",
|
|
provider="cartesia",
|
|
tts_config={},
|
|
)
|
|
assert result == "/tmp/out.mp3"
|
|
assert provider.last_call is not None
|
|
assert provider.last_call["text"] == "hello world"
|
|
assert provider.last_call["output_path"] == "/tmp/out.mp3"
|
|
|
|
def test_unregistered_name_returns_none(self):
|
|
result = tts_tool._dispatch_to_plugin_provider(
|
|
text="hello",
|
|
output_path="/tmp/out.mp3",
|
|
provider="unknown-tts",
|
|
tts_config={},
|
|
)
|
|
assert result is None
|
|
|
|
def test_voice_model_speed_format_forwarded(self):
|
|
provider = _FakeTTSProvider(name="cartesia")
|
|
tts_registry.register_provider(provider)
|
|
|
|
result = tts_tool._dispatch_to_plugin_provider(
|
|
text="hello",
|
|
output_path="/tmp/out.opus",
|
|
provider="cartesia",
|
|
tts_config={
|
|
"voice": "voice-aria",
|
|
"model": "sonic-2",
|
|
"speed": 1.2,
|
|
"output_format": "opus",
|
|
},
|
|
)
|
|
assert result == "/tmp/out.opus"
|
|
kwargs = provider.last_call["kwargs"]
|
|
assert kwargs["voice"] == "voice-aria"
|
|
assert kwargs["model"] == "sonic-2"
|
|
assert kwargs["speed"] == 1.2
|
|
assert kwargs["format"] == "opus"
|
|
|
|
def test_empty_string_voice_passed_as_none(self):
|
|
"""Empty-string config values are normalized to None so providers can
|
|
fall back to their own defaults (matches the ABC contract)."""
|
|
provider = _FakeTTSProvider(name="cartesia")
|
|
tts_registry.register_provider(provider)
|
|
|
|
tts_tool._dispatch_to_plugin_provider(
|
|
text="hello",
|
|
output_path="/tmp/out.mp3",
|
|
provider="cartesia",
|
|
tts_config={"voice": "", "model": ""},
|
|
)
|
|
kwargs = provider.last_call["kwargs"]
|
|
assert kwargs["voice"] is None
|
|
assert kwargs["model"] is None
|
|
|
|
def test_provider_returning_different_path_honored(self):
|
|
"""If a provider rewrites the output path (e.g. format-driven extension
|
|
change), the dispatcher returns the new path."""
|
|
provider = _FakeTTSProvider(name="cartesia", return_path="/tmp/rewritten.opus")
|
|
tts_registry.register_provider(provider)
|
|
|
|
result = tts_tool._dispatch_to_plugin_provider(
|
|
text="hi",
|
|
output_path="/tmp/out.mp3",
|
|
provider="cartesia",
|
|
tts_config={},
|
|
)
|
|
assert result == "/tmp/rewritten.opus"
|
|
|
|
def test_provider_returning_none_falls_back_to_output_path(self):
|
|
"""Defensive: a provider returning None means the dispatcher should
|
|
report the caller-supplied output_path (matches the ABC contract — the
|
|
provider is supposed to write to output_path)."""
|
|
provider = _FakeTTSProvider(name="cartesia", return_path=None)
|
|
# Override the default-output-path behavior to return None explicitly
|
|
provider._return_path = None
|
|
|
|
class _ReturnsNone(_FakeTTSProvider):
|
|
def synthesize(self, text, output_path, **kw):
|
|
return None # type: ignore[return-value]
|
|
|
|
provider2 = _ReturnsNone(name="weird")
|
|
tts_registry.register_provider(provider2)
|
|
|
|
result = tts_tool._dispatch_to_plugin_provider(
|
|
text="hi",
|
|
output_path="/tmp/out.mp3",
|
|
provider="weird",
|
|
tts_config={},
|
|
)
|
|
assert result == "/tmp/out.mp3"
|
|
|
|
def test_provider_exception_bubbles_up(self):
|
|
"""Plugin exceptions are NOT swallowed by the dispatcher — they bubble
|
|
up so the outer ``text_to_speech_tool`` try/except converts them to
|
|
the standard error envelope. Matches command-provider failure
|
|
behavior."""
|
|
provider = _FakeTTSProvider(
|
|
name="cartesia",
|
|
raise_exc=RuntimeError("network down"),
|
|
)
|
|
tts_registry.register_provider(provider)
|
|
|
|
with pytest.raises(RuntimeError, match="network down"):
|
|
tts_tool._dispatch_to_plugin_provider(
|
|
text="hi",
|
|
output_path="/tmp/out.mp3",
|
|
provider="cartesia",
|
|
tts_config={},
|
|
)
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# voice_compatible flag
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
class TestVoiceCompatibleHelper:
|
|
def test_voice_compatible_true(self):
|
|
tts_registry.register_provider(
|
|
_FakeTTSProvider(name="cartesia", voice_compat=True)
|
|
)
|
|
assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is True
|
|
|
|
def test_voice_compatible_false_by_default(self):
|
|
tts_registry.register_provider(_FakeTTSProvider(name="cartesia"))
|
|
assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is False
|
|
|
|
def test_unregistered_provider_returns_false(self):
|
|
assert tts_tool._plugin_provider_is_voice_compatible("unknown") is False
|
|
|
|
def test_empty_provider_name_returns_false(self):
|
|
assert tts_tool._plugin_provider_is_voice_compatible("") is False
|
|
|
|
@pytest.mark.parametrize(
|
|
"builtin",
|
|
["edge", "openai", "elevenlabs", "minimax", "gemini",
|
|
"mistral", "xai", "piper", "kittentts", "neutts"],
|
|
)
|
|
def test_builtin_names_return_false(self, builtin):
|
|
"""voice_compatible helper short-circuits built-ins so they go
|
|
through the legacy code path that handles their format quirks."""
|
|
assert tts_tool._plugin_provider_is_voice_compatible(builtin) is False
|
|
|
|
def test_voice_compatible_case_insensitive(self):
|
|
tts_registry.register_provider(
|
|
_FakeTTSProvider(name="cartesia", voice_compat=True)
|
|
)
|
|
assert tts_tool._plugin_provider_is_voice_compatible("CARTESIA") is True
|
|
assert tts_tool._plugin_provider_is_voice_compatible(" cartesia ") is True
|
|
|
|
def test_provider_property_exception_returns_false(self):
|
|
"""A buggy ``voice_compatible`` property raising must not crash the
|
|
TTS pipeline."""
|
|
|
|
class _ExplodingProvider(_FakeTTSProvider):
|
|
@property
|
|
def voice_compatible(self) -> bool:
|
|
raise RuntimeError("boom")
|
|
|
|
tts_registry.register_provider(_ExplodingProvider(name="cartesia"))
|
|
assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is False
|