feat(tts): add register_tts_provider() plugin hook (closes #30398)

Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR #17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.

The hook covers cases the shell-template grammar can't reasonably
express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
   → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
   → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).
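
The four steps above can be read as a single lookup. A minimal sketch, assuming hypothetical names (`resolve_tts_backend`, `BUILTINS`, and a plain `plugins` dict standing in for the registry); the real dispatcher in `tools/tts_tool.py` is an elif chain and is not reproduced here:

```python
from typing import Any, Dict, Tuple

# Built-in names as listed in this commit message.
BUILTINS = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_tts_backend(
    tts_config: Dict[str, Any],
    plugins: Dict[str, Any],
) -> Tuple[str, str]:
    """Return (kind, name) for the backend that would handle a call.

    Hypothetical helper: it only illustrates the precedence rules.
    """
    name = str(tts_config.get("provider") or "").strip().lower()

    # 1. Built-in names always win.
    if name in BUILTINS:
        return ("builtin", name)

    # 2. A config-declared command provider beats a same-name plugin.
    entry = (tts_config.get("providers") or {}).get(name)
    if isinstance(entry, dict) and entry.get("command"):
        return ("command", name)

    # 3. Plugin-registered provider.
    if name in plugins:
        return ("plugin", name)

    # 4. No match: legacy fallback to Edge TTS.
    return ("builtin", "edge")
```

Keeping the command check ahead of the plugin check is what lets a `config.yaml` entry override an installed plugin of the same name.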

Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
  names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
  the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
  `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
  config defensively so a refactor of the caller can't silently break
  the invariant.

## New files

- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
  `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
  `voice_compatible` (all optional with sane defaults). Mirrors
  `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
  with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
  `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).
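
For orientation, a plugin built on these files might look roughly like the sketch below. It is illustrative only: the `TTSProvider` class here is a trimmed stand-in so the snippet runs outside Hermes (a real plugin would import it from `agent.tts_provider`), and `SilenceProvider` is a hypothetical provider that writes silence instead of calling an SDK.

```python
import abc
import wave
from typing import Any, Optional


class TTSProvider(abc.ABC):
    """Stand-in for agent.tts_provider.TTSProvider (assumption: trimmed
    to the two required members so the sketch runs on its own)."""

    @property
    @abc.abstractmethod
    def name(self) -> str: ...

    @abc.abstractmethod
    def synthesize(self, text: str, output_path: str, **kw: Any) -> str: ...


class SilenceProvider(TTSProvider):
    """Hypothetical provider used only for illustration."""

    @property
    def name(self) -> str:
        return "silence"

    def synthesize(
        self,
        text: str,
        output_path: str,
        *,
        voice: Optional[str] = None,
        model: Optional[str] = None,
        speed: Optional[float] = None,
        format: str = "wav",
        **extra: Any,
    ) -> str:
        # 0.1 s of 16-bit mono silence at 16 kHz; a real provider would
        # write the bytes returned by its SDK here.
        with wave.open(output_path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(16000)
            wav.writeframes(b"\x00\x00" * 1600)
        return output_path


def register(ctx) -> None:
    """Plugin entry point: hand one provider instance to the context."""
    ctx.register_tts_provider(SilenceProvider())
```

With such a plugin enabled, setting `tts.provider: silence` would route calls to it, provided no built-in or command provider claims the same name.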

## Modified files

- `hermes_cli/plugins.py` — `register_tts_provider()` method on
  `PluginContext`. Matches the gating shape of
  `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
  `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
  the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
  plugin rows into the Text-to-Speech picker category alongside the
  10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
  lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
  test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
  `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
  circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
  built-in-always-wins, command-wins-over-plugin, plugin dispatch,
  exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
  picker surface, builtin shadowing defense, integration with
  `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
  tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
  parity harness vs `origin/main`. The only intentional diff is
  `fallback_edge → plugin` for the `plugin-installed` scenario.

## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
  test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
  `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.
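
The E2E smoke item depends on a provider's return value or exception being turned into a JSON envelope. A rough sketch of that conversion, assuming a hypothetical `call_provider` wrapper; only the failure shape `{success: False, error: ...}` is stated in this commit, so the success key (`file_path`) is an illustrative guess:

```python
import json
from typing import Any, Callable


def call_provider(
    synthesize: Callable[..., str],
    text: str,
    output_path: str,
    **kw: Any,
) -> str:
    """Run a provider's synthesize() and always return a JSON string."""
    try:
        path = synthesize(text, output_path, **kw)
    except Exception as exc:  # providers signal failure by raising
        return json.dumps(
            {"success": False, "error": f"{type(exc).__name__}: {exc}"}
        )
    # Assumption: the success envelope carries the written path.
    return json.dumps({"success": True, "file_path": str(path)})
```

Because providers only need to raise, they never have to know the envelope format themselves.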

## Docs

- `website/docs/user-guide/features/tts.md` — new "Python plugin
  providers" section with a decision table (command-provider vs
  plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
  mention both surfaces (command-provider primary, plugin for
  SDK/streaming).

Closes #30398
kshitijk4poor 2026-05-22 17:58:07 +05:30 committed by Teknium
parent 782681f904
commit 00ec0b617c
13 changed files with 2037 additions and 1 deletions

agent/tts_provider.py

@@ -0,0 +1,274 @@
"""
Text-to-Speech Provider ABC
============================

Defines the pluggable-backend interface for text-to-speech synthesis.
Providers register instances via
``PluginContext.register_tts_provider()``; the active one (selected via
``tts.provider`` in ``config.yaml``) services every ``text_to_speech``
tool call **only when the configured name is neither a built-in nor a
command-type provider declared under ``tts.providers.<name>``**.

Three coexisting TTS extension surfaces, in resolution order:

1. **Built-in providers** (``BUILTIN_TTS_PROVIDERS`` in
   :mod:`tools.tts_tool`): native Python implementations (edge, openai,
   elevenlabs, ...). **Always win**; plugins cannot shadow them.
2. **Command-type providers** declared under ``tts.providers.<name>:
   type: command`` (PR #17843, commit ``2facea7f7``). Wire any local
   CLI into Hermes with shell-template placeholders. **Wins over a
   same-name plugin**, because config is more local than a plugin install.
3. **Plugin-registered providers** (this ABC). For backends that need a
   Python SDK, streaming bytes, OAuth refresh, or voice-listing APIs
   the shell-template grammar can't reasonably express.

Built-ins-always-win is enforced at registration time
(:func:`agent.tts_registry.register_provider` rejects names in
``BUILTIN_TTS_PROVIDERS`` with a warning) AND at dispatch time
(:func:`tools.tts_tool._dispatch_to_plugin_provider` re-checks
defensively). The dispatcher also rejects plugin dispatch when a
same-name command provider is configured.

Providers live in ``<repo>/plugins/tts/<name>/`` (built-in plugins, none
shipped today) or ``~/.hermes/plugins/tts/<name>/`` (user-installed).
None ship in-tree as of issue #30398 — the hook is additive
infrastructure waiting for a real consumer (Cartesia, Fish Audio, ...).

Response contract
-----------------
:meth:`TTSProvider.synthesize` writes the audio bytes to ``output_path``
and returns the path as a string. Implementations should raise on
failure; the dispatcher converts exceptions into the standard
``{success: False, error: ...}`` JSON envelope the rest of Hermes
expects.
"""
from __future__ import annotations

import abc
import logging
from typing import Any, Dict, Iterator, List, Optional

logger = logging.getLogger(__name__)

DEFAULT_OUTPUT_FORMAT = "mp3"
VALID_OUTPUT_FORMATS = frozenset({"mp3", "wav", "ogg", "opus", "flac"})


# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------


class TTSProvider(abc.ABC):
    """Abstract base class for a text-to-speech backend.

    Subclasses must implement :attr:`name` and :meth:`synthesize`.
    Everything else has sane defaults; override only what your provider
    needs.
    """

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """Stable short identifier used in ``tts.provider`` config.

        Lowercase, no spaces. Examples: ``cartesia``, ``fishaudio``,
        ``deepgram``. Names that collide with a built-in TTS provider
        (``edge``, ``openai``, ``elevenlabs``, ``minimax``, ``gemini``,
        ``mistral``, ``xai``, ``piper``, ``kittentts``, ``neutts``) are
        rejected at registration time.
        """

    @property
    def display_name(self) -> str:
        """Human-readable label shown in ``hermes tools``.

        Defaults to ``name.title()`` (e.g. ``Cartesia`` for ``cartesia``).
        """
        return self.name.title()

    def is_available(self) -> bool:
        """Return True when this provider can service calls.

        Typically checks for a required API key + that the SDK is
        importable. Default: True (providers with no external
        dependencies are always available).

        Must NOT raise: it is used by the picker and ``hermes setup`` for
        availability displays and should fail gracefully.
        """
        return True

    def list_voices(self) -> List[Dict[str, Any]]:
        """Return voice catalog entries.

        Each entry::

            {
                "id": "voice-abc-123",                 # required
                "display": "Aria — neutral female",    # optional; defaults to id
                "language": "en-US",                   # optional
                "gender": "female",                    # optional
                "preview_url": "https://...mp3",       # optional
            }

        Default: empty list (provider has no enumerable voices or
        doesn't surface them via API).
        """
        return []

    def list_models(self) -> List[Dict[str, Any]]:
        """Return model catalog entries.

        Each entry::

            {
                "id": "sonic-2",                   # required
                "display": "Sonic 2",              # optional
                "languages": ["en", "es", "fr"],   # optional
                "max_text_length": 5000,           # optional
            }

        Default: empty list (provider has a single fixed model or
        doesn't expose model selection).
        """
        return []

    def get_setup_schema(self) -> Dict[str, Any]:
        """Return provider metadata for the ``hermes tools`` picker.

        Used by ``tools_config.py`` to inject this provider as a row in
        the Text-to-Speech provider list. Shape::

            {
                "name": "Cartesia",                      # picker label
                "badge": "paid",                         # optional short tag
                "tag": "Ultra-low-latency streaming",    # optional subtitle
                "env_vars": [                            # keys to prompt for
                    {"key": "CARTESIA_API_KEY",
                     "prompt": "Cartesia API key",
                     "url": "https://play.cartesia.ai/console"},
                ],
            }

        Default: minimal entry derived from ``display_name`` with no
        env vars. Override to expose API key prompts and custom badges.
        """
        return {
            "name": self.display_name,
            "badge": "",
            "tag": "",
            "env_vars": [],
        }

    def default_model(self) -> Optional[str]:
        """Return the default model id, or None if not applicable."""
        models = self.list_models()
        if models:
            return models[0].get("id")
        return None

    def default_voice(self) -> Optional[str]:
        """Return the default voice id, or None if not applicable."""
        voices = self.list_voices()
        if voices:
            return voices[0].get("id")
        return None

    @abc.abstractmethod
    def synthesize(
        self,
        text: str,
        output_path: str,
        *,
        voice: Optional[str] = None,
        model: Optional[str] = None,
        speed: Optional[float] = None,
        format: str = DEFAULT_OUTPUT_FORMAT,
        **extra: Any,
    ) -> str:
        """Synthesize ``text`` and write audio bytes to ``output_path``.

        Returns the absolute path to the written file as a string
        (typically just echoes ``output_path``). Raises on failure;
        the dispatcher converts exceptions to the standard
        ``{success: False, error: ...}`` JSON envelope.

        Args:
            text: The text to synthesize. Already truncated to the
                provider's max length by the dispatcher.
            output_path: Absolute path where the audio file should be
                written. Parent directory is guaranteed to exist.
            voice: Voice identifier from :meth:`list_voices`, or None
                to use :meth:`default_voice`.
            model: Model identifier from :meth:`list_models`, or None
                to use :meth:`default_model`.
            speed: Optional speech-rate multiplier (1.0 = normal).
                Providers that don't support speed control should
                ignore this argument.
            format: Output audio format. Implementations should match
                the requested format when possible; if unsupported,
                pick the closest equivalent and ensure ``output_path``
                ends with the correct extension.
            **extra: Forward-compat parameters future schema versions
                may expose. Implementations should ignore unknown keys.
        """

    def stream(
        self,
        text: str,
        *,
        voice: Optional[str] = None,
        model: Optional[str] = None,
        format: str = "opus",
        **extra: Any,
    ) -> Iterator[bytes]:
        """Stream synthesized audio bytes.

        Optional. Providers that don't support streaming raise
        :class:`NotImplementedError` (the default) and the dispatcher
        falls back to :meth:`synthesize` + read-whole-file.

        Args mirror :meth:`synthesize`. Default ``format`` is ``opus``
        because the primary streaming use case is voice-bubble
        delivery (Telegram et al.) which requires Opus.
        """
        raise NotImplementedError(
            f"TTS provider {self.name!r} does not implement streaming "
            "synthesis. Use synthesize() instead, or implement stream() "
            "if your backend supports it."
        )

    @property
    def voice_compatible(self) -> bool:
        """Whether output is suitable for voice-bubble delivery.

        Mirrors the ``tts.providers.<name>.voice_compatible`` field
        from PR #17843. When True, the gateway's voice-message
        delivery pipeline runs ffmpeg conversion to Opus if needed.
        When False, output is delivered as a regular audio attachment.

        Default: False (the safe choice; providers opt in explicitly).
        """
        return False


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def resolve_output_format(value: Optional[str]) -> str:
    """Clamp an output_format value to the valid set.

    Invalid values are coerced to :data:`DEFAULT_OUTPUT_FORMAT` rather
    than rejected so the tool surface is forgiving of agent mistakes.
    """
    if not isinstance(value, str):
        return DEFAULT_OUTPUT_FORMAT
    v = value.strip().lower()
    if v in VALID_OUTPUT_FORMATS:
        return v
    return DEFAULT_OUTPUT_FORMAT
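
The `stream()` docstring says the dispatcher falls back to `synthesize()` plus reading the whole file when a provider does not stream. A sketch of that fallback, assuming a hypothetical `iter_audio` helper and a duck-typed provider; the actual dispatcher code is not part of this file:

```python
import os
import tempfile
from typing import Any, Iterator


def iter_audio(
    provider: Any,
    text: str,
    *,
    format: str = "opus",
    **kw: Any,
) -> Iterator[bytes]:
    """Yield audio chunks, preferring stream() and falling back to a file."""
    try:
        chunks = iter(provider.stream(text, format=format, **kw))
        # A generator-based stream() only raises on the first next(),
        # so pull one chunk before committing to the streaming path.
        first = next(chunks, None)
    except NotImplementedError:
        chunks, first = None, None

    if chunks is not None:
        if first is not None:
            yield first
        yield from chunks
        return

    # Fallback: synthesize to a temp file and emit it as a single chunk.
    fd, path = tempfile.mkstemp(suffix=f".{format}")
    os.close(fd)
    try:
        written = provider.synthesize(text, path, format=format, **kw)
        with open(written, "rb") as fh:
            yield fh.read()
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Pulling the first chunk inside the `try` matters because a provider may implement `stream()` as a generator, in which case the `NotImplementedError` would otherwise surface only after the caller had started consuming.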

agent/tts_registry.py

@@ -0,0 +1,133 @@
"""
TTS Provider Registry
=====================
Central map of registered TTS providers. Populated by plugins at
import-time via :meth:`PluginContext.register_tts_provider`; consumed
by :mod:`tools.tts_tool` to dispatch ``text_to_speech`` tool calls to
the active plugin backend **when** the configured ``tts.provider``
name is neither a built-in nor a command-type provider.
Built-ins-always-win
--------------------
Plugin names that collide with a built-in TTS provider (``edge``,
``openai``, ``elevenlabs``, ``minimax``, ``gemini``, ``mistral``,
``xai``, ``piper``, ``kittentts``, ``neutts``) are rejected at
registration with a warning. This invariant is also re-checked at
dispatch time in :func:`tools.tts_tool._dispatch_to_plugin_provider`.
Command-providers-win-over-plugins
----------------------------------
This registry doesn't enforce the command-vs-plugin precedence — that
lives in the dispatcher, which checks for a same-name
``tts.providers.<name>: type: command`` entry before consulting the
registry. The rationale is locality: a name declared in the user's
``config.yaml`` is more specific to their setup than a plugin that
happens to be installed.
"""
from __future__ import annotations

import logging
import threading
from typing import Dict, List, Optional

from agent.tts_provider import TTSProvider

logger = logging.getLogger(__name__)

# Names reserved for native built-in TTS handlers. Plugins cannot
# register a name in this set — the registration call is rejected with
# a warning. **Kept in sync with ``BUILTIN_TTS_PROVIDERS`` in
# :mod:`tools.tts_tool`** — a regression test in
# ``tests/agent/test_tts_registry.py::TestBuiltinSync`` fails if the
# two lists drift. Importing from ``tools.tts_tool`` directly would
# create a circular dependency (``tools.tts_tool`` imports
# ``agent.tts_registry`` for dispatch).
_BUILTIN_NAMES = frozenset({
    "edge",
    "elevenlabs",
    "openai",
    "minimax",
    "xai",
    "mistral",
    "gemini",
    "neutts",
    "kittentts",
    "piper",
})

_providers: Dict[str, TTSProvider] = {}
_lock = threading.Lock()


def register_provider(provider: TTSProvider) -> None:
    """Register a TTS provider.

    Rejects:

    - Non-:class:`TTSProvider` instances (raises :class:`TypeError`).
    - Empty/whitespace ``.name`` (raises :class:`ValueError`).
    - Names colliding with a built-in (logs a warning and silently
      ignores the registration, per the built-ins-always-win invariant).

    Re-registration (same ``name``) overwrites the previous entry and
    logs a debug message, which makes hot-reload scenarios (tests, dev
    loops) behave predictably.
    """
    if not isinstance(provider, TTSProvider):
        raise TypeError(
            f"register_provider() expects a TTSProvider instance, "
            f"got {type(provider).__name__}"
        )
    name = provider.name
    if not isinstance(name, str) or not name.strip():
        raise ValueError("TTS provider .name must be a non-empty string")
    key = name.strip().lower()
    if key in _BUILTIN_NAMES:
        logger.warning(
            "TTS provider '%s' shadows a built-in name; registration ignored. "
            "Built-in TTS providers (%s) always win — pick a different name.",
            key, ", ".join(sorted(_BUILTIN_NAMES)),
        )
        return
    with _lock:
        existing = _providers.get(key)
        _providers[key] = provider
    if existing is not None:
        logger.debug(
            "TTS provider '%s' re-registered (was %r)",
            key, type(existing).__name__,
        )
    else:
        logger.debug(
            "Registered TTS provider '%s' (%s)",
            key, type(provider).__name__,
        )


def list_providers() -> List[TTSProvider]:
    """Return all registered providers, sorted by name."""
    with _lock:
        items = list(_providers.values())
    return sorted(items, key=lambda p: p.name)


def get_provider(name: str) -> Optional[TTSProvider]:
    """Return the provider registered under *name*, or None.

    Name matching is case-insensitive and whitespace-tolerant, mirroring
    how ``tools.tts_tool._get_provider`` normalizes the configured
    ``tts.provider`` value.
    """
    if not isinstance(name, str):
        return None
    return _providers.get(name.strip().lower())


def _reset_for_tests() -> None:
    """Clear the registry. **Test-only.**"""
    with _lock:
        _providers.clear()

@@ -640,6 +640,44 @@ class PluginContext:
            self.manifest.name, provider.name,
        )

    # -- TTS provider registration -------------------------------------------

    def register_tts_provider(self, provider) -> None:
        """Register a text-to-speech backend.

        ``provider`` must be an instance of
        :class:`agent.tts_provider.TTSProvider`. The ``provider.name``
        attribute is what ``tts.provider`` in ``config.yaml`` matches
        against when routing ``text_to_speech`` tool calls, **but
        only when**:

        1. ``provider.name`` is NOT a built-in TTS provider name
           (``edge``, ``openai``, ``elevenlabs``, ...). Built-ins always
           win; the registry rejects shadowing names with a warning.
        2. There is NO ``tts.providers.<name>: type: command`` entry
           with the same name. Command-providers (PR #17843) win on
           name collision because config is more local than plugin
           install.

        Coexists with the command-provider registry rather than
        replacing it; see issue #30398 for the full design rationale.
        """
        from agent.tts_provider import TTSProvider
        from agent.tts_registry import register_provider as _register_tts_provider

        if not isinstance(provider, TTSProvider):
            logger.warning(
                "Plugin '%s' tried to register a TTS provider that does "
                "not inherit from TTSProvider. Ignoring.",
                self.manifest.name,
            )
            return
        _register_tts_provider(provider)
        logger.info(
            "Plugin '%s' registered TTS provider: %s",
            self.manifest.name, provider.name,
        )

    # -- platform adapter registration ---------------------------------------

    def register_platform(

@@ -1753,6 +1753,62 @@ def _plugin_browser_providers() -> list[dict]:
    return rows


def _plugin_tts_providers() -> list[dict]:
    """Build picker-row dicts from plugin-registered TTS providers.

    Issue #30398 — the ``register_tts_provider()`` plugin hook
    coexists alongside the 10 built-in TTS providers
    (``edge``/``openai``/``elevenlabs``/...) and the
    ``tts.providers.<name>: type: command`` registry from PR #17843.
    Built-in rows stay hardcoded in ``TOOL_CATEGORIES["tts"]``; this
    function only injects PLUGIN-registered providers.

    Defensive: plugins whose name collides with a built-in TTS provider
    are filtered out. Even though the registry already rejects them
    at registration time, a future code path that registers directly
    via :func:`agent.tts_registry.register_provider` could slip
    through. Filtering here keeps the picker invariant.
    """
    try:
        from agent.tts_registry import _BUILTIN_NAMES, list_providers
        from hermes_cli.plugins import _ensure_plugins_discovered

        _ensure_plugins_discovered()
        providers = list_providers()
    except Exception:
        return []

    rows: list[dict] = []
    for provider in providers:
        name = getattr(provider, "name", None)
        if not name:
            continue
        # Defensive: reject built-in shadowing at the picker layer too.
        if name.lower().strip() in _BUILTIN_NAMES:
            continue
        try:
            schema = provider.get_setup_schema()
        except Exception:
            continue
        if not isinstance(schema, dict):
            continue
        row = {
            "name": schema.get("name", provider.display_name),
            "badge": schema.get("badge", ""),
            "tag": schema.get("tag", ""),
            "env_vars": schema.get("env_vars", []),
            # Selecting this row writes ``tts.provider: <name>`` — the
            # same write-path used by hardcoded rows. The plugin
            # dispatcher picks it up automatically from there.
            "tts_provider": name,
            "tts_plugin_name": name,
        }
        if schema.get("post_setup"):
            row["post_setup"] = schema["post_setup"]
        rows.append(row)
    return rows


def _visible_providers(cat: dict, config: dict) -> list[dict]:
    """Return provider entries visible for the current auth/config state."""
    features = get_nous_subscription_features(config)
@@ -1790,6 +1846,12 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
    if cat.get("name") == "Browser Automation":
        visible.extend(_plugin_browser_providers())

    # Inject plugin-registered TTS backends (issue #30398). Plugin rows
    # render BELOW the 10 hardcoded built-in rows. Built-in shadowing
    # is filtered out by ``_plugin_tts_providers`` defensively.
    if cat.get("name") == "Text-to-Speech":
        visible.extend(_plugin_tts_providers())

    return visible

@@ -0,0 +1,312 @@
"""Tests for agent/tts_registry.py and agent/tts_provider.py.
Covers:
- Registration happy path
- Registration rejection: non-TTSProvider type
- Registration rejection: empty/whitespace name
- Built-in name shadowing: warning + silent ignore (no exception)
- Re-registration: overwrites + logs at debug
- Case + whitespace insensitivity on lookup
- ABC contract: default implementations work
- ABC contract: synthesize() must be implemented
- ABC contract: stream() raises NotImplementedError by default
- resolve_output_format helper coerces invalid input
"""
from __future__ import annotations
import logging
from typing import Any, Optional
import pytest
from agent import tts_registry
from agent.tts_provider import (
DEFAULT_OUTPUT_FORMAT,
VALID_OUTPUT_FORMATS,
TTSProvider,
resolve_output_format,
)
class _FakeProvider(TTSProvider):
def __init__(
self,
name: str = "fake",
display: Optional[str] = None,
voice_compat: bool = False,
synthesize_impl: Optional[Any] = None,
):
self._name = name
self._display = display
self._voice_compat = voice_compat
self._synthesize_impl = synthesize_impl
@property
def name(self) -> str:
return self._name
@property
def display_name(self) -> str:
return self._display if self._display is not None else super().display_name
@property
def voice_compatible(self) -> bool:
return self._voice_compat
def synthesize(self, text: str, output_path: str, **kw):
if self._synthesize_impl is not None:
return self._synthesize_impl(text, output_path, **kw)
return output_path
@pytest.fixture(autouse=True)
def _reset_registry():
tts_registry._reset_for_tests()
yield
tts_registry._reset_for_tests()
# ---------------------------------------------------------------------------
# Registration
# ---------------------------------------------------------------------------
class TestRegistration:
def test_happy_path(self):
p = _FakeProvider(name="cartesia")
tts_registry.register_provider(p)
assert tts_registry.get_provider("cartesia") is p
assert [r.name for r in tts_registry.list_providers()] == ["cartesia"]
def test_rejects_non_provider_type(self):
with pytest.raises(TypeError, match="expects a TTSProvider instance"):
tts_registry.register_provider("not a provider") # type: ignore[arg-type]
assert tts_registry.list_providers() == []
def test_rejects_empty_name(self):
p = _FakeProvider(name="")
with pytest.raises(ValueError, match="non-empty string"):
tts_registry.register_provider(p)
assert tts_registry.list_providers() == []
def test_rejects_whitespace_name(self):
p = _FakeProvider(name=" ")
with pytest.raises(ValueError, match="non-empty string"):
tts_registry.register_provider(p)
assert tts_registry.list_providers() == []
@pytest.mark.parametrize(
"builtin",
["edge", "openai", "elevenlabs", "minimax", "gemini",
"mistral", "xai", "piper", "kittentts", "neutts"],
)
def test_rejects_builtin_shadow_with_warning(self, builtin, caplog):
"""Built-in names always win — plugin registration is silently ignored
but a warning is logged so the operator can see what happened.
"""
p = _FakeProvider(name=builtin)
with caplog.at_level(logging.WARNING, logger="agent.tts_registry"):
tts_registry.register_provider(p)
assert "shadows a built-in name" in caplog.text
assert builtin in caplog.text
assert tts_registry.get_provider(builtin) is None
assert tts_registry.list_providers() == []
def test_builtin_shadow_case_insensitive(self, caplog):
"""``EDGE``/``Edge``/`` edge `` all collide with the ``edge`` built-in."""
for variant in ("EDGE", "Edge", " edge ", "eDgE"):
tts_registry._reset_for_tests()
with caplog.at_level(logging.WARNING, logger="agent.tts_registry"):
tts_registry.register_provider(_FakeProvider(name=variant))
assert tts_registry.list_providers() == [], (
f"variant {variant!r} should have been rejected as a built-in shadow"
)
def test_reregistration_overwrites(self, caplog):
p1 = _FakeProvider(name="cartesia")
p2 = _FakeProvider(name="cartesia")
tts_registry.register_provider(p1)
with caplog.at_level(logging.DEBUG, logger="agent.tts_registry"):
tts_registry.register_provider(p2)
assert tts_registry.get_provider("cartesia") is p2
assert "re-registered" in caplog.text
# ---------------------------------------------------------------------------
# Lookup
# ---------------------------------------------------------------------------
class TestLookup:
def test_get_provider_missing_returns_none(self):
assert tts_registry.get_provider("nonexistent") is None
def test_get_provider_non_string_returns_none(self):
assert tts_registry.get_provider(None) is None # type: ignore[arg-type]
assert tts_registry.get_provider(123) is None # type: ignore[arg-type]
def test_get_provider_case_insensitive(self):
p = _FakeProvider(name="cartesia")
tts_registry.register_provider(p)
assert tts_registry.get_provider("CARTESIA") is p
assert tts_registry.get_provider("Cartesia") is p
def test_get_provider_whitespace_tolerant(self):
p = _FakeProvider(name="cartesia")
tts_registry.register_provider(p)
assert tts_registry.get_provider(" cartesia ") is p
def test_list_providers_sorted(self):
tts_registry.register_provider(_FakeProvider(name="zylo"))
tts_registry.register_provider(_FakeProvider(name="alpha"))
tts_registry.register_provider(_FakeProvider(name="middle"))
names = [p.name for p in tts_registry.list_providers()]
assert names == ["alpha", "middle", "zylo"]
# ---------------------------------------------------------------------------
# ABC contract
# ---------------------------------------------------------------------------
class TestABCContract:
def test_must_implement_synthesize(self):
class Incomplete(TTSProvider):
@property
def name(self) -> str:
return "incomplete"
# synthesize NOT implemented
with pytest.raises(TypeError, match="abstract"):
Incomplete() # type: ignore[abstract]
def test_must_implement_name(self):
class Incomplete(TTSProvider):
def synthesize(self, text, output_path, **kw):
return output_path
# name NOT implemented
with pytest.raises(TypeError, match="abstract"):
Incomplete() # type: ignore[abstract]
def test_display_name_defaults_to_title(self):
p = _FakeProvider(name="cartesia")
assert p.display_name == "Cartesia"
def test_display_name_override_respected(self):
p = _FakeProvider(name="cartesia", display="Cartesia AI")
assert p.display_name == "Cartesia AI"
def test_is_available_default_true(self):
p = _FakeProvider(name="cartesia")
assert p.is_available() is True
def test_list_voices_default_empty(self):
p = _FakeProvider(name="cartesia")
assert p.list_voices() == []
def test_list_models_default_empty(self):
p = _FakeProvider(name="cartesia")
assert p.list_models() == []
def test_default_model_none_when_no_models(self):
p = _FakeProvider(name="cartesia")
assert p.default_model() is None
def test_default_voice_none_when_no_voices(self):
p = _FakeProvider(name="cartesia")
assert p.default_voice() is None
def test_default_model_first_listed(self):
class WithModels(_FakeProvider):
def list_models(self):
return [{"id": "sonic-2"}, {"id": "sonic-1"}]
p = WithModels(name="cartesia")
assert p.default_model() == "sonic-2"
def test_default_voice_first_listed(self):
class WithVoices(_FakeProvider):
def list_voices(self):
return [{"id": "voice-aria"}, {"id": "voice-jasper"}]
p = WithVoices(name="cartesia")
assert p.default_voice() == "voice-aria"
def test_get_setup_schema_default_minimal(self):
p = _FakeProvider(name="cartesia")
schema = p.get_setup_schema()
assert schema["name"] == "Cartesia"
assert schema["env_vars"] == []
def test_stream_raises_not_implemented_by_default(self):
p = _FakeProvider(name="cartesia")
with pytest.raises(NotImplementedError, match="does not implement streaming"):
next(p.stream("hello"))
def test_voice_compatible_default_false(self):
p = _FakeProvider(name="cartesia")
assert p.voice_compatible is False
def test_voice_compatible_override(self):
p = _FakeProvider(name="cartesia", voice_compat=True)
assert p.voice_compatible is True
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
class TestResolveOutputFormat:
@pytest.mark.parametrize("valid", sorted(VALID_OUTPUT_FORMATS))
def test_valid_passes_through(self, valid):
assert resolve_output_format(valid) == valid
def test_uppercase_normalized(self):
assert resolve_output_format("MP3") == "mp3"
assert resolve_output_format("Opus") == "opus"
def test_whitespace_stripped(self):
assert resolve_output_format(" wav ") == "wav"
def test_invalid_returns_default(self):
assert resolve_output_format("aiff") == DEFAULT_OUTPUT_FORMAT
assert resolve_output_format("") == DEFAULT_OUTPUT_FORMAT
def test_none_returns_default(self):
assert resolve_output_format(None) == DEFAULT_OUTPUT_FORMAT
def test_non_string_returns_default(self):
assert resolve_output_format(123) == DEFAULT_OUTPUT_FORMAT # type: ignore[arg-type]
assert resolve_output_format([]) == DEFAULT_OUTPUT_FORMAT # type: ignore[arg-type]
# ---------------------------------------------------------------------------
# Sync invariant: registry's built-in list vs dispatcher's built-in list
# ---------------------------------------------------------------------------
class TestBuiltinSync:
"""``_BUILTIN_NAMES`` in agent/tts_registry.py is duplicated from
``BUILTIN_TTS_PROVIDERS`` in tools/tts_tool.py (importing directly
would create a circular dependency). This test fails loudly if the
two lists drift: a new built-in added to tts_tool.py MUST also be
added to tts_registry.py's _BUILTIN_NAMES or the registry will
accept a name the dispatcher will silently route to the wrong
handler.
"""
def test_registry_builtins_match_dispatcher_builtins(self):
from tools.tts_tool import BUILTIN_TTS_PROVIDERS
assert tts_registry._BUILTIN_NAMES == BUILTIN_TTS_PROVIDERS, (
"agent.tts_registry._BUILTIN_NAMES and "
"tools.tts_tool.BUILTIN_TTS_PROVIDERS have drifted!\n"
f" Registry only: {sorted(tts_registry._BUILTIN_NAMES - BUILTIN_TTS_PROVIDERS)}\n"
f" Dispatcher only: {sorted(BUILTIN_TTS_PROVIDERS - tts_registry._BUILTIN_NAMES)}\n"
"Add the missing names to whichever list is incomplete. "
"These two lists exist as a circular-import workaround and "
"MUST be kept in sync manually."
)

@@ -0,0 +1,156 @@
"""Tests for PluginContext.register_tts_provider() (issue #30398).
Exercises the plugin context hook end-to-end: drops a fake plugin into
``$HERMES_HOME/plugins/``, runs ``PluginManager().discover_and_load()``,
and asserts the registration result.
Mirrors the structure of
``tests/hermes_cli/test_plugin_scanner_recursion.py::TestRegisterImageGenProvider``.
"""
from __future__ import annotations
import os
from pathlib import Path
from typing import Any, Dict
import yaml
def _write_plugin(
root: Path,
name: str,
*,
manifest_extra: Dict[str, Any] | None = None,
register_body: str = "pass",
) -> Path:
plugin_dir = root / name
plugin_dir.mkdir(parents=True, exist_ok=True)
manifest = {
"name": name,
"version": "0.1.0",
"description": f"Test plugin {name}",
}
if manifest_extra:
manifest.update(manifest_extra)
(plugin_dir / "plugin.yaml").write_text(yaml.dump(manifest))
(plugin_dir / "__init__.py").write_text(
f"def register(ctx):\n {register_body}\n"
)
return plugin_dir
def _enable(hermes_home: Path, name: str) -> None:
cfg_path = hermes_home / "config.yaml"
cfg: dict = {}
if cfg_path.exists():
try:
cfg = yaml.safe_load(cfg_path.read_text()) or {}
except Exception:
cfg = {}
plugins_cfg = cfg.setdefault("plugins", {})
enabled = plugins_cfg.setdefault("enabled", [])
if isinstance(enabled, list) and name not in enabled:
enabled.append(name)
cfg_path.write_text(yaml.safe_dump(cfg))
class TestRegisterTTSProvider:
"""End-to-end: a fake plugin registers via the hook, ends up in the registry."""
def test_accepts_valid_provider(self):
from hermes_cli.plugins import PluginManager
from agent import tts_registry
tts_registry._reset_for_tests()
hermes_home = Path(os.environ["HERMES_HOME"])
_write_plugin(
hermes_home / "plugins",
"my-tts-plugin",
register_body=(
"from agent.tts_provider import TTSProvider\n"
" class P(TTSProvider):\n"
" @property\n"
" def name(self): return 'fake-tts'\n"
" def synthesize(self, text, output_path, **kw):\n"
" return output_path\n"
" ctx.register_tts_provider(P())"
),
)
_enable(hermes_home, "my-tts-plugin")
mgr = PluginManager()
mgr.discover_and_load()
assert mgr._plugins["my-tts-plugin"].enabled is True, (
f"Plugin failed to load: {mgr._plugins['my-tts-plugin'].error}"
)
assert tts_registry.get_provider("fake-tts") is not None
tts_registry._reset_for_tests()
def test_rejects_non_provider(self, caplog):
"""A plugin that passes a non-TTSProvider gets a warning, no exception."""
from hermes_cli.plugins import PluginManager
from agent import tts_registry
tts_registry._reset_for_tests()
hermes_home = Path(os.environ["HERMES_HOME"])
_write_plugin(
hermes_home / "plugins",
"bad-tts-plugin",
register_body="ctx.register_tts_provider('not a provider')",
)
_enable(hermes_home, "bad-tts-plugin")
with caplog.at_level("WARNING"):
mgr = PluginManager()
mgr.discover_and_load()
# Plugin loaded (register returned normally), but registry empty.
assert mgr._plugins["bad-tts-plugin"].enabled is True
assert tts_registry.get_provider("not a provider") is None
assert tts_registry.list_providers() == []
assert "does not inherit from TTSProvider" in caplog.text
tts_registry._reset_for_tests()
def test_rejects_builtin_shadow(self, caplog):
"""A plugin trying to register a name colliding with a built-in is silently
rejected by the underlying registry both with a registry-level warning
AND with the registry remaining empty (plugin still loads OK).
"""
from hermes_cli.plugins import PluginManager
from agent import tts_registry
tts_registry._reset_for_tests()
hermes_home = Path(os.environ["HERMES_HOME"])
_write_plugin(
hermes_home / "plugins",
"shadow-tts-plugin",
register_body=(
"from agent.tts_provider import TTSProvider\n"
" class P(TTSProvider):\n"
" @property\n"
" def name(self): return 'edge'\n"
" def synthesize(self, text, output_path, **kw):\n"
" return output_path\n"
" ctx.register_tts_provider(P())"
),
)
_enable(hermes_home, "shadow-tts-plugin")
with caplog.at_level("WARNING"):
mgr = PluginManager()
mgr.discover_and_load()
# Plugin still loaded normally — built-in shadowing is a warning,
# not an exception. The registry rejects the entry though.
assert mgr._plugins["shadow-tts-plugin"].enabled is True
assert tts_registry.get_provider("edge") is None
assert "shadows a built-in name" in caplog.text
tts_registry._reset_for_tests()


@@ -0,0 +1,187 @@
"""Tests for the TTS plugin picker surface in hermes_cli/tools_config.py (issue #30398).

Covers ``_plugin_tts_providers()`` and the ``_visible_providers()``
integration that injects plugin rows into the Text-to-Speech category.

Mirrors the structure of existing image_gen / browser picker tests.
"""

from __future__ import annotations

import pytest

from agent import tts_registry
from agent.tts_provider import TTSProvider
from hermes_cli import tools_config


class _FakeTTSProvider(TTSProvider):
    def __init__(self, name: str, schema: dict | None = None):
        self._name = name
        self._schema = schema

    @property
    def name(self) -> str:
        return self._name

    def synthesize(self, text, output_path, **kw):
        return output_path

    def get_setup_schema(self):
        if self._schema is not None:
            return self._schema
        return super().get_setup_schema()


@pytest.fixture(autouse=True)
def _reset_registry():
    tts_registry._reset_for_tests()
    yield
    tts_registry._reset_for_tests()


class TestPluginTTSProviders:
    """``_plugin_tts_providers()`` returns picker-row dicts."""

    def test_empty_when_no_plugins(self):
        assert tools_config._plugin_tts_providers() == []

    def test_returns_row_for_registered_plugin(self):
        tts_registry.register_provider(
            _FakeTTSProvider(
                name="cartesia",
                schema={
                    "name": "Cartesia",
                    "badge": "paid",
                    "tag": "Ultra-low-latency streaming",
                    "env_vars": [
                        {"key": "CARTESIA_API_KEY", "prompt": "Cartesia API key",
                         "url": "https://play.cartesia.ai/console"},
                    ],
                },
            )
        )
        rows = tools_config._plugin_tts_providers()
        assert len(rows) == 1
        row = rows[0]
        assert row["name"] == "Cartesia"
        assert row["badge"] == "paid"
        assert row["tag"] == "Ultra-low-latency streaming"
        assert row["env_vars"][0]["key"] == "CARTESIA_API_KEY"
        # Selecting this row writes ``tts.provider: cartesia``, the same
        # write path as a hardcoded row.
        assert row["tts_provider"] == "cartesia"
        assert row["tts_plugin_name"] == "cartesia"

    def test_filters_builtin_shadow_defensively(self):
        """Even if a plugin slipped past the registry's built-in check
        (e.g. via direct ``agent.tts_registry.register_provider`` rather
        than the ``ctx.register_tts_provider`` hook), the picker layer
        filters it out so the picker invariant holds."""
        # Use lower-level call to bypass the warning + skip in
        # register_provider (the registry's built-in guard).
        # Note: this is intentionally pathological; production code
        # paths go through the hook which catches this first.
        provider = _FakeTTSProvider(name="edge")
        tts_registry._providers["edge"] = provider  # type: ignore[index]
        try:
            rows = tools_config._plugin_tts_providers()
            assert rows == [], (
                "Picker must filter built-in name shadows even when the "
                "registry has been bypassed."
            )
        finally:
            tts_registry._providers.pop("edge", None)  # type: ignore[arg-type]

    def test_skips_providers_with_no_name(self):
        """Defense in depth: a provider with no .name attribute is skipped
        rather than crashing the picker."""

        class _NoName:
            display_name = "Bogus"

            def get_setup_schema(self):
                return {"name": "Bogus"}

        tts_registry._providers["bogus"] = _NoName()  # type: ignore[assignment]
        try:
            rows = tools_config._plugin_tts_providers()
            # Provider has no .name so the picker filters it out
            assert all(r.get("tts_plugin_name") != "bogus" for r in rows)
        finally:
            tts_registry._providers.pop("bogus", None)  # type: ignore[arg-type]

    def test_skips_providers_whose_schema_raises(self):
        class _ExplodingSchema(_FakeTTSProvider):
            def get_setup_schema(self):
                raise RuntimeError("boom")

        tts_registry.register_provider(_ExplodingSchema(name="exploding"))
        tts_registry.register_provider(_FakeTTSProvider(name="working"))
        rows = tools_config._plugin_tts_providers()
        assert [r["tts_plugin_name"] for r in rows] == ["working"]

    def test_minimal_schema_uses_display_name(self):
        """A provider with no setup_schema override gets a row built from
        ``display_name`` and ``name`` only."""
        tts_registry.register_provider(_FakeTTSProvider(name="minimal"))
        rows = tools_config._plugin_tts_providers()
        assert len(rows) == 1
        assert rows[0]["name"] == "Minimal"  # display_name default
        assert rows[0]["tts_provider"] == "minimal"
        assert rows[0]["env_vars"] == []

    def test_post_setup_passthrough(self):
        tts_registry.register_provider(
            _FakeTTSProvider(
                name="my-tts",
                schema={
                    "name": "My TTS",
                    "post_setup": "my_post_install_hook",
                    "env_vars": [],
                },
            )
        )
        rows = tools_config._plugin_tts_providers()
        assert rows[0].get("post_setup") == "my_post_install_hook"


class TestVisibleProvidersInjectsTTSPlugins:
    """``_visible_providers()`` injects plugin rows into the Text-to-Speech
    category alongside the hardcoded built-in rows."""

    def test_tts_category_includes_plugin_rows(self):
        tts_registry.register_provider(_FakeTTSProvider(name="cartesia"))
        tts_cat = tools_config.TOOL_CATEGORIES["tts"]
        visible = tools_config._visible_providers(tts_cat, config={})
        names = [row.get("name") for row in visible]
        # Hardcoded rows (sample: check at least one is present)
        assert "Microsoft Edge TTS" in names
        # Plugin row injected at the end
        assert "Cartesia" in names
        # Plugin row has tts_provider key for write-path compat
        plugin_rows = [r for r in visible if r.get("tts_plugin_name")]
        assert len(plugin_rows) == 1
        assert plugin_rows[0]["tts_provider"] == "cartesia"

    def test_other_categories_unaffected_by_tts_plugins(self):
        """Registering a TTS plugin must not leak into the Image Generation
        or Browser pickers."""
        tts_registry.register_provider(_FakeTTSProvider(name="cartesia"))
        img_cat = tools_config.TOOL_CATEGORIES["image_gen"]
        visible = tools_config._visible_providers(img_cat, config={})
        names = [row.get("name") for row in visible]
        assert "Cartesia" not in names

    def test_tts_category_without_plugins_only_hardcoded(self):
        """No plugins → picker shows exactly the hardcoded rows."""
        tts_cat = tools_config.TOOL_CATEGORIES["tts"]
        visible = tools_config._visible_providers(tts_cat, config={})
        names = [row.get("name") for row in visible]
        # No row has the plugin marker
        assert all(not row.get("tts_plugin_name") for row in visible)
        # Hardcoded rows still present (sample one of the always-visible ones)
        assert "Microsoft Edge TTS" in names
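
The picker tests above imply a simple mapping from a provider's setup schema to a picker row. A rough sketch of that mapping follows. `build_picker_row` is a hypothetical helper, not the body of `_plugin_tts_providers()`, and deriving the default display name with `str.capitalize()` is an assumption that merely reproduces the `"minimal"` → `"Minimal"` case asserted above.

```python
from typing import Any, Dict, Optional


def build_picker_row(
    provider_name: str,
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one picker row from a provider name and its optional setup schema."""
    schema = schema or {}
    row: Dict[str, Any] = {
        # Assumption: the display name defaults to the capitalized provider name.
        "name": schema.get("name") or provider_name.capitalize(),
        "env_vars": list(schema.get("env_vars") or []),
        # Both keys carry the registry name: one for the config write path,
        # one to mark the row as plugin-provided.
        "tts_provider": provider_name,
        "tts_plugin_name": provider_name,
    }
    # Optional presentation keys are passed through only when present.
    for optional_key in ("badge", "tag", "post_setup"):
        if optional_key in schema:
            row[optional_key] = schema[optional_key]
    return row


minimal_row = build_picker_row("minimal")
cartesia_row = build_picker_row(
    "cartesia",
    {
        "name": "Cartesia",
        "badge": "paid",
        "env_vars": [{"key": "CARTESIA_API_KEY", "prompt": "Cartesia API key"}],
    },
)
```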


@@ -0,0 +1,328 @@
"""Behavior-parity check for the TTS plugin hook (issue #30398).

Spawns one subprocess per (version, scenario) cell pinned to either
``origin/main`` (no plugin hook; ``tts.provider: cartesia`` falls
through to the Edge TTS default branch) or this PR's worktree (plugin
hook present; same config routes through the plugin registry when a
plugin is registered).

Each subprocess clears all TTS-related env vars + writes a
``config.yaml``, then resolves how the dispatcher would route a
``text_to_speech`` call. The emitted shape is a JSON object with keys::

    {dispatch_kind, provider_name, voice_compat, error_present}

Where ``dispatch_kind`` is one of
``{"builtin_edge", "builtin_openai", "builtin_elevenlabs", ...,
"command", "plugin", "fallback_edge", "error", "exception"}``:

* ``builtin_<name>``: config selects a built-in handler that exists
  on both main and PR (no diff expected)
* ``command``: config selects a ``tts.providers.<name>: type: command``
  entry (PR #17843; no diff expected)
* ``plugin``: config selects a plugin-registered provider (PR only)
* ``fallback_edge``: config selects an unknown name with no matching
  plugin or command entry → Edge TTS default fallback
* ``error``: explicit fatal error (e.g. mistral quarantine)
* ``exception``: the resolution code itself raised unexpectedly

The parent process diffs the reduced shape per scenario. The only
acceptable diff is ``fallback_edge → plugin`` for the
``plugin-installed`` scenario (unknown name with a plugin installed);
everything else is a regression.

Run from the PR worktree (it auto-resolves ``MAIN_DIR`` from the parent
of the worktree directory, or falls back to a sibling
``hermes-agent-main`` checkout)::

    python tests/plugins/tts/check_parity_vs_main.py
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parents[3]


def _resolve_main_dir() -> Path:
    candidate = REPO_ROOT.parent.parent
    if (candidate / "tools" / "tts_tool.py").exists() and candidate != REPO_ROOT:
        return candidate
    sibling = REPO_ROOT.parent / "hermes-agent-main"
    if (sibling / "tools" / "tts_tool.py").exists():
        return sibling
    return REPO_ROOT


MAIN_DIR = _resolve_main_dir()
PR_DIR = REPO_ROOT
assert (PR_DIR / "tools" / "tts_tool.py").exists(), (
    f"PR_DIR={PR_DIR} doesn't look like a hermes-agent checkout"
)

# The subprocess script runs INSIDE either the main checkout or PR
# checkout, so the import paths resolve to the version of the code
# under test. We never call the real ``text_to_speech_tool`` because
# that would require audio synthesis; instead we ask the resolution
# layer what it WOULD do.
SUBPROCESS_SCRIPT = r"""
import json, os, sys, tempfile

sys.path.insert(0, sys.argv[1])

# Isolated HERMES_HOME so the config write is hermetic.
home = tempfile.mkdtemp()
os.environ["HERMES_HOME"] = home

# Clear TTS-related env so dispatch decisions are config-driven.
for k in (
    "ELEVENLABS_API_KEY", "OPENAI_API_KEY", "VOICE_TOOLS_OPENAI_KEY",
    "MINIMAX_API_KEY", "XAI_API_KEY", "GEMINI_API_KEY",
):
    os.environ.pop(k, None)

scenario_env = json.loads(sys.argv[2])
os.environ.update(scenario_env)
config_yaml = sys.argv[3]
plugin_register = sys.argv[4]  # "yes" to register a fake plugin

config_path = os.path.join(home, "config.yaml")
with open(config_path, "w") as f:
    f.write(config_yaml)

# Fresh import: must not have anything cached from prior runs.
for name in list(sys.modules):
    if (name.startswith("tools.")
            or name.startswith("agent.")
            or name.startswith("plugins.")
            or name.startswith("hermes_cli.")):
        sys.modules.pop(name, None)

# Try importing tts_registry; it only exists on the PR side.
have_plugin_hook = False
try:
    from agent import tts_registry
    from agent.tts_provider import TTSProvider
    have_plugin_hook = True
    if plugin_register == "yes":
        class _FakeProvider(TTSProvider):
            @property
            def name(self): return "cartesia"
            def synthesize(self, text, output_path, **kw):
                return output_path
        tts_registry._reset_for_tests()
        tts_registry.register_provider(_FakeProvider())
except ImportError:
    pass

import tools.tts_tool as tts_tool

# Read the config the same way text_to_speech_tool() does.
tts_config = tts_tool._load_tts_config()
provider = tts_tool._get_provider(tts_config)

dispatch_kind = None
provider_name = provider
voice_compat = False
error_text = None

try:
    # Mistral is the one branch that returns a fatal error.
    if provider == "mistral":
        dispatch_kind = "error"
        error_text = "mistral quarantine"
    elif tts_tool._resolve_command_provider_config(provider, tts_config) is not None:
        dispatch_kind = "command"
    elif have_plugin_hook and provider not in tts_tool.BUILTIN_TTS_PROVIDERS:
        # On PR side: check plugin dispatch.
        plugin_path = tts_tool._dispatch_to_plugin_provider(
            "test", os.path.join(home, "out.mp3"), provider, tts_config,
        )
        if plugin_path is not None:
            dispatch_kind = "plugin"
            voice_compat = tts_tool._plugin_provider_is_voice_compatible(provider)
        else:
            # Falls through to Edge TTS default on the PR side too.
            dispatch_kind = "fallback_edge"
    elif provider in tts_tool.BUILTIN_TTS_PROVIDERS:
        dispatch_kind = "builtin_" + provider
    else:
        # On main side: unknown names fall through to Edge default.
        dispatch_kind = "fallback_edge"
except Exception as exc:
    dispatch_kind = "exception"
    error_text = repr(exc)

shape = {
    "dispatch_kind": dispatch_kind,
    "provider_name": provider_name,
    "voice_compat": bool(voice_compat),
    "error_present": error_text is not None,
}
print(json.dumps(shape))
"""

SCENARIOS: list[tuple[str, str, dict[str, str], str]] = [
    # (label, config.yaml body, scenario_env, plugin_register)
    # Scenario 1: unset tts.provider → both: Edge default
    ("unset-defaults-to-edge", "", {}, "no"),
    # Scenario 2: built-in name → both: that built-in
    ("explicit-edge", "tts:\n  provider: edge\n", {}, "no"),
    ("explicit-openai", "tts:\n  provider: openai\n", {}, "no"),
    ("explicit-elevenlabs", "tts:\n  provider: elevenlabs\n", {}, "no"),
    # Scenario 3: command-type provider → both: command dispatch
    (
        "command-provider",
        "tts:\n  provider: my-piper\n  providers:\n    my-piper:\n      type: command\n      command: 'piper -m model.onnx -f {output_path} < {input_path}'\n",
        {},
        "no",
    ),
    # Scenario 4: unknown name with NO plugin installed → both: fallback to Edge
    ("unknown-no-plugin", "tts:\n  provider: cartesia\n", {}, "no"),
    # Scenario 5: unknown name WITH plugin installed
    #   main: fallback_edge (no plugin hook exists)
    #   PR: plugin (cartesia)
    # This is the ONLY acceptable diff in the harness.
    ("plugin-installed", "tts:\n  provider: cartesia\n", {}, "yes"),
    # Scenario 6: built-in name + plugin tries to shadow → both: built-in
    # The plugin registers under name "cartesia", not "edge", so this is
    # effectively the same as scenario 2, but we exercise the with-plugin
    # path to ensure the built-in branch still takes priority.
    ("explicit-edge-with-plugin-registered", "tts:\n  provider: edge\n", {}, "yes"),
    # Scenario 7: mistral quarantine; both surface the explicit error
    ("mistral-quarantine", "tts:\n  provider: mistral\n", {}, "no"),
]


def _run_scenario(repo_path: Path, label: str, config_yaml: str, env: dict, plugin_register: str) -> dict:
    venv_python = repo_path / ".venv" / "bin" / "python"
    if not venv_python.exists():
        venv_python = MAIN_DIR / ".venv" / "bin" / "python"
    if not venv_python.exists():
        venv_python = MAIN_DIR / "venv" / "bin" / "python"
    if not venv_python.exists():
        venv_python = Path("python3")
    out = subprocess.run(
        [
            str(venv_python),
            "-c",
            SUBPROCESS_SCRIPT,
            str(repo_path),
            json.dumps(env),
            config_yaml,
            plugin_register,
        ],
        capture_output=True,
        text=True,
        timeout=60,
    )
    if out.returncode != 0:
        return {
            "error": "subprocess failed",
            "stdout": out.stdout[-500:],
            "stderr": out.stderr[-500:],
        }
    try:
        return json.loads(out.stdout.strip().splitlines()[-1])
    except Exception as exc:
        return {"error": f"could not parse output: {exc}", "stdout": out.stdout}


def _reduce(shape: dict) -> dict:
    """Reduce to the parts that matter for user-visible parity."""
    return {
        "dispatch_kind": shape.get("dispatch_kind"),
        "provider_name": shape.get("provider_name"),
        "error_present": shape.get("error_present"),
    }


def main() -> int:
    print(f"main: {MAIN_DIR}")
    print(f"pr: {PR_DIR}")
    print()
    if MAIN_DIR == PR_DIR:
        print(
            "WARN: MAIN_DIR == PR_DIR; diffs will be trivially identical.\n"
            " Set up a sibling 'hermes-agent-main' checkout pinned to "
            "origin/main to get real parity coverage."
        )
        print()
    failures: list[str] = []
    errors: list[str] = []
    intentional_diffs: list[tuple[str, dict, dict]] = []
    for label, config_yaml, env, plugin_register in SCENARIOS:
        main_shape = _run_scenario(MAIN_DIR, label, config_yaml, env, plugin_register)
        pr_shape = _run_scenario(PR_DIR, label, config_yaml, env, plugin_register)
        if "error" in main_shape or "error" in pr_shape:
            print(f" [ERR ] {label}: subprocess failed")
            print(f" main: {main_shape}")
            print(f" pr: {pr_shape}")
            errors.append(label)
            continue
        main_reduced = _reduce(main_shape)
        pr_reduced = _reduce(pr_shape)
        if main_reduced == pr_reduced:
            print(f" [OK] {label}: {main_reduced}")
            continue
        # On main, "plugin-installed" scenario returns fallback_edge
        # (no plugin hook); on PR, it routes to the plugin. That's the
        # only acceptable diff.
        fallback_to_plugin = (
            main_reduced.get("dispatch_kind") == "fallback_edge"
            and pr_reduced.get("dispatch_kind") == "plugin"
            and label == "plugin-installed"
        )
        if fallback_to_plugin:
            print(f" [DIFF] {label}: fallback_edge → plugin (expected)")
            intentional_diffs.append((label, main_reduced, pr_reduced))
        else:
            print(f" [FAIL] {label}")
            print(f" main: {main_reduced}")
            print(f" pr: {pr_reduced}")
            failures.append(label)
    print()
    if errors:
        print(f"SUBPROCESS ERRORS in {len(errors)} scenario(s):")
        for e in errors:
            print(f" - {e}")
    if failures:
        print(f"BEHAVIOUR REGRESSION in {len(failures)} scenario(s):")
        for f in failures:
            print(f" - {f}")
    if intentional_diffs:
        print(
            f"INTENTIONAL DIFFS ({len(intentional_diffs)}): "
            f"fallback_edge → plugin dispatch when a plugin is registered."
        )
    if failures or errors:
        return 1
    print(f"PARITY OK across {len(SCENARIOS)} scenarios.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
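
The `dispatch_kind` values the harness compares follow the four-step resolution order from the PR description. A simplified, pure-function sketch of that order is shown below. It is not the dispatcher in `tools/tts_tool.py`: `resolve_dispatch_kind`, the three-name `BUILTINS` subset, and the `plugin_names` set standing in for the plugin registry are all hypothetical.

```python
from typing import Any, Dict, Set

# Assumption: a subset of the built-in provider names, for illustration only.
BUILTINS = frozenset({"edge", "openai", "elevenlabs"})


def resolve_dispatch_kind(
    provider: str,
    tts_config: Dict[str, Any],
    plugin_names: Set[str],
) -> str:
    """Classify how a configured provider name would be routed."""
    key = (provider or "").lower().strip()
    # 1. Built-in names always win.
    if key in BUILTINS:
        return "builtin_" + key
    # 2. A same-name command-provider entry beats any plugin.
    entry = (tts_config.get("providers") or {}).get(key)
    if isinstance(entry, dict) and entry.get("command"):
        return "command"
    # 3. Plugin dispatch only for names registered by a plugin.
    if key in plugin_names:
        return "plugin"
    # 4. Anything else falls through to the Edge TTS default.
    return "fallback_edge"


command_cfg = {
    "providers": {"my-piper": {"type": "command", "command": "piper ..."}}
}
```

For example, `resolve_dispatch_kind("cartesia", {}, set())` yields `"fallback_edge"` while the same call with `{"cartesia"}` yields `"plugin"`, which is the one difference the harness treats as intentional.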


@@ -0,0 +1,323 @@
"""Tests for TTS plugin dispatch in tools/tts_tool.py (issue #30398).

Covers the three core invariants of the plugin dispatcher:

1. Built-in provider names short-circuit: plugins NEVER win over a
   built-in. Even if a plugin somehow ended up in the registry with a
   built-in name (which the registry already blocks), the dispatcher
   re-checks defensively.
2. Command-type providers declared under ``tts.providers.<name>: type:
   command`` (PR #17843) win over a plugin with the same name. Config
   is more local than plugin install.
3. Plugin dispatch fires only when the configured provider is neither
   a built-in nor a command-type entry, AND a plugin is registered
   under that name. Unknown names fall through.

Also exercises:

- Plugin exceptions surface to the outer error envelope (don't crash)
- Plugin returning a different path is honored
- voice_compatible: True triggers ffmpeg opus conversion path
- voice_compatible: False keeps the file as-is

The dispatcher is exercised in isolation; we don't actually call
``text_to_speech_tool`` because that would require real audio file
writes. Each test directly calls
``tools.tts_tool._dispatch_to_plugin_provider`` / the predicate
helpers.
"""

from __future__ import annotations

from typing import Optional

import pytest

from agent import tts_registry
from agent.tts_provider import TTSProvider
from tools import tts_tool


class _FakeTTSProvider(TTSProvider):
    def __init__(
        self,
        name: str,
        voice_compat: bool = False,
        raise_exc: Optional[BaseException] = None,
        return_path: Optional[str] = None,
    ):
        self._name = name
        self._voice_compat = voice_compat
        self._raise_exc = raise_exc
        self._return_path = return_path
        # Recorded for assertions
        self.last_call: Optional[dict] = None

    @property
    def name(self) -> str:
        return self._name

    @property
    def voice_compatible(self) -> bool:
        return self._voice_compat

    def synthesize(self, text, output_path, **kw):
        self.last_call = {
            "text": text,
            "output_path": output_path,
            "kwargs": dict(kw),
        }
        if self._raise_exc is not None:
            raise self._raise_exc
        return self._return_path if self._return_path is not None else output_path


@pytest.fixture(autouse=True)
def _reset_registry():
    tts_registry._reset_for_tests()
    yield
    tts_registry._reset_for_tests()


# ---------------------------------------------------------------------------
# Resolution invariants
# ---------------------------------------------------------------------------


class TestBuiltinAlwaysWins:
    """Built-in TTS provider names short-circuit the dispatcher.

    Even with a plugin registered (which the registry would reject
    but the dispatcher is defensive), built-in names return None so
    the caller's elif chain handles them natively.
    """

    @pytest.mark.parametrize(
        "builtin",
        ["edge", "openai", "elevenlabs", "minimax", "gemini",
         "mistral", "xai", "piper", "kittentts", "neutts"],
    )
    def test_dispatcher_short_circuits_builtin(self, builtin):
        result = tts_tool._dispatch_to_plugin_provider(
            text="hello",
            output_path="/tmp/out.mp3",
            provider=builtin,
            tts_config={},
        )
        assert result is None, (
            f"Built-in {builtin!r} must short-circuit plugin dispatch. "
            "If this test fails, the dispatcher would silently let a "
            "plugin with a built-in name shadow the native handler, "
            "violating the precedence rule from PR #17843."
        )

    def test_dispatcher_short_circuits_builtin_case_insensitive(self):
        for variant in ("EDGE", "Edge", " edge ", "eDgE"):
            assert (
                tts_tool._dispatch_to_plugin_provider(
                    text="hello", output_path="/tmp/x.mp3",
                    provider=variant, tts_config={},
                ) is None
            )


class TestCommandProviderWins:
    """A same-name ``tts.providers.<name>: type: command`` config beats a plugin.

    Locality: a user's command-provider config is more specific than
    whichever plugin happens to be installed.
    """

    def test_command_config_beats_plugin(self):
        tts_registry.register_provider(_FakeTTSProvider(name="my-tts"))
        result = tts_tool._dispatch_to_plugin_provider(
            text="hello",
            output_path="/tmp/out.mp3",
            provider="my-tts",
            tts_config={
                "providers": {
                    "my-tts": {
                        "type": "command",
                        "command": "echo 'hi' > {output_path}",
                    },
                },
            },
        )
        # Plugin path returns None → caller falls back to command
        # provider dispatch (handled by the outer text_to_speech_tool
        # via _resolve_command_provider_config).
        assert result is None


class TestPluginDispatch:
    """Happy path: configured name matches a registered plugin, dispatcher fires."""

    def test_registered_plugin_called(self):
        provider = _FakeTTSProvider(name="cartesia")
        tts_registry.register_provider(provider)
        result = tts_tool._dispatch_to_plugin_provider(
            text="hello world",
            output_path="/tmp/out.mp3",
            provider="cartesia",
            tts_config={},
        )
        assert result == "/tmp/out.mp3"
        assert provider.last_call is not None
        assert provider.last_call["text"] == "hello world"
        assert provider.last_call["output_path"] == "/tmp/out.mp3"

    def test_unregistered_name_returns_none(self):
        result = tts_tool._dispatch_to_plugin_provider(
            text="hello",
            output_path="/tmp/out.mp3",
            provider="unknown-tts",
            tts_config={},
        )
        assert result is None

    def test_voice_model_speed_format_forwarded(self):
        provider = _FakeTTSProvider(name="cartesia")
        tts_registry.register_provider(provider)
        result = tts_tool._dispatch_to_plugin_provider(
            text="hello",
            output_path="/tmp/out.opus",
            provider="cartesia",
            tts_config={
                "voice": "voice-aria",
                "model": "sonic-2",
                "speed": 1.2,
                "output_format": "opus",
            },
        )
        assert result == "/tmp/out.opus"
        kwargs = provider.last_call["kwargs"]
        assert kwargs["voice"] == "voice-aria"
        assert kwargs["model"] == "sonic-2"
        assert kwargs["speed"] == 1.2
        assert kwargs["format"] == "opus"

    def test_empty_string_voice_passed_as_none(self):
        """Empty-string config values are normalized to None so providers can
        fall back to their own defaults (matches the ABC contract)."""
        provider = _FakeTTSProvider(name="cartesia")
        tts_registry.register_provider(provider)
        tts_tool._dispatch_to_plugin_provider(
            text="hello",
            output_path="/tmp/out.mp3",
            provider="cartesia",
            tts_config={"voice": "", "model": ""},
        )
        kwargs = provider.last_call["kwargs"]
        assert kwargs["voice"] is None
        assert kwargs["model"] is None

    def test_provider_returning_different_path_honored(self):
        """If a provider rewrites the output path (e.g. format-driven extension
        change), the dispatcher returns the new path."""
        provider = _FakeTTSProvider(name="cartesia", return_path="/tmp/rewritten.opus")
        tts_registry.register_provider(provider)
        result = tts_tool._dispatch_to_plugin_provider(
            text="hi",
            output_path="/tmp/out.mp3",
            provider="cartesia",
            tts_config={},
        )
        assert result == "/tmp/rewritten.opus"

    def test_provider_returning_none_falls_back_to_output_path(self):
        """Defensive: a provider returning None means the dispatcher should
        report the caller-supplied output_path (matches the ABC contract: the
        provider is supposed to write to output_path)."""

        # _FakeTTSProvider substitutes output_path when return_path is None,
        # so a subclass is needed to make synthesize() really return None.
        class _ReturnsNone(_FakeTTSProvider):
            def synthesize(self, text, output_path, **kw):
                return None  # type: ignore[return-value]

        provider = _ReturnsNone(name="weird")
        tts_registry.register_provider(provider)
        result = tts_tool._dispatch_to_plugin_provider(
            text="hi",
            output_path="/tmp/out.mp3",
            provider="weird",
            tts_config={},
        )
        assert result == "/tmp/out.mp3"

    def test_provider_exception_bubbles_up(self):
        """Plugin exceptions are NOT swallowed by the dispatcher; they bubble
        up so the outer ``text_to_speech_tool`` try/except converts them to
        the standard error envelope. Matches command-provider failure
        behavior."""
        provider = _FakeTTSProvider(
            name="cartesia",
            raise_exc=RuntimeError("network down"),
        )
        tts_registry.register_provider(provider)
        with pytest.raises(RuntimeError, match="network down"):
            tts_tool._dispatch_to_plugin_provider(
                text="hi",
                output_path="/tmp/out.mp3",
                provider="cartesia",
                tts_config={},
            )


# ---------------------------------------------------------------------------
# voice_compatible flag
# ---------------------------------------------------------------------------


class TestVoiceCompatibleHelper:
    def test_voice_compatible_true(self):
        tts_registry.register_provider(
            _FakeTTSProvider(name="cartesia", voice_compat=True)
        )
        assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is True

    def test_voice_compatible_false_by_default(self):
        tts_registry.register_provider(_FakeTTSProvider(name="cartesia"))
        assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is False

    def test_unregistered_provider_returns_false(self):
        assert tts_tool._plugin_provider_is_voice_compatible("unknown") is False

    def test_empty_provider_name_returns_false(self):
        assert tts_tool._plugin_provider_is_voice_compatible("") is False

    @pytest.mark.parametrize(
        "builtin",
        ["edge", "openai", "elevenlabs", "minimax", "gemini",
         "mistral", "xai", "piper", "kittentts", "neutts"],
    )
    def test_builtin_names_return_false(self, builtin):
        """voice_compatible helper short-circuits built-ins so they go
        through the legacy code path that handles their format quirks."""
        assert tts_tool._plugin_provider_is_voice_compatible(builtin) is False

    def test_voice_compatible_case_insensitive(self):
        tts_registry.register_provider(
            _FakeTTSProvider(name="cartesia", voice_compat=True)
        )
        assert tts_tool._plugin_provider_is_voice_compatible("CARTESIA") is True
        assert tts_tool._plugin_provider_is_voice_compatible(" cartesia ") is True

    def test_provider_property_exception_returns_false(self):
        """A buggy ``voice_compatible`` property raising must not crash the
        TTS pipeline."""

        class _ExplodingProvider(_FakeTTSProvider):
            @property
            def voice_compatible(self) -> bool:
                raise RuntimeError("boom")

        tts_registry.register_provider(_ExplodingProvider(name="cartesia"))
        assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is False
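
The forwarding tests in this file pin down how config values are normalized before they reach `synthesize()`: empty strings become `None`, numeric speeds become floats, and the format is lowercased. A small standalone sketch of those rules follows. `normalize_plugin_kwargs` is a hypothetical helper, and its `"mp3"` default stands in for a default-format constant whose actual value is not shown in this diff.

```python
from typing import Any, Dict, Optional


def normalize_plugin_kwargs(
    tts_config: Any,
    default_format: str = "mp3",  # assumption: stand-in for the real default
) -> Dict[str, Optional[object]]:
    """Turn a raw tts_config mapping into keyword arguments for synthesize()."""
    cfg = tts_config if isinstance(tts_config, dict) else {}
    voice = cfg.get("voice")
    model = cfg.get("model")
    speed = cfg.get("speed")
    fmt = cfg.get("output_format", default_format)
    return {
        # Empty or non-string values become None so the provider can
        # fall back to its own default voice/model.
        "voice": voice if isinstance(voice, str) and voice else None,
        "model": model if isinstance(model, str) and model else None,
        # Only numeric speeds are forwarded; strings such as "fast" are dropped.
        "speed": float(speed) if isinstance(speed, (int, float)) else None,
        "format": str(fmt).lower() if fmt else default_format,
    }


kwargs = normalize_plugin_kwargs(
    {"voice": "", "model": "sonic-2", "speed": 2, "output_format": "OPUS"}
)
```

Normalizing in one place means every plugin sees the same "unset means `None`" convention, so providers do not each need to guard against empty strings.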


@@ -419,6 +419,123 @@ def _resolve_command_provider_config(
    return None


def _dispatch_to_plugin_provider(
    text: str,
    output_path: str,
    provider: str,
    tts_config: Dict[str, Any],
) -> Optional[str]:
    """Route the call to a plugin-registered TTS provider, or return None.

    Returns the path to the written audio file on dispatch, or ``None``
    to fall through to the next resolution layer (built-in dispatch or
    Edge TTS default).

    Resolution invariants enforced here (matches issue #30398):

    1. Built-in provider names short-circuit and never reach the plugin
       registry. The caller is responsible for the elif chain that
       handles ``edge``/``openai``/etc.; this function explicitly
       rejects those names defensively.
    2. Command-type providers declared under
       ``tts.providers.<name>: type: command`` (PR #17843) win over a
       plugin with the same name. The caller reaches this function only
       when its own command-provider check returned None; we re-verify
       here so a refactor of the caller can't silently break the
       invariant.
    3. Plugin dispatch fires only when ``provider`` matches a registered
       :class:`TTSProvider` whose ``name`` equals the configured value.
       Unknown names return None (caller falls through to Edge default).

    Exceptions raised by the plugin's ``synthesize()`` are not caught
    here; they propagate so the outer ``text_to_speech_tool`` try/except
    converts them to the standard error envelope, matching how
    command-provider failures surface. Only plugin discovery failures
    are swallowed (logged at debug level, then treated as no match).
    """
    if not provider:
        return None
    key = provider.lower().strip()
    if key in BUILTIN_TTS_PROVIDERS:
        return None
    # Defense in depth: command-provider check should already have
    # short-circuited the caller. If a same-name command config exists,
    # bail so the command path wins.
    if _is_command_provider_config(_get_named_provider_config(tts_config, key)):
        return None
    try:
        from agent.tts_registry import get_provider
        from hermes_cli.plugins import _ensure_plugins_discovered

        _ensure_plugins_discovered()
        plugin_provider = get_provider(key)
        if plugin_provider is None:
            # Long-lived sessions may have discovered plugins before the
            # bundled backend was patched in or before config changed.
            # Retry once with a forced refresh before surfacing
            # fall-through. Mirrors the image_gen / browser dispatcher
            # recovery pattern.
            _ensure_plugins_discovered(force=True)
            plugin_provider = get_provider(key)
    except Exception as exc:  # noqa: BLE001 - discovery failure is non-fatal
        logger.debug("tts plugin dispatch skipped (discovery failed): %s", exc)
        return None
    if plugin_provider is None:
        return None
    # Resolve voice / model / format from tts_config. Providers should
    # treat all of these as optional and fall back to their own defaults
    # when None is passed (matches the ABC contract documented on
    # ``TTSProvider.synthesize``).
    voice = tts_config.get("voice") if isinstance(tts_config, dict) else None
    model = tts_config.get("model") if isinstance(tts_config, dict) else None
    speed = tts_config.get("speed") if isinstance(tts_config, dict) else None
    fmt = (
        tts_config.get("output_format", DEFAULT_COMMAND_TTS_OUTPUT_FORMAT)
        if isinstance(tts_config, dict)
        else DEFAULT_COMMAND_TTS_OUTPUT_FORMAT
    )
    logger.info(
        "Generating speech with plugin TTS provider '%s'...", key,
    )
    written = plugin_provider.synthesize(
        text,
        output_path,
        voice=voice if isinstance(voice, str) and voice else None,
        model=model if isinstance(model, str) and model else None,
        speed=float(speed) if isinstance(speed, (int, float)) else None,
        format=str(fmt).lower() if fmt else "mp3",
    )
    # Provider contract: returns the (possibly rewritten) output path.
    # Defensive against a provider returning None or a non-string:
    # fall back to the caller's expected output_path.
    return written if isinstance(written, str) and written else output_path


def _plugin_provider_is_voice_compatible(provider: str) -> bool:
    """Return True when the registered plugin provider opts into voice
    bubble delivery via its ``voice_compatible`` property.

    Defensive: any registry or property access failure means False
    (matches the safe default for the command-provider path).
    """
    if not provider:
        return False
    key = provider.lower().strip()
    if key in BUILTIN_TTS_PROVIDERS:
        return False
    try:
        from agent.tts_registry import get_provider

        plugin_provider = get_provider(key)
        if plugin_provider is None:
            return False
        return bool(plugin_provider.voice_compatible)
    except Exception as exc:  # noqa: BLE001
        logger.debug(
            "tts plugin voice_compatible check failed for '%s': %s", key, exc,
        )
        return False


def _iter_command_providers(tts_config: Dict[str, Any]):
    """Yield (name, config) pairs for every declared command-type provider."""
    if not isinstance(tts_config, dict):
@@ -1787,6 +1904,21 @@ def text_to_speech_tool(
text, file_str, provider, command_provider_config, tts_config,
)
# Plugin-registered TTS backend (issue #30398). Fires when the
# configured provider is neither a built-in nor a command-type
# entry, AND a plugin is registered under that name. The walrus
# binds `_plugin_path` only when the dispatcher returns a path
# (i.e. a plugin was actually found); a None return falls
# through to the built-in elif chain so unknown names hit the
# Edge TTS default at the bottom. The dispatcher itself enforces
# built-ins-always-win + command-wins-over-plugin defensively.
elif provider not in BUILTIN_TTS_PROVIDERS and (
_plugin_path := _dispatch_to_plugin_provider(
text, file_str, provider, tts_config,
)
) is not None:
file_str = _plugin_path
elif provider == "elevenlabs":
try:
_import_elevenlabs()
@@ -1925,6 +2057,18 @@ def text_to_speech_tool(
if opus_path:
file_str = opus_path
voice_compatible = file_str.endswith(".ogg")
elif provider not in BUILTIN_TTS_PROVIDERS:
# Plugin-registered provider (issue #30398). Voice-bubble
# delivery opts in via ``TTSProvider.voice_compatible``
# (mirrors the command-provider opt-in). Plugins that
# already write Opus skip the ffmpeg conversion.
plugin_voice_compatible = _plugin_provider_is_voice_compatible(provider)
if plugin_voice_compatible:
if not file_str.endswith(".ogg"):
opus_path = _convert_to_opus(file_str)
if opus_path:
file_str = opus_path
voice_compatible = file_str.endswith(".ogg")
elif (
want_opus
and provider in {"edge", "neutts", "minimax", "xai", "kittentts", "piper"}

@@ -234,7 +234,7 @@ The table above shows the four plugin categories, but within "General plugins" t
| A **context-compression strategy** | Context-engine plugin — `ctx.register_context_engine()` | [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) |
| An **image-generation backend** (DALL·E, SDXL, …) | Backend plugin — `ctx.register_image_gen_provider()` | [Image Generation Provider Plugins](/docs/developer-guide/image-gen-provider-plugin) |
| A **video-generation backend** (Veo, Kling, Pixverse, Grok-Imagine, Runway, …) | Backend plugin — `ctx.register_video_gen_provider()` | [Video Generation Provider Plugins](/docs/developer-guide/video-gen-provider-plugin) |
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven — declare under `tts.providers.<name>` with `type: command` in `config.yaml` | [TTS setup](/docs/user-guide/features/tts#custom-command-providers) |
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven (recommended) — declare under `tts.providers.<name>` with `type: command` in `config.yaml`. OR Python backend plugin — `ctx.register_tts_provider()` for Python-SDK / streaming engines that need more than a shell template. | [TTS Setup](/docs/user-guide/features/tts#custom-command-providers) · [Python plugin guide](/docs/user-guide/features/tts#python-plugin-providers) |
| An **STT backend** (custom whisper binary, local ASR CLI) | Config-driven — set `HERMES_LOCAL_STT_COMMAND` env var to a shell template | [Voice Message Transcription (STT)](/docs/user-guide/features/tts#voice-message-transcription-stt) |
| **External tools via MCP** (filesystem, GitHub, Linear, Notion, any MCP server) | Config-driven — declare `mcp_servers.<name>` with `command:` / `url:` in `config.yaml`. Hermes auto-discovers the server's tools and registers them alongside built-ins. | [MCP](/docs/user-guide/features/mcp) |
| **Additional skill sources** (custom GitHub repos, private skill indexes) | CLI — `hermes skills tap add <repo>` | [Skills Hub](/docs/user-guide/features/skills#skills-hub) · [Publishing a custom tap](/docs/user-guide/features/skills#publishing-a-custom-skill-tap) |

@@ -297,6 +297,85 @@ Use `{{` and `}}` for literal braces.
Command-type providers run whatever shell command you configure, with your user's permissions. Hermes quotes placeholder values and enforces the configured timeout, but the command template itself is trusted local input — treat it the same way you would a shell script on your PATH.
### Python plugin providers
For TTS engines that can't be expressed as a single shell command — Python SDKs without a CLI, streaming engines, voice-listing APIs, OAuth-refreshing auth — register a Python plugin via `ctx.register_tts_provider()`. The plugin **coexists with** (does not replace) the [Custom command providers](#custom-command-providers) registry; pick the surface that fits your engine.
#### When to pick which
| Your backend has… | Use |
|---|---|
| A single CLI reading text from a file/stdin and writing audio to a file/stdout | **Command provider** (no Python needed) |
| Two or three CLIs chained with shell pipes | **Command provider** |
| A Python SDK only — no CLI | **Plugin** |
| Streaming bytes you want to deliver chunked (mid-generation voice bubbles) | **Plugin** (override `stream()`) |
| A voice-listing API used by `hermes setup` | **Plugin** (override `list_voices()`) |
| OAuth refresh flow (not a static bearer token) | **Plugin** |

Built-ins always win, and command providers win over a same-name plugin, so plugins are safe to register against any non-built-in name without worrying about shadowing your existing config.

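
The precedence rules above can be modeled as a small pure function. This is only an illustrative sketch, not the dispatcher from the diff: `resolve_tts_backend`, its `plugin_names` argument, and the `"my-tts-cli"` command string are hypothetical, and the real code consults the plugin registry rather than a set of names.

```python
from typing import Any, Dict, Set

# Built-in names as listed in this PR's description.
BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_tts_backend(
    provider: str,
    tts_config: Dict[str, Any],
    plugin_names: Set[str],
) -> str:
    """Return the layer that would handle ``provider`` (hypothetical helper)."""
    key = (provider or "").lower().strip()
    # 1. Built-ins always win, even if a plugin registered the same name.
    if key in BUILTIN_NAMES:
        return "builtin"
    # 2. A same-name entry under tts.providers with `command` set wins next.
    providers = tts_config.get("providers")
    named = providers.get(key) if isinstance(providers, dict) else None
    if isinstance(named, dict) and named.get("command"):
        return "command"
    # 3. Only then is a plugin-registered provider considered.
    if key in plugin_names:
        return "plugin"
    # 4. No match: fall through to the Edge TTS default.
    return "edge-default"


# "my-tts-cli" is a placeholder command, not a real tool.
config = {"providers": {"my-tts": {"type": "command", "command": "my-tts-cli"}}}
print(resolve_tts_backend("openai", config, {"openai"}))  # builtin
print(resolve_tts_backend("my-tts", config, {"my-tts"}))  # command
print(resolve_tts_backend("my-tts", {}, {"my-tts"}))      # plugin
print(resolve_tts_backend("unknown", {}, set()))          # edge-default
```

The second call shows why registering a plugin under a name that already has a command config is harmless: the command entry keeps handling requests until it is removed.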
#### Minimal plugin
Drop this in `~/.hermes/plugins/my-tts/`:

`plugin.yaml`:
```yaml
name: my-tts
version: 0.1.0
description: "My custom Python TTS backend"
```
`__init__.py`:
```python
from agent.tts_provider import TTSProvider


class MyTTSProvider(TTSProvider):
    @property
    def name(self) -> str:
        return "my-tts"  # what tts.provider matches against

    @property
    def display_name(self) -> str:
        return "My Custom TTS"

    def is_available(self) -> bool:
        # Return False when credentials/deps are missing. The picker skips
        # this row, but the dispatcher still routes here on explicit config.
        import os
        return bool(os.environ.get("MY_TTS_API_KEY"))

    def synthesize(self, text, output_path, *, voice=None, model=None,
                   speed=None, format="mp3", **extra) -> str:
        # Write audio bytes to output_path, return the path.
        # Raise on failure; the dispatcher converts exceptions to a
        # standard error envelope.
        import my_tts_sdk
        client = my_tts_sdk.Client()
        audio_bytes = client.synthesize(text=text, voice=voice or "default")
        with open(output_path, "wb") as f:
            f.write(audio_bytes)
        return output_path


def register(ctx):
    ctx.register_tts_provider(MyTTSProvider())
```
Enable it (`hermes plugins enable my-tts`), point `tts.provider` at it (`tts.provider: my-tts` in `config.yaml`), and the `text_to_speech` tool will route through your plugin.
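
Before wiring in a real SDK, it can help to exercise the `register(ctx)` path with a stand-in. The harness below is a sketch under stated assumptions, not part of Hermes: `FakeContext` is a hypothetical stub exposing only `register_tts_provider()`, and `SilenceProvider` is duck-typed rather than a subclass of the real `TTSProvider`, so the snippet runs without Hermes installed. It writes a short silent WAV using the standard-library `wave` module.

```python
import os
import tempfile
import wave


class SilenceProvider:
    """Duck-typed stand-in; a real plugin would subclass TTSProvider."""

    name = "silence-tts"
    display_name = "Silence (test only)"

    def is_available(self) -> bool:
        return True

    def synthesize(self, text, output_path, *, voice=None, model=None,
                   speed=None, format="wav", **extra) -> str:
        # Ignore ``text`` and write 0.1 s of 16-bit mono silence at 16 kHz.
        with wave.open(output_path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(16000)
            wav.writeframes(b"\x00\x00" * 1600)
        return output_path


class FakeContext:
    """Hypothetical stub that records what register() hands to the host."""

    def __init__(self):
        self.tts_providers = {}

    def register_tts_provider(self, provider) -> None:
        self.tts_providers[provider.name] = provider


def register(ctx):
    ctx.register_tts_provider(SilenceProvider())


ctx = FakeContext()
register(ctx)
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "out.wav")
    written = ctx.tts_providers["silence-tts"].synthesize("hello", out_path)
    with wave.open(written, "rb") as wav:
        frame_count = wav.getnframes()

print(written == out_path, frame_count)  # True 1600
```

A check like this only confirms that the provider honours the "write to `output_path`, return the path" contract; it says nothing about how Hermes discovers or enables the plugin.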
#### Optional hooks
Override these on your provider class for richer integration:
- `list_voices()` → list of `{id, display, language, gender, preview_url}` dicts shown in `hermes tools`.
- `list_models()` → list of `{id, display, languages, max_text_length}` dicts.
- `get_setup_schema()` → return `{name, badge, tag, env_vars: [{key, prompt, url}]}` to power the picker row in `hermes tools` / `hermes setup`. Without this, the plugin still works but its row in the picker is minimal.
- `stream(text, *, voice, model, format, **extra)` → iterator yielding audio bytes for streaming delivery (default raises `NotImplementedError`).
- `voice_compatible` property → set `True` if your output is Opus-compatible and the gateway should deliver it as a voice bubble (default `False` = regular audio attachment).

See `agent/tts_provider.py` for the full ABC including docstrings.
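
As a rough illustration of how the optional hooks fit together, the sketch below overrides `voice_compatible`, `list_voices()`, and `stream()` on one class. To stay runnable without Hermes it defines `TTSProviderStandIn`, a local stand-in whose shape is assumed from the list above and may not match the real ABC. `ChunkedProvider` and its voice entry are hypothetical, and the "audio" is placeholder bytes, where a real provider would yield encoded audio such as Opus.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List


class TTSProviderStandIn(ABC):
    """Local stand-in for agent.tts_provider.TTSProvider (assumed shape)."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str: ...

    @property
    def voice_compatible(self) -> bool:
        return False

    def list_voices(self) -> List[Dict[str, Any]]:
        return []

    def stream(self, text: str, **kwargs: Any) -> Iterator[bytes]:
        raise NotImplementedError


class ChunkedProvider(TTSProviderStandIn):
    @property
    def name(self) -> str:
        return "chunked-tts"

    @property
    def voice_compatible(self) -> bool:
        # Opt in to voice-bubble delivery (only valid for Opus-compatible output).
        return True

    def list_voices(self) -> List[Dict[str, Any]]:
        return [{"id": "alto", "display": "Alto", "language": "en",
                 "gender": "neutral", "preview_url": None}]

    def stream(self, text, *, voice=None, model=None, format="ogg", **extra):
        # Placeholder: yield the UTF-8 text in 4-byte chunks instead of audio.
        data = text.encode("utf-8")
        for start in range(0, len(data), 4):
            yield data[start:start + 4]

    def synthesize(self, text, output_path, *, voice=None, model=None,
                   speed=None, format="ogg", **extra) -> str:
        # Reuse stream() so both delivery paths share one code path.
        with open(output_path, "wb") as f:
            for chunk in self.stream(text, voice=voice, model=model, format=format):
                f.write(chunk)
        return output_path


provider = ChunkedProvider()
chunks = list(provider.stream("hello world"))
print(chunks)  # [b'hell', b'o wo', b'rld']
```

Building `synthesize()` on top of `stream()` is one possible design: it keeps the required method and the optional streaming hook from drifting apart.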
## Voice Message Transcription (STT)
Voice messages sent on Telegram, Discord, WhatsApp, Slack, or Signal are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.