mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-30 06:41:51 +00:00
feat(tts): add register_tts_provider() plugin hook (closes #30398)
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive: the command-provider surface stays as the primary way to add a TTS backend.

The hook covers cases the shell-template grammar can't reasonably express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:

- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:

- The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant.
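The four-step resolution order can be sketched as a small pure function. This is only an illustration under stated assumptions: `resolve_tts_backend` and its return labels are hypothetical names, and the real dispatcher in `tools/tts_tool.py` is not shown in this part of the diff.

```python
from typing import Any, Dict, Set

# The ten built-in names listed in the commit message.
BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_tts_backend(tts_config: Dict[str, Any], plugin_names: Set[str]) -> str:
    """Return which surface would handle the configured provider (hypothetical helper)."""
    raw = tts_config.get("provider")
    name = raw.strip().lower() if isinstance(raw, str) else ""
    # 1. Built-in names always win.
    if name in BUILTIN_NAMES:
        return "builtin"
    # 2. A same-name entry under tts.providers with `command:` set comes next.
    entry = (tts_config.get("providers") or {}).get(name)
    if isinstance(entry, dict) and entry.get("command"):
        return "command"
    # 3. Only then is a plugin-registered provider considered.
    if name in plugin_names:
        return "plugin"
    # 4. No match: legacy fallback to the Edge TTS default.
    return "fallback_edge"
```

Keeping the precedence in one ordered chain makes the invariant easy to test: a plugin named `edge` can never be reached, and a command entry shadows a plugin of the same name.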
## New files

- `agent/tts_provider.py`: `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py`: `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py`: `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py`: `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py`: `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py`: 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py`: 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py`: 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py`: 3 end-to-end tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py`: 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario.
## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md`: new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md`: TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming).

Closes #30398
parent 782681f904
commit 00ec0b617c
13 changed files with 2037 additions and 1 deletions
274
agent/tts_provider.py
Normal file
@@ -0,0 +1,274 @@
"""
|
||||
Text-to-Speech Provider ABC
|
||||
============================
|
||||
|
||||
Defines the pluggable-backend interface for text-to-speech synthesis.
|
||||
Providers register instances via
|
||||
``PluginContext.register_tts_provider()``; the active one (selected via
|
||||
``tts.provider`` in ``config.yaml``) services every ``text_to_speech``
|
||||
tool call **only when the configured name is neither a built-in nor a
|
||||
command-type provider declared under ``tts.providers.<name>``**.
|
||||
|
||||
Three coexisting TTS extension surfaces — in resolution order:
|
||||
|
||||
1. **Built-in providers** (``BUILTIN_TTS_PROVIDERS`` in
|
||||
:mod:`tools.tts_tool`) — native Python implementations (edge, openai,
|
||||
elevenlabs, …). **Always win** — plugins cannot shadow them.
|
||||
2. **Command-type providers** declared under ``tts.providers.<name>:
|
||||
type: command`` (PR #17843, commit ``2facea7f7``). Wire any local
|
||||
CLI into Hermes with shell-template placeholders. **Wins over a
|
||||
same-name plugin** — config is more local than plugin install.
|
||||
3. **Plugin-registered providers** (this ABC). For backends that need a
|
||||
Python SDK, streaming bytes, OAuth refresh, or voice-listing APIs
|
||||
the shell-template grammar can't reasonably express.
|
||||
|
||||
Built-ins-always-win is enforced at registration time
|
||||
(:func:`agent.tts_registry.register_provider` rejects names in
|
||||
``BUILTIN_TTS_PROVIDERS`` with a warning) AND at dispatch time
|
||||
(:func:`tools.tts_tool._dispatch_to_plugin_provider` re-checks
|
||||
defensively). The dispatcher also rejects plugin dispatch when a same-
|
||||
name command provider is configured.
|
||||
|
||||
Providers live in ``<repo>/plugins/tts/<name>/`` (built-in plugins, none
|
||||
shipped today) or ``~/.hermes/plugins/tts/<name>/`` (user-installed).
|
||||
None ship in-tree as of issue #30398 — the hook is additive
|
||||
infrastructure waiting for a real consumer (Cartesia, Fish Audio, …).
|
||||
|
||||
Response contract
|
||||
-----------------
|
||||
:meth:`TTSProvider.synthesize` writes the audio bytes to ``output_path``
|
||||
and returns the path as a string. Implementations should raise on
|
||||
failure — the dispatcher converts exceptions into the standard
|
||||
``{success: False, error: …}`` JSON envelope the rest of Hermes
|
||||
expects.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import abc
|
||||
import logging
|
||||
from typing import Any, Dict, Iterator, List, Optional
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
DEFAULT_OUTPUT_FORMAT = "mp3"
|
||||
VALID_OUTPUT_FORMATS = frozenset({"mp3", "wav", "ogg", "opus", "flac"})
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# ABC
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TTSProvider(abc.ABC):
|
||||
"""Abstract base class for a text-to-speech backend.
|
||||
|
||||
Subclasses must implement :attr:`name` and :meth:`synthesize`.
|
||||
Everything else has sane defaults — override only what your provider
|
||||
needs.
|
||||
"""
|
||||
|
||||
@property
|
||||
@abc.abstractmethod
|
||||
def name(self) -> str:
|
||||
"""Stable short identifier used in ``tts.provider`` config.
|
||||
|
||||
Lowercase, no spaces. Examples: ``cartesia``, ``fishaudio``,
|
||||
``deepgram``. Names that collide with a built-in TTS provider
|
||||
(``edge``, ``openai``, ``elevenlabs``, ``minimax``, ``gemini``,
|
||||
``mistral``, ``xai``, ``piper``, ``kittentts``, ``neutts``) are
|
||||
rejected at registration time.
|
||||
"""
|
||||
|
||||
@property
|
||||
def display_name(self) -> str:
|
||||
"""Human-readable label shown in ``hermes tools``.
|
||||
|
||||
Defaults to ``name.title()`` (e.g. ``Cartesia`` for ``cartesia``).
|
||||
"""
|
||||
return self.name.title()
|
||||
|
||||
def is_available(self) -> bool:
|
||||
"""Return True when this provider can service calls.
|
||||
|
||||
Typically checks for a required API key + that the SDK is
|
||||
importable. Default: True (providers with no external
|
||||
dependencies are always available).
|
||||
|
||||
Must NOT raise — used by the picker and ``hermes setup`` for
|
||||
availability displays and should fail gracefully.
|
||||
"""
|
||||
return True
|
||||
|
||||
def list_voices(self) -> List[Dict[str, Any]]:
|
||||
"""Return voice catalog entries.
|
||||
|
||||
Each entry::
|
||||
|
||||
{
|
||||
"id": "voice-abc-123", # required
|
||||
"display": "Aria — neutral female", # optional; defaults to id
|
||||
"language": "en-US", # optional
|
||||
"gender": "female", # optional
|
||||
"preview_url": "https://...mp3", # optional
|
||||
}
|
||||
|
||||
Default: empty list (provider has no enumerable voices or
|
||||
doesn't surface them via API).
|
||||
"""
|
||||
return []
|
||||
|
||||
def list_models(self) -> List[Dict[str, Any]]:
|
||||
"""Return model catalog entries.
|
||||
|
||||
Each entry::
|
||||
|
||||
{
|
||||
"id": "sonic-2", # required
|
||||
"display": "Sonic 2", # optional
|
||||
"languages": ["en", "es", "fr"], # optional
|
||||
"max_text_length": 5000, # optional
|
||||
}
|
||||
|
||||
Default: empty list (provider has a single fixed model or
|
||||
doesn't expose model selection).
|
||||
"""
|
||||
return []
|
||||
|
||||
def get_setup_schema(self) -> Dict[str, Any]:
|
||||
"""Return provider metadata for the ``hermes tools`` picker.
|
||||
|
||||
Used by ``tools_config.py`` to inject this provider as a row in
|
||||
the Text-to-Speech provider list. Shape::
|
||||
|
||||
{
|
||||
"name": "Cartesia", # picker label
|
||||
"badge": "paid", # optional short tag
|
||||
"tag": "Ultra-low-latency streaming", # optional subtitle
|
||||
"env_vars": [ # keys to prompt for
|
||||
{"key": "CARTESIA_API_KEY",
|
||||
"prompt": "Cartesia API key",
|
||||
"url": "https://play.cartesia.ai/console"},
|
||||
],
|
||||
}
|
||||
|
||||
Default: minimal entry derived from ``display_name`` with no
|
||||
env vars. Override to expose API key prompts and custom badges.
|
||||
"""
|
||||
return {
|
||||
"name": self.display_name,
|
||||
"badge": "",
|
||||
"tag": "",
|
||||
"env_vars": [],
|
||||
}
|
||||
|
||||
def default_model(self) -> Optional[str]:
|
||||
"""Return the default model id, or None if not applicable."""
|
||||
models = self.list_models()
|
||||
if models:
|
||||
return models[0].get("id")
|
||||
return None
|
||||
|
||||
def default_voice(self) -> Optional[str]:
|
||||
"""Return the default voice id, or None if not applicable."""
|
||||
voices = self.list_voices()
|
||||
if voices:
|
||||
return voices[0].get("id")
|
||||
return None
|
||||
|
||||
@abc.abstractmethod
|
||||
def synthesize(
|
||||
self,
|
||||
text: str,
|
||||
output_path: str,
|
||||
*,
|
||||
voice: Optional[str] = None,
|
||||
model: Optional[str] = None,
|
||||
speed: Optional[float] = None,
|
||||
format: str = DEFAULT_OUTPUT_FORMAT,
|
||||
**extra: Any,
|
||||
) -> str:
|
||||
"""Synthesize ``text`` and write audio bytes to ``output_path``.
|
||||
|
||||
Returns the absolute path to the written file as a string
|
||||
(typically just echoes ``output_path``). Raises on failure —
|
||||
the dispatcher converts exceptions to the standard
|
||||
``{success: False, error: ...}`` JSON envelope.
|
||||
|
||||
Args:
|
||||
text: The text to synthesize. Already truncated to the
|
||||
provider's max length by the dispatcher.
|
||||
output_path: Absolute path where the audio file should be
|
||||
written. Parent directory is guaranteed to exist.
|
||||
voice: Voice identifier from :meth:`list_voices`, or None
|
||||
to use :meth:`default_voice`.
|
||||
model: Model identifier from :meth:`list_models`, or None
|
||||
to use :meth:`default_model`.
|
||||
speed: Optional speech-rate multiplier (1.0 = normal).
|
||||
Providers that don't support speed control should
|
||||
ignore this argument.
|
||||
format: Output audio format. Implementations should match
|
||||
the requested format when possible; if unsupported,
|
||||
pick the closest equivalent and ensure ``output_path``
|
||||
ends with the correct extension.
|
||||
**extra: Forward-compat parameters future schema versions
|
||||
may expose. Implementations should ignore unknown keys.
|
||||
"""
|
||||
|
||||
def stream(
|
||||
self,
|
||||
text: str,
|
||||
*,
|
||||
voice: Optional[str] = None,
|
||||
model: Optional[str] = None,
|
||||
format: str = "opus",
|
||||
**extra: Any,
|
||||
) -> Iterator[bytes]:
|
||||
"""Stream synthesized audio bytes.
|
||||
|
||||
Optional. Providers that don't support streaming raise
|
||||
:class:`NotImplementedError` (the default) and the dispatcher
|
||||
falls back to :meth:`synthesize` + read-whole-file.
|
||||
|
||||
Args mirror :meth:`synthesize`. Default ``format`` is ``opus``
|
||||
because the primary streaming use case is voice-bubble
|
||||
delivery (Telegram et al.) which requires Opus.
|
||||
"""
|
||||
raise NotImplementedError(
|
||||
f"TTS provider {self.name!r} does not implement streaming "
|
||||
"synthesis. Use synthesize() instead, or implement stream() "
|
||||
"if your backend supports it."
|
||||
)
|
||||
|
||||
@property
|
||||
def voice_compatible(self) -> bool:
|
||||
"""Whether output is suitable for voice-bubble delivery.
|
||||
|
||||
Mirrors the ``tts.providers.<name>.voice_compatible`` field
|
||||
from PR #17843. When True, the gateway's voice-message
|
||||
delivery pipeline runs ffmpeg conversion to Opus if needed.
|
||||
When False, output is delivered as a regular audio attachment.
|
||||
|
||||
Default: False (safe — providers opt in explicitly).
|
||||
"""
|
||||
return False
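The `stream()` docstring says the dispatcher falls back to `synthesize()` plus reading the whole file when streaming is not implemented. A minimal sketch of that fallback follows, assuming a hypothetical helper `audio_chunks` and a hypothetical `FileOnlyProvider`; the actual dispatcher code is not part of this file.

```python
import os
import tempfile
from typing import Iterator


def audio_chunks(provider, text: str, output_path: str, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes, preferring provider.stream() when it is implemented."""
    try:
        yield from provider.stream(text)
        return
    except NotImplementedError:
        pass  # default stream() was not overridden; fall back to a file
    path = provider.synthesize(text, output_path)
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk


class FileOnlyProvider:
    """Hypothetical provider that only implements synthesize()."""

    def stream(self, text, **extra):
        raise NotImplementedError("no streaming support")

    def synthesize(self, text, output_path, **extra):
        # Stand-in for real audio: write the text bytes to the target path.
        with open(output_path, "wb") as fh:
            fh.write(text.encode("utf-8"))
        return output_path


def demo() -> bytes:
    """Run the fallback path end to end in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "speech.bin")
        return b"".join(audio_chunks(FileOnlyProvider(), "hello", out, chunk_size=2))
```

One caveat of this shape: a provider whose `stream()` yields some chunks and then raises `NotImplementedError` would produce partial output followed by the full file, so a real implementation may want to detect support before consuming any bytes.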
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def resolve_output_format(value: Optional[str]) -> str:
    """Clamp an output_format value to the valid set.

    Invalid values are coerced to :data:`DEFAULT_OUTPUT_FORMAT` rather
    than rejected so the tool surface is forgiving of agent mistakes.
    """
    if not isinstance(value, str):
        return DEFAULT_OUTPUT_FORMAT
    v = value.strip().lower()
    if v in VALID_OUTPUT_FORMATS:
        return v
    return DEFAULT_OUTPUT_FORMAT
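To show how little a subclass needs, here is a sketch of a provider that implements only `name` and `synthesize()`. `SilenceTTSProvider` is hypothetical and writes silent WAV audio with the standard-library `wave` module; the `except ImportError` branch defines a reduced stand-in for the ABC so the snippet also runs outside a Hermes checkout.

```python
import abc
import wave
from typing import Any, Optional

try:  # inside a Hermes checkout this resolves to the ABC defined above
    from agent.tts_provider import TTSProvider
except ImportError:  # reduced stand-in (assumption) so the sketch runs on its own
    class TTSProvider(abc.ABC):
        @property
        @abc.abstractmethod
        def name(self) -> str: ...

        @abc.abstractmethod
        def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str: ...


class SilenceTTSProvider(TTSProvider):
    """Hypothetical provider: writes 50 ms of silence per input character."""

    @property
    def name(self) -> str:
        return "silence"  # lowercase, no spaces, not a built-in name

    def synthesize(
        self,
        text: str,
        output_path: str,
        *,
        voice: Optional[str] = None,
        model: Optional[str] = None,
        speed: Optional[float] = None,
        format: str = "wav",
        **extra: Any,  # unknown forward-compat keys are ignored
    ) -> str:
        rate = 16000
        n_frames = (rate // 20) * max(len(text), 1)
        with wave.open(output_path, "wb") as wav:
            wav.setnchannels(1)   # mono
            wav.setsampwidth(2)   # 16-bit samples
            wav.setframerate(rate)
            wav.writeframes(b"\x00\x00" * n_frames)
        return output_path
```

A real provider would call its SDK inside `synthesize()` and raise on failure, leaving the conversion to the `{success: False, error: ...}` envelope to the dispatcher as the module docstring describes.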
|
||||
133
agent/tts_registry.py
Normal file
@@ -0,0 +1,133 @@
"""
|
||||
TTS Provider Registry
|
||||
=====================
|
||||
|
||||
Central map of registered TTS providers. Populated by plugins at
|
||||
import-time via :meth:`PluginContext.register_tts_provider`; consumed
|
||||
by :mod:`tools.tts_tool` to dispatch ``text_to_speech`` tool calls to
|
||||
the active plugin backend **when** the configured ``tts.provider``
|
||||
name is neither a built-in nor a command-type provider.
|
||||
|
||||
Built-ins-always-win
|
||||
--------------------
|
||||
Plugin names that collide with a built-in TTS provider (``edge``,
|
||||
``openai``, ``elevenlabs``, ``minimax``, ``gemini``, ``mistral``,
|
||||
``xai``, ``piper``, ``kittentts``, ``neutts``) are rejected at
|
||||
registration with a warning. This invariant is also re-checked at
|
||||
dispatch time in :func:`tools.tts_tool._dispatch_to_plugin_provider`.
|
||||
|
||||
Command-providers-win-over-plugins
|
||||
----------------------------------
|
||||
This registry doesn't enforce the command-vs-plugin precedence — that
|
||||
lives in the dispatcher, which checks for a same-name
|
||||
``tts.providers.<name>: type: command`` entry before consulting the
|
||||
registry. The rationale is locality: a name declared in the user's
|
||||
``config.yaml`` is more specific to their setup than a plugin that
|
||||
happens to be installed.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import threading
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
from agent.tts_provider import TTSProvider
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# Names reserved for native built-in TTS handlers. Plugins cannot
|
||||
# register a name in this set — the registration call is rejected with
|
||||
# a warning. **Kept in sync with ``BUILTIN_TTS_PROVIDERS`` in
|
||||
# :mod:`tools.tts_tool`** — a regression test in
|
||||
# ``tests/agent/test_tts_registry.py::TestBuiltinSync`` fails if the
|
||||
# two lists drift. Importing from ``tools.tts_tool`` directly would
|
||||
# create a circular dependency (``tools.tts_tool`` imports
|
||||
# ``agent.tts_registry`` for dispatch).
|
||||
_BUILTIN_NAMES = frozenset({
|
||||
"edge",
|
||||
"elevenlabs",
|
||||
"openai",
|
||||
"minimax",
|
||||
"xai",
|
||||
"mistral",
|
||||
"gemini",
|
||||
"neutts",
|
||||
"kittentts",
|
||||
"piper",
|
||||
})
|
||||
|
||||
|
||||
_providers: Dict[str, TTSProvider] = {}
|
||||
_lock = threading.Lock()
|
||||
|
||||
|
||||
def register_provider(provider: TTSProvider) -> None:
|
||||
"""Register a TTS provider.
|
||||
|
||||
Rejects:
|
||||
|
||||
- Non-:class:`TTSProvider` instances (raises :class:`TypeError`).
|
||||
- Empty/whitespace ``.name`` (raises :class:`ValueError`).
|
||||
- Names colliding with a built-in (logs a warning, silently
|
||||
ignores — built-ins-always-win invariant).
|
||||
|
||||
Re-registration (same ``name``) overwrites the previous entry and
|
||||
logs a debug message — makes hot-reload scenarios (tests, dev
|
||||
loops) behave predictably.
|
||||
"""
|
||||
if not isinstance(provider, TTSProvider):
|
||||
raise TypeError(
|
||||
f"register_provider() expects a TTSProvider instance, "
|
||||
f"got {type(provider).__name__}"
|
||||
)
|
||||
name = provider.name
|
||||
if not isinstance(name, str) or not name.strip():
|
||||
raise ValueError("TTS provider .name must be a non-empty string")
|
||||
key = name.strip().lower()
|
||||
if key in _BUILTIN_NAMES:
|
||||
logger.warning(
|
||||
"TTS provider '%s' shadows a built-in name; registration ignored. "
|
||||
"Built-in TTS providers (%s) always win — pick a different name.",
|
||||
key, ", ".join(sorted(_BUILTIN_NAMES)),
|
||||
)
|
||||
return
|
||||
with _lock:
|
||||
existing = _providers.get(key)
|
||||
_providers[key] = provider
|
||||
if existing is not None:
|
||||
logger.debug(
|
||||
"TTS provider '%s' re-registered (was %r)",
|
||||
key, type(existing).__name__,
|
||||
)
|
||||
else:
|
||||
logger.debug(
|
||||
"Registered TTS provider '%s' (%s)",
|
||||
key, type(provider).__name__,
|
||||
)
|
||||
|
||||
|
||||
def list_providers() -> List[TTSProvider]:
|
||||
"""Return all registered providers, sorted by name."""
|
||||
with _lock:
|
||||
items = list(_providers.values())
|
||||
return sorted(items, key=lambda p: p.name)
|
||||
|
||||
|
||||
def get_provider(name: str) -> Optional[TTSProvider]:
|
||||
"""Return the provider registered under *name*, or None.
|
||||
|
||||
Name matching is case-insensitive and whitespace-tolerant — mirrors
|
||||
how ``tools.tts_tool._get_provider`` normalizes the configured
|
||||
``tts.provider`` value.
|
||||
"""
|
||||
if not isinstance(name, str):
|
||||
return None
|
||||
return _providers.get(name.strip().lower())
|
||||
|
||||
|
||||
def _reset_for_tests() -> None:
|
||||
"""Clear the registry. **Test-only.**"""
|
||||
with _lock:
|
||||
_providers.clear()
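The comment on `_BUILTIN_NAMES` mentions a `TestBuiltinSync` regression test that fails when the duplicated name sets drift. The test body is not shown in this part of the diff, so the following is only a sketch of how such a check might be written; `builtin_name_drift` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def builtin_name_drift(
    registry_names: Iterable[str],
    tool_names: Iterable[str],
) -> Dict[str, List[str]]:
    """Report names present in one collection but missing from the other."""
    registry, tool = set(registry_names), set(tool_names)
    return {
        "missing_from_registry": sorted(tool - registry),
        "missing_from_tool": sorted(registry - tool),
    }
```

A test could then assert that both lists in the result are empty when called with `agent.tts_registry._BUILTIN_NAMES` and `tools.tts_tool.BUILTIN_TTS_PROVIDERS`, which gives a readable failure message naming the drifted entries.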
|
||||
@@ -640,6 +640,44 @@ class PluginContext:
self.manifest.name, provider.name,
|
||||
)
|
||||
|
||||
# -- TTS provider registration -------------------------------------------
|
||||
|
||||
def register_tts_provider(self, provider) -> None:
|
||||
"""Register a text-to-speech backend.
|
||||
|
||||
``provider`` must be an instance of
|
||||
:class:`agent.tts_provider.TTSProvider`. The ``provider.name``
|
||||
attribute is what ``tts.provider`` in ``config.yaml`` matches
|
||||
against when routing ``text_to_speech`` tool calls — **but
|
||||
only when**:
|
||||
|
||||
1. ``provider.name`` is NOT a built-in TTS provider name
|
||||
(``edge``, ``openai``, ``elevenlabs``, …). Built-ins always
|
||||
win — the registry rejects shadowing names with a warning.
|
||||
2. There is NO ``tts.providers.<name>: type: command`` entry
|
||||
with the same name. Command-providers (PR #17843) win on
|
||||
name collision because config is more local than plugin
|
||||
install.
|
||||
|
||||
Coexists with the command-provider registry rather than
|
||||
replacing it — see issue #30398 for the full design rationale.
|
||||
"""
|
||||
from agent.tts_provider import TTSProvider
|
||||
from agent.tts_registry import register_provider as _register_tts_provider
|
||||
|
||||
if not isinstance(provider, TTSProvider):
|
||||
logger.warning(
|
||||
"Plugin '%s' tried to register a TTS provider that does "
|
||||
"not inherit from TTSProvider. Ignoring.",
|
||||
self.manifest.name,
|
||||
)
|
||||
return
|
||||
_register_tts_provider(provider)
|
||||
logger.info(
|
||||
"Plugin '%s' registered TTS provider: %s",
|
||||
self.manifest.name, provider.name,
|
||||
)
|
||||
|
||||
# -- platform adapter registration ---------------------------------------
|
||||
|
||||
def register_platform(
|
||||
@@ -1753,6 +1753,62 @@ def _plugin_browser_providers() -> list[dict]:
return rows
|
||||
|
||||
|
||||
def _plugin_tts_providers() -> list[dict]:
|
||||
"""Build picker-row dicts from plugin-registered TTS providers.
|
||||
|
||||
Issue #30398 — the ``register_tts_provider()`` plugin hook
|
||||
coexists alongside the 10 built-in TTS providers
|
||||
(``edge``/``openai``/``elevenlabs``/…) and the
|
||||
``tts.providers.<name>: type: command`` registry from PR #17843.
|
||||
Built-in rows stay hardcoded in ``TOOL_CATEGORIES["tts"]``; this
|
||||
function only injects PLUGIN-registered providers.
|
||||
|
||||
Defensive: plugins whose name collides with a built-in TTS provider
|
||||
are filtered out — even though the registry already rejects them
|
||||
at registration time, a future code path that registers directly
|
||||
via :func:`agent.tts_registry.register_provider` could slip
|
||||
through. Filtering here keeps the picker invariant.
|
||||
"""
|
||||
try:
|
||||
from agent.tts_registry import _BUILTIN_NAMES, list_providers
|
||||
from hermes_cli.plugins import _ensure_plugins_discovered
|
||||
|
||||
_ensure_plugins_discovered()
|
||||
providers = list_providers()
|
||||
except Exception:
|
||||
return []
|
||||
|
||||
rows: list[dict] = []
|
||||
for provider in providers:
|
||||
name = getattr(provider, "name", None)
|
||||
if not name:
|
||||
continue
|
||||
# Defensive: reject built-in shadowing at the picker layer too.
|
||||
if name.lower().strip() in _BUILTIN_NAMES:
|
||||
continue
|
||||
try:
|
||||
schema = provider.get_setup_schema()
|
||||
except Exception:
|
||||
continue
|
||||
if not isinstance(schema, dict):
|
||||
continue
|
||||
row = {
|
||||
"name": schema.get("name", provider.display_name),
|
||||
"badge": schema.get("badge", ""),
|
||||
"tag": schema.get("tag", ""),
|
||||
"env_vars": schema.get("env_vars", []),
|
||||
# Selecting this row writes ``tts.provider: <name>`` — the
|
||||
# same write-path used by hardcoded rows. The plugin
|
||||
# dispatcher picks it up automatically from there.
|
||||
"tts_provider": name,
|
||||
"tts_plugin_name": name,
|
||||
}
|
||||
if schema.get("post_setup"):
|
||||
row["post_setup"] = schema["post_setup"]
|
||||
rows.append(row)
|
||||
return rows
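As a worked example of the row mapping above, the sketch below isolates the schema-to-row step into a standalone function and applies it to a partial schema. `picker_row` is a hypothetical name used only for illustration; in the diff the same logic lives inline in `_plugin_tts_providers()`.

```python
from typing import Any, Dict


def picker_row(provider_name: str, display_name: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Build one picker row from a provider's setup schema, filling defaults."""
    row: Dict[str, Any] = {
        "name": schema.get("name", display_name),
        "badge": schema.get("badge", ""),
        "tag": schema.get("tag", ""),
        "env_vars": schema.get("env_vars", []),
        # Selecting the row writes tts.provider: <provider_name>.
        "tts_provider": provider_name,
        "tts_plugin_name": provider_name,
    }
    if schema.get("post_setup"):
        row["post_setup"] = schema["post_setup"]
    return row


# A schema that only sets a badge and one env var; everything else defaults.
example = picker_row(
    "cartesia",
    "Cartesia",
    {"badge": "paid", "env_vars": [{"key": "CARTESIA_API_KEY", "prompt": "Cartesia API key"}]},
)
```

Here `example["name"]` falls back to the display name, `example["tag"]` is empty, and no `post_setup` key is added because the schema did not provide one.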
|
||||
|
||||
|
||||
def _visible_providers(cat: dict, config: dict) -> list[dict]:
|
||||
"""Return provider entries visible for the current auth/config state."""
|
||||
features = get_nous_subscription_features(config)
|
||||
@@ -1790,6 +1846,12 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
if cat.get("name") == "Browser Automation":
|
||||
visible.extend(_plugin_browser_providers())
|
||||
|
||||
# Inject plugin-registered TTS backends (issue #30398). Plugin rows
|
||||
# render BELOW the 10 hardcoded built-in rows. Built-in shadowing
|
||||
# is filtered out by ``_plugin_tts_providers`` defensively.
|
||||
if cat.get("name") == "Text-to-Speech":
|
||||
visible.extend(_plugin_tts_providers())
|
||||
|
||||
return visible
|
||||
|
||||
|
||||
312
tests/agent/test_tts_registry.py
Normal file
@@ -0,0 +1,312 @@
"""Tests for agent/tts_registry.py and agent/tts_provider.py.
|
||||
|
||||
Covers:
|
||||
- Registration happy path
|
||||
- Registration rejection: non-TTSProvider type
|
||||
- Registration rejection: empty/whitespace name
|
||||
- Built-in name shadowing: warning + silent ignore (no exception)
|
||||
- Re-registration: overwrites + logs at debug
|
||||
- Case + whitespace insensitivity on lookup
|
||||
- ABC contract: default implementations work
|
||||
- ABC contract: synthesize() must be implemented
|
||||
- ABC contract: stream() raises NotImplementedError by default
|
||||
- resolve_output_format helper coerces invalid input
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from typing import Any, Optional
|
||||
|
||||
import pytest
|
||||
|
||||
from agent import tts_registry
|
||||
from agent.tts_provider import (
|
||||
DEFAULT_OUTPUT_FORMAT,
|
||||
VALID_OUTPUT_FORMATS,
|
||||
TTSProvider,
|
||||
resolve_output_format,
|
||||
)
|
||||
|
||||
|
||||
class _FakeProvider(TTSProvider):
|
||||
def __init__(
|
||||
self,
|
||||
name: str = "fake",
|
||||
display: Optional[str] = None,
|
||||
voice_compat: bool = False,
|
||||
synthesize_impl: Optional[Any] = None,
|
||||
):
|
||||
self._name = name
|
||||
self._display = display
|
||||
self._voice_compat = voice_compat
|
||||
self._synthesize_impl = synthesize_impl
|
||||
|
||||
@property
|
||||
def name(self) -> str:
|
||||
return self._name
|
||||
|
||||
@property
|
||||
def display_name(self) -> str:
|
||||
return self._display if self._display is not None else super().display_name
|
||||
|
||||
@property
|
||||
def voice_compatible(self) -> bool:
|
||||
return self._voice_compat
|
||||
|
||||
def synthesize(self, text: str, output_path: str, **kw):
|
||||
if self._synthesize_impl is not None:
|
||||
return self._synthesize_impl(text, output_path, **kw)
|
||||
return output_path
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _reset_registry():
|
||||
tts_registry._reset_for_tests()
|
||||
yield
|
||||
tts_registry._reset_for_tests()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Registration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestRegistration:
|
||||
def test_happy_path(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
tts_registry.register_provider(p)
|
||||
assert tts_registry.get_provider("cartesia") is p
|
||||
assert [r.name for r in tts_registry.list_providers()] == ["cartesia"]
|
||||
|
||||
def test_rejects_non_provider_type(self):
|
||||
with pytest.raises(TypeError, match="expects a TTSProvider instance"):
|
||||
tts_registry.register_provider("not a provider") # type: ignore[arg-type]
|
||||
assert tts_registry.list_providers() == []
|
||||
|
||||
def test_rejects_empty_name(self):
|
||||
p = _FakeProvider(name="")
|
||||
with pytest.raises(ValueError, match="non-empty string"):
|
||||
tts_registry.register_provider(p)
|
||||
assert tts_registry.list_providers() == []
|
||||
|
||||
def test_rejects_whitespace_name(self):
|
||||
p = _FakeProvider(name=" ")
|
||||
with pytest.raises(ValueError, match="non-empty string"):
|
||||
tts_registry.register_provider(p)
|
||||
assert tts_registry.list_providers() == []
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"builtin",
|
||||
["edge", "openai", "elevenlabs", "minimax", "gemini",
|
||||
"mistral", "xai", "piper", "kittentts", "neutts"],
|
||||
)
|
||||
def test_rejects_builtin_shadow_with_warning(self, builtin, caplog):
|
||||
"""Built-in names always win — plugin registration is silently ignored
|
||||
but a warning is logged so the operator can see what happened.
|
||||
"""
|
||||
p = _FakeProvider(name=builtin)
|
||||
with caplog.at_level(logging.WARNING, logger="agent.tts_registry"):
|
||||
tts_registry.register_provider(p)
|
||||
assert "shadows a built-in name" in caplog.text
|
||||
assert builtin in caplog.text
|
||||
assert tts_registry.get_provider(builtin) is None
|
||||
assert tts_registry.list_providers() == []
|
||||
|
||||
def test_builtin_shadow_case_insensitive(self, caplog):
|
||||
"""``EDGE``/``Edge``/`` edge `` all collide with the ``edge`` built-in."""
|
||||
for variant in ("EDGE", "Edge", " edge ", "eDgE"):
|
||||
tts_registry._reset_for_tests()
|
||||
with caplog.at_level(logging.WARNING, logger="agent.tts_registry"):
|
||||
tts_registry.register_provider(_FakeProvider(name=variant))
|
||||
assert tts_registry.list_providers() == [], (
|
||||
f"variant {variant!r} should have been rejected as a built-in shadow"
|
||||
)
|
||||
|
||||
def test_reregistration_overwrites(self, caplog):
|
||||
p1 = _FakeProvider(name="cartesia")
|
||||
p2 = _FakeProvider(name="cartesia")
|
||||
tts_registry.register_provider(p1)
|
||||
with caplog.at_level(logging.DEBUG, logger="agent.tts_registry"):
|
||||
tts_registry.register_provider(p2)
|
||||
assert tts_registry.get_provider("cartesia") is p2
|
||||
assert "re-registered" in caplog.text
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Lookup
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestLookup:
|
||||
def test_get_provider_missing_returns_none(self):
|
||||
assert tts_registry.get_provider("nonexistent") is None
|
||||
|
||||
def test_get_provider_non_string_returns_none(self):
|
||||
assert tts_registry.get_provider(None) is None # type: ignore[arg-type]
|
||||
assert tts_registry.get_provider(123) is None # type: ignore[arg-type]
|
||||
|
||||
def test_get_provider_case_insensitive(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
tts_registry.register_provider(p)
|
||||
assert tts_registry.get_provider("CARTESIA") is p
|
||||
assert tts_registry.get_provider("Cartesia") is p
|
||||
|
||||
def test_get_provider_whitespace_tolerant(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
tts_registry.register_provider(p)
|
||||
assert tts_registry.get_provider(" cartesia ") is p
|
||||
|
||||
def test_list_providers_sorted(self):
|
||||
tts_registry.register_provider(_FakeProvider(name="zylo"))
|
||||
tts_registry.register_provider(_FakeProvider(name="alpha"))
|
||||
tts_registry.register_provider(_FakeProvider(name="middle"))
|
||||
names = [p.name for p in tts_registry.list_providers()]
|
||||
assert names == ["alpha", "middle", "zylo"]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# ABC contract
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestABCContract:
|
||||
def test_must_implement_synthesize(self):
|
||||
class Incomplete(TTSProvider):
|
||||
@property
|
||||
def name(self) -> str:
|
||||
return "incomplete"
|
||||
# synthesize NOT implemented
|
||||
|
||||
with pytest.raises(TypeError, match="abstract"):
|
||||
Incomplete() # type: ignore[abstract]
|
||||
|
||||
def test_must_implement_name(self):
|
||||
class Incomplete(TTSProvider):
|
||||
def synthesize(self, text, output_path, **kw):
|
||||
return output_path
|
||||
# name NOT implemented
|
||||
|
||||
with pytest.raises(TypeError, match="abstract"):
|
||||
Incomplete() # type: ignore[abstract]
|
||||
|
||||
def test_display_name_defaults_to_title(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
assert p.display_name == "Cartesia"
|
||||
|
||||
def test_display_name_override_respected(self):
|
||||
p = _FakeProvider(name="cartesia", display="Cartesia AI")
|
||||
assert p.display_name == "Cartesia AI"
|
||||
|
||||
def test_is_available_default_true(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
assert p.is_available() is True
|
||||
|
||||
def test_list_voices_default_empty(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
assert p.list_voices() == []
|
||||
|
||||
def test_list_models_default_empty(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
assert p.list_models() == []
|
||||
|
||||
def test_default_model_none_when_no_models(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
assert p.default_model() is None
|
||||
|
||||
def test_default_voice_none_when_no_voices(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
assert p.default_voice() is None
|
||||
|
||||
def test_default_model_first_listed(self):
|
||||
class WithModels(_FakeProvider):
|
||||
def list_models(self):
|
||||
return [{"id": "sonic-2"}, {"id": "sonic-1"}]
|
||||
|
||||
p = WithModels(name="cartesia")
|
||||
assert p.default_model() == "sonic-2"
|
||||
|
||||
def test_default_voice_first_listed(self):
|
||||
class WithVoices(_FakeProvider):
|
||||
def list_voices(self):
|
||||
return [{"id": "voice-aria"}, {"id": "voice-jasper"}]
|
||||
|
||||
p = WithVoices(name="cartesia")
|
||||
assert p.default_voice() == "voice-aria"
|
||||
|
||||
def test_get_setup_schema_default_minimal(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
schema = p.get_setup_schema()
|
||||
assert schema["name"] == "Cartesia"
|
||||
assert schema["env_vars"] == []
|
||||
|
||||
def test_stream_raises_not_implemented_by_default(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
with pytest.raises(NotImplementedError, match="does not implement streaming"):
|
||||
next(p.stream("hello"))
|
||||
|
||||
def test_voice_compatible_default_false(self):
|
||||
p = _FakeProvider(name="cartesia")
|
||||
assert p.voice_compatible is False
|
||||
|
||||
def test_voice_compatible_override(self):
|
||||
p = _FakeProvider(name="cartesia", voice_compat=True)
|
||||
assert p.voice_compatible is True
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestResolveOutputFormat:
|
||||
@pytest.mark.parametrize("valid", sorted(VALID_OUTPUT_FORMATS))
|
||||
def test_valid_passes_through(self, valid):
|
||||
assert resolve_output_format(valid) == valid
|
||||
|
||||
def test_uppercase_normalized(self):
|
||||
assert resolve_output_format("MP3") == "mp3"
|
||||
assert resolve_output_format("Opus") == "opus"
|
||||
|
||||
def test_whitespace_stripped(self):
|
||||
assert resolve_output_format(" wav ") == "wav"
|
||||
|
||||
def test_invalid_returns_default(self):
|
||||
assert resolve_output_format("aiff") == DEFAULT_OUTPUT_FORMAT
|
||||
assert resolve_output_format("") == DEFAULT_OUTPUT_FORMAT
|
||||
|
||||
def test_none_returns_default(self):
|
||||
assert resolve_output_format(None) == DEFAULT_OUTPUT_FORMAT
|
||||
|
||||
def test_non_string_returns_default(self):
|
||||
assert resolve_output_format(123) == DEFAULT_OUTPUT_FORMAT # type: ignore[arg-type]
|
||||
assert resolve_output_format([]) == DEFAULT_OUTPUT_FORMAT # type: ignore[arg-type]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Sync invariant: registry's built-in list vs dispatcher's built-in list
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestBuiltinSync:
|
||||
"""``_BUILTIN_NAMES`` in agent/tts_registry.py is duplicated from
|
||||
``BUILTIN_TTS_PROVIDERS`` in tools/tts_tool.py (importing directly
|
||||
would create a circular dependency). This test fails loudly if the
|
||||
two lists drift — a new built-in added to tts_tool.py MUST also be
|
||||
added to tts_registry.py's _BUILTIN_NAMES or the registry will
|
||||
accept a name the dispatcher will silently route to the wrong
|
||||
handler.
|
||||
"""
|
||||
|
||||
def test_registry_builtins_match_dispatcher_builtins(self):
|
||||
from tools.tts_tool import BUILTIN_TTS_PROVIDERS
|
||||
|
||||
assert tts_registry._BUILTIN_NAMES == BUILTIN_TTS_PROVIDERS, (
|
||||
"agent.tts_registry._BUILTIN_NAMES and "
|
||||
"tools.tts_tool.BUILTIN_TTS_PROVIDERS have drifted!\n"
|
||||
f" Registry only: {sorted(tts_registry._BUILTIN_NAMES - BUILTIN_TTS_PROVIDERS)}\n"
|
||||
f" Dispatcher only: {sorted(BUILTIN_TTS_PROVIDERS - tts_registry._BUILTIN_NAMES)}\n"
|
||||
"Add the missing names to whichever list is incomplete. "
|
||||
"These two lists exist as a circular-import workaround and "
|
||||
"MUST be kept in sync manually."
|
||||
)
|
||||
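The registry behaviour these tests pin down (case-insensitive and whitespace-tolerant lookup, built-in shadow rejection, overwrite on re-registration, sorted listing) can be sketched roughly as follows. This is a minimal illustration, not the contents of `agent/tts_registry.py`: the logger name, the `_normalize` helper, the boolean return value, and the three-name `_BUILTIN_NAMES` subset are all assumptions made for the example.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger("sketch.tts_registry")  # hypothetical logger name

# Illustrative subset of the built-in names; the real list is longer.
_BUILTIN_NAMES = frozenset({"edge", "openai", "elevenlabs"})
_providers: Dict[str, object] = {}


def _normalize(name: object) -> Optional[str]:
    """Lowercase and strip a name; return None for non-strings or blanks."""
    if not isinstance(name, str):
        return None
    cleaned = name.strip().lower()
    return cleaned or None


def register_provider(provider: object) -> bool:
    """Store a provider under its normalized name; refuse built-in shadows."""
    key = _normalize(getattr(provider, "name", None))
    if key is None:
        logger.warning("provider has no usable name; skipping")
        return False
    if key in _BUILTIN_NAMES:
        logger.warning("provider %r shadows a built-in name; skipping", key)
        return False
    if key in _providers:
        logger.debug("provider %r re-registered; overwriting", key)
    _providers[key] = provider
    return True


def get_provider(name: object) -> Optional[object]:
    """Look up a provider; non-string or unknown names yield None."""
    key = _normalize(name)
    return _providers.get(key) if key is not None else None


def list_providers() -> List[object]:
    """Return registered providers ordered by normalized name."""
    return [_providers[k] for k in sorted(_providers)]
```

Normalizing once in a shared helper keeps registration and lookup symmetric, so the guard against built-in names cannot be bypassed by a different casing of the same name.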
156
tests/hermes_cli/test_plugins_tts_registration.py
Normal file
|
|
@@ -0,0 +1,156 @@
|
|||
"""Tests for PluginContext.register_tts_provider() (issue #30398).
|
||||
|
||||
Exercises the plugin context hook end-to-end: drops a fake plugin into
|
||||
``$HERMES_HOME/plugins/``, runs ``PluginManager().discover_and_load()``,
|
||||
and asserts the registration result.
|
||||
|
||||
Mirrors the structure of
|
||||
``tests/hermes_cli/test_plugin_scanner_recursion.py::TestRegisterImageGenProvider``.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict
|
||||
|
||||
import yaml
|
||||
|
||||
|
||||
def _write_plugin(
|
||||
root: Path,
|
||||
name: str,
|
||||
*,
|
||||
manifest_extra: Dict[str, Any] | None = None,
|
||||
register_body: str = "pass",
|
||||
) -> Path:
|
||||
plugin_dir = root / name
|
||||
plugin_dir.mkdir(parents=True, exist_ok=True)
|
||||
manifest = {
|
||||
"name": name,
|
||||
"version": "0.1.0",
|
||||
"description": f"Test plugin {name}",
|
||||
}
|
||||
if manifest_extra:
|
||||
manifest.update(manifest_extra)
|
||||
(plugin_dir / "plugin.yaml").write_text(yaml.dump(manifest))
|
||||
(plugin_dir / "__init__.py").write_text(
|
||||
f"def register(ctx):\n {register_body}\n"
|
||||
)
|
||||
return plugin_dir
|
||||
|
||||
|
||||
def _enable(hermes_home: Path, name: str) -> None:
|
||||
cfg_path = hermes_home / "config.yaml"
|
||||
cfg: dict = {}
|
||||
if cfg_path.exists():
|
||||
try:
|
||||
cfg = yaml.safe_load(cfg_path.read_text()) or {}
|
||||
except Exception:
|
||||
cfg = {}
|
||||
plugins_cfg = cfg.setdefault("plugins", {})
|
||||
enabled = plugins_cfg.setdefault("enabled", [])
|
||||
if isinstance(enabled, list) and name not in enabled:
|
||||
enabled.append(name)
|
||||
cfg_path.write_text(yaml.safe_dump(cfg))
|
||||
|
||||
|
||||
class TestRegisterTTSProvider:
|
||||
"""End-to-end: a fake plugin registers via the hook, ends up in the registry."""
|
||||
|
||||
def test_accepts_valid_provider(self):
|
||||
from hermes_cli.plugins import PluginManager
|
||||
|
||||
from agent import tts_registry
|
||||
tts_registry._reset_for_tests()
|
||||
|
||||
hermes_home = Path(os.environ["HERMES_HOME"])
|
||||
_write_plugin(
|
||||
hermes_home / "plugins",
|
||||
"my-tts-plugin",
|
||||
register_body=(
|
||||
"from agent.tts_provider import TTSProvider\n"
|
||||
" class P(TTSProvider):\n"
|
||||
" @property\n"
|
||||
" def name(self): return 'fake-tts'\n"
|
||||
" def synthesize(self, text, output_path, **kw):\n"
|
||||
" return output_path\n"
|
||||
" ctx.register_tts_provider(P())"
|
||||
),
|
||||
)
|
||||
_enable(hermes_home, "my-tts-plugin")
|
||||
|
||||
mgr = PluginManager()
|
||||
mgr.discover_and_load()
|
||||
|
||||
assert mgr._plugins["my-tts-plugin"].enabled is True, (
|
||||
f"Plugin failed to load: {mgr._plugins['my-tts-plugin'].error}"
|
||||
)
|
||||
assert tts_registry.get_provider("fake-tts") is not None
|
||||
|
||||
tts_registry._reset_for_tests()
|
||||
|
||||
def test_rejects_non_provider(self, caplog):
|
||||
"""A plugin that passes a non-TTSProvider gets a warning, no exception."""
|
||||
from hermes_cli.plugins import PluginManager
|
||||
|
||||
from agent import tts_registry
|
||||
tts_registry._reset_for_tests()
|
||||
|
||||
hermes_home = Path(os.environ["HERMES_HOME"])
|
||||
_write_plugin(
|
||||
hermes_home / "plugins",
|
||||
"bad-tts-plugin",
|
||||
register_body="ctx.register_tts_provider('not a provider')",
|
||||
)
|
||||
_enable(hermes_home, "bad-tts-plugin")
|
||||
|
||||
with caplog.at_level("WARNING"):
|
||||
mgr = PluginManager()
|
||||
mgr.discover_and_load()
|
||||
|
||||
# Plugin loaded (register returned normally), but registry empty.
|
||||
assert mgr._plugins["bad-tts-plugin"].enabled is True
|
||||
assert tts_registry.get_provider("not a provider") is None
|
||||
assert tts_registry.list_providers() == []
|
||||
assert "does not inherit from TTSProvider" in caplog.text
|
||||
|
||||
tts_registry._reset_for_tests()
|
||||
|
||||
def test_rejects_builtin_shadow(self, caplog):
|
"""A plugin that registers a name colliding with a built-in is rejected by
the underlying registry without raising: the registry logs a warning and
does not store the provider, while the plugin itself still loads OK.
|
||||
"""
|
||||
from hermes_cli.plugins import PluginManager
|
||||
|
||||
from agent import tts_registry
|
||||
tts_registry._reset_for_tests()
|
||||
|
||||
hermes_home = Path(os.environ["HERMES_HOME"])
|
||||
_write_plugin(
|
||||
hermes_home / "plugins",
|
||||
"shadow-tts-plugin",
|
||||
register_body=(
|
||||
"from agent.tts_provider import TTSProvider\n"
|
||||
" class P(TTSProvider):\n"
|
||||
" @property\n"
|
||||
" def name(self): return 'edge'\n"
|
||||
" def synthesize(self, text, output_path, **kw):\n"
|
||||
" return output_path\n"
|
||||
" ctx.register_tts_provider(P())"
|
||||
),
|
||||
)
|
||||
_enable(hermes_home, "shadow-tts-plugin")
|
||||
|
||||
with caplog.at_level("WARNING"):
|
||||
mgr = PluginManager()
|
||||
mgr.discover_and_load()
|
||||
|
||||
# Plugin still loaded normally — built-in shadowing is a warning,
|
||||
# not an exception. The registry rejects the entry though.
|
||||
assert mgr._plugins["shadow-tts-plugin"].enabled is True
|
||||
assert tts_registry.get_provider("edge") is None
|
||||
assert "shadows a built-in name" in caplog.text
|
||||
|
||||
tts_registry._reset_for_tests()
|
||||
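The three tests above imply that the context hook type-checks its argument, warns instead of raising on a bad value, and otherwise delegates to the registry. A rough sketch of that shape, under stated assumptions: `TTSProvider` here is a local stand-in for `agent.tts_provider.TTSProvider`, while the `PluginContext` constructor, the dict-backed registry, and the boolean return value are hypothetical and chosen only to keep the example self-contained.

```python
import logging
from abc import ABC, abstractmethod
from typing import Dict

logger = logging.getLogger("sketch.plugins")  # hypothetical logger name


class TTSProvider(ABC):
    """Stand-in for the real ABC: subclasses must supply name and synthesize."""

    @property
    @abstractmethod
    def name(self) -> str:
        ...

    @abstractmethod
    def synthesize(self, text: str, output_path: str, **kw) -> str:
        ...


class PluginContext:
    """Hypothetical context object passed to a plugin's register(ctx)."""

    def __init__(self, plugin_name: str, registry: Dict[str, TTSProvider]):
        self.plugin_name = plugin_name
        self._registry = registry

    def register_tts_provider(self, provider: object) -> bool:
        # Warn and return instead of raising, so one bad plugin call
        # does not abort plugin loading.
        if not isinstance(provider, TTSProvider):
            logger.warning(
                "plugin %r: object does not inherit from TTSProvider",
                self.plugin_name,
            )
            return False
        self._registry[provider.name.strip().lower()] = provider
        return True
```

Returning normally on a rejected argument is what lets the test assert that the plugin is still marked as enabled while the registry stays empty.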
187
tests/hermes_cli/test_tts_picker.py
Normal file
|
|
@@ -0,0 +1,187 @@
|
|||
"""Tests for the TTS plugin picker surface in hermes_cli/tools_config.py (issue #30398).
|
||||
|
||||
Covers ``_plugin_tts_providers()`` and the ``_visible_providers()``
|
||||
integration that injects plugin rows into the Text-to-Speech category.
|
||||
|
||||
Mirrors the structure of existing image_gen / browser picker tests.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import pytest
|
||||
|
||||
from agent import tts_registry
|
||||
from agent.tts_provider import TTSProvider
|
||||
from hermes_cli import tools_config
|
||||
|
||||
|
||||
class _FakeTTSProvider(TTSProvider):
|
||||
def __init__(self, name: str, schema: dict | None = None):
|
||||
self._name = name
|
||||
self._schema = schema
|
||||
|
||||
@property
|
||||
def name(self) -> str:
|
||||
return self._name
|
||||
|
||||
def synthesize(self, text, output_path, **kw):
|
||||
return output_path
|
||||
|
||||
def get_setup_schema(self):
|
||||
if self._schema is not None:
|
||||
return self._schema
|
||||
return super().get_setup_schema()
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _reset_registry():
|
||||
tts_registry._reset_for_tests()
|
||||
yield
|
||||
tts_registry._reset_for_tests()
|
||||
|
||||
|
||||
class TestPluginTTSProviders:
|
||||
"""``_plugin_tts_providers()`` returns picker-row dicts."""
|
||||
|
||||
def test_empty_when_no_plugins(self):
|
||||
assert tools_config._plugin_tts_providers() == []
|
||||
|
||||
def test_returns_row_for_registered_plugin(self):
|
||||
tts_registry.register_provider(
|
||||
_FakeTTSProvider(
|
||||
name="cartesia",
|
||||
schema={
|
||||
"name": "Cartesia",
|
||||
"badge": "paid",
|
||||
"tag": "Ultra-low-latency streaming",
|
||||
"env_vars": [
|
||||
{"key": "CARTESIA_API_KEY", "prompt": "Cartesia API key",
|
||||
"url": "https://play.cartesia.ai/console"},
|
||||
],
|
||||
},
|
||||
)
|
||||
)
|
||||
rows = tools_config._plugin_tts_providers()
|
||||
assert len(rows) == 1
|
||||
row = rows[0]
|
||||
assert row["name"] == "Cartesia"
|
||||
assert row["badge"] == "paid"
|
||||
assert row["tag"] == "Ultra-low-latency streaming"
|
||||
assert row["env_vars"][0]["key"] == "CARTESIA_API_KEY"
|
||||
# Selecting this row writes ``tts.provider: cartesia`` — same
|
||||
# write path as a hardcoded row.
|
||||
assert row["tts_provider"] == "cartesia"
|
||||
assert row["tts_plugin_name"] == "cartesia"
|
||||
|
||||
def test_filters_builtin_shadow_defensively(self):
|
||||
"""Even if a plugin slipped past the registry's built-in check
|
||||
(e.g. via direct ``agent.tts_registry.register_provider`` rather
|
||||
than the ``ctx.register_tts_provider`` hook), the picker layer
|
||||
filters it out so the picker invariant holds."""
|
||||
# Use lower-level call to bypass the warning + skip in
|
||||
# register_provider (the registry's built-in guard).
|
||||
# Note: this is intentionally pathological — production code
|
||||
# paths go through the hook which catches this first.
|
||||
provider = _FakeTTSProvider(name="edge")
|
||||
tts_registry._providers["edge"] = provider # type: ignore[index]
|
||||
try:
|
||||
rows = tools_config._plugin_tts_providers()
|
||||
assert rows == [], (
|
||||
"Picker must filter built-in name shadows even when the "
|
||||
"registry has been bypassed."
|
||||
)
|
||||
finally:
|
||||
tts_registry._providers.pop("edge", None) # type: ignore[arg-type]
|
||||
|
||||
def test_skips_providers_with_no_name(self):
|
||||
"""Defense in depth: a provider with no .name attribute is skipped
|
||||
rather than crashing the picker."""
|
||||
|
||||
class _NoName:
|
||||
display_name = "Bogus"
|
||||
def get_setup_schema(self):
|
||||
return {"name": "Bogus"}
|
||||
|
||||
tts_registry._providers["bogus"] = _NoName() # type: ignore[assignment]
|
||||
try:
|
||||
rows = tools_config._plugin_tts_providers()
|
||||
# Provider has no .name so the picker filters it out
|
||||
assert all(r.get("tts_plugin_name") != "bogus" for r in rows)
|
||||
finally:
|
||||
tts_registry._providers.pop("bogus", None) # type: ignore[arg-type]
|
||||
|
||||
def test_skips_providers_whose_schema_raises(self):
|
||||
class _ExplodingSchema(_FakeTTSProvider):
|
||||
def get_setup_schema(self):
|
||||
raise RuntimeError("boom")
|
||||
|
||||
tts_registry.register_provider(_ExplodingSchema(name="exploding"))
|
||||
tts_registry.register_provider(_FakeTTSProvider(name="working"))
|
||||
rows = tools_config._plugin_tts_providers()
|
||||
assert [r["tts_plugin_name"] for r in rows] == ["working"]
|
||||
|
||||
def test_minimal_schema_uses_display_name(self):
|
||||
"""A provider with no setup_schema override gets a row built from
|
||||
``display_name`` and ``name`` only."""
|
||||
tts_registry.register_provider(_FakeTTSProvider(name="minimal"))
|
||||
rows = tools_config._plugin_tts_providers()
|
||||
assert len(rows) == 1
|
||||
assert rows[0]["name"] == "Minimal" # display_name default
|
||||
assert rows[0]["tts_provider"] == "minimal"
|
||||
assert rows[0]["env_vars"] == []
|
||||
|
||||
def test_post_setup_passthrough(self):
|
||||
tts_registry.register_provider(
|
||||
_FakeTTSProvider(
|
||||
name="my-tts",
|
||||
schema={
|
||||
"name": "My TTS",
|
||||
"post_setup": "my_post_install_hook",
|
||||
"env_vars": [],
|
||||
},
|
||||
)
|
||||
)
|
||||
rows = tools_config._plugin_tts_providers()
|
||||
assert rows[0].get("post_setup") == "my_post_install_hook"
|
||||
|
||||
|
||||
class TestVisibleProvidersInjectsTTSPlugins:
|
||||
"""``_visible_providers()`` injects plugin rows into the Text-to-Speech
|
||||
category alongside the hardcoded built-in rows."""
|
||||
|
||||
def test_tts_category_includes_plugin_rows(self):
|
||||
tts_registry.register_provider(_FakeTTSProvider(name="cartesia"))
|
||||
|
||||
tts_cat = tools_config.TOOL_CATEGORIES["tts"]
|
||||
visible = tools_config._visible_providers(tts_cat, config={})
|
||||
|
||||
names = [row.get("name") for row in visible]
|
||||
# Hardcoded rows (sample — check at least one is present)
|
||||
assert "Microsoft Edge TTS" in names
|
||||
# Plugin row injected at the end
|
||||
assert "Cartesia" in names
|
||||
|
||||
# Plugin row has tts_provider key for write-path compat
|
||||
plugin_rows = [r for r in visible if r.get("tts_plugin_name")]
|
||||
assert len(plugin_rows) == 1
|
||||
assert plugin_rows[0]["tts_provider"] == "cartesia"
|
||||
|
||||
def test_other_categories_unaffected_by_tts_plugins(self):
|
||||
"""Registering a TTS plugin must not leak into the Image Generation
|
||||
or Browser pickers."""
|
||||
tts_registry.register_provider(_FakeTTSProvider(name="cartesia"))
|
||||
|
||||
img_cat = tools_config.TOOL_CATEGORIES["image_gen"]
|
||||
visible = tools_config._visible_providers(img_cat, config={})
|
||||
names = [row.get("name") for row in visible]
|
||||
assert "Cartesia" not in names
|
||||
|
||||
def test_tts_category_without_plugins_only_hardcoded(self):
|
||||
"""No plugins → picker shows exactly the hardcoded rows."""
|
||||
tts_cat = tools_config.TOOL_CATEGORIES["tts"]
|
||||
visible = tools_config._visible_providers(tts_cat, config={})
|
||||
names = [row.get("name") for row in visible]
|
||||
# No row has the plugin marker
|
||||
assert all(not row.get("tts_plugin_name") for row in visible)
|
||||
# Hardcoded rows still present (sample one of the always-visible ones)
|
||||
assert "Microsoft Edge TTS" in names
|
||||
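The picker tests describe a row builder that skips built-in shadows, nameless objects, and providers whose schema call raises, then copies a few optional keys through. A minimal sketch of such a builder is shown below. It is an assumption-laden illustration rather than the code in `hermes_cli/tools_config.py`: the function name `plugin_rows`, the two-name `BUILTINS` subset, and the exact set of optional keys are chosen for the example.

```python
from typing import Any, Dict, Iterable, List

BUILTINS = frozenset({"edge", "openai"})  # illustrative subset only


def plugin_rows(providers: Iterable[Any]) -> List[Dict[str, Any]]:
    """Build picker rows, dropping anything that cannot be shown safely."""
    rows: List[Dict[str, Any]] = []
    for provider in providers:
        name = getattr(provider, "name", None)
        # Skip nameless objects and anything shadowing a built-in name.
        if not isinstance(name, str) or not name or name.lower() in BUILTINS:
            continue
        try:
            schema = dict(provider.get_setup_schema())
        except Exception:
            # A misbehaving plugin must not take the whole picker down.
            continue
        row: Dict[str, Any] = {
            "name": schema.get("name") or name.title(),
            "env_vars": list(schema.get("env_vars") or []),
            "tts_provider": name,
            "tts_plugin_name": name,
        }
        for optional in ("badge", "tag", "post_setup"):
            if optional in schema:
                row[optional] = schema[optional]
        rows.append(row)
    return rows
```

Carrying both `tts_provider` and `tts_plugin_name` mirrors what the tests assert: the first keeps the write path identical to a hardcoded row, and the second marks the row as plugin-supplied.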
0
tests/plugins/tts/__init__.py
Normal file
328
tests/plugins/tts/check_parity_vs_main.py
Normal file
|
|
@@ -0,0 +1,328 @@
|
|||
"""Behavior-parity check for the TTS plugin hook (issue #30398).
|
||||
|
||||
Spawns one subprocess per (version, scenario) cell — pinned to either
|
||||
``origin/main`` (no plugin hook; ``tts.provider: cartesia`` falls
|
||||
through to the Edge TTS default branch) or this PR's worktree (plugin
|
||||
hook present; same config routes through the plugin registry when a
|
||||
plugin is registered).
|
||||
|
||||
Each subprocess clears all TTS-related env vars + writes a
|
||||
``config.yaml``, then resolves how the dispatcher would route a
|
||||
``text_to_speech`` call. The emitted shape is a JSON object with these keys::

    {dispatch_kind, provider_name, voice_compat, error_present}
|
||||
|
||||
Where ``dispatch_kind`` ∈
|
||||
``{"builtin_edge", "builtin_openai", "builtin_elevenlabs", ...,
|
||||
"command", "plugin", "fallback_edge", "error"}``:
|
||||
|
||||
* ``builtin_<name>`` — config selects a built-in handler that exists
|
||||
on both main and PR (no diff expected)
|
||||
* ``command`` — config selects a ``tts.providers.<name>: type: command``
|
||||
entry (PR #17843; no diff expected)
|
||||
* ``plugin`` — config selects a plugin-registered provider (PR only)
|
||||
* ``fallback_edge`` — config selects an unknown name with no matching
|
||||
plugin or command entry → Edge TTS default fallback
|
||||
* ``error`` — explicit fatal error (e.g. mistral quarantine)
|
||||
|
||||
The parent process diffs the reduced shape per scenario. The only
|
||||
acceptable diff is ``fallback_edge → plugin`` for the
|
||||
``unknown-name-with-plugin-installed`` scenario — everything else is
|
||||
a regression.
|
||||
|
||||
Run from the PR worktree (it auto-resolves ``MAIN_DIR`` as the directory two
levels above the worktree root, or falls back to a sibling
``hermes-agent-main`` checkout; if neither exists it compares the worktree
against itself and prints a warning)::
|
||||
|
||||
python tests/plugins/tts/check_parity_vs_main.py
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
REPO_ROOT = Path(__file__).resolve().parents[3]
|
||||
|
||||
|
||||
def _resolve_main_dir() -> Path:
|
||||
candidate = REPO_ROOT.parent.parent
|
||||
if (candidate / "tools" / "tts_tool.py").exists() and candidate != REPO_ROOT:
|
||||
return candidate
|
||||
sibling = REPO_ROOT.parent / "hermes-agent-main"
|
||||
if (sibling / "tools" / "tts_tool.py").exists():
|
||||
return sibling
|
||||
return REPO_ROOT
|
||||
|
||||
|
||||
MAIN_DIR = _resolve_main_dir()
|
||||
PR_DIR = REPO_ROOT
|
||||
assert (PR_DIR / "tools" / "tts_tool.py").exists(), (
|
||||
f"PR_DIR={PR_DIR} doesn't look like a hermes-agent checkout"
|
||||
)
|
||||
|
||||
|
||||
# The subprocess script — runs INSIDE either the main checkout or PR
|
||||
# checkout, so the import paths resolve to the version of the code
|
||||
# under test. We never call the real ``text_to_speech_tool`` because
|
||||
# that would require audio synthesis; instead we ask the resolution
|
||||
# layer what it WOULD do.
|
||||
SUBPROCESS_SCRIPT = r"""
|
||||
import json, os, sys, tempfile
|
||||
sys.path.insert(0, sys.argv[1])
|
||||
|
||||
# Isolated HERMES_HOME so the config write is hermetic.
|
||||
home = tempfile.mkdtemp()
|
||||
os.environ["HERMES_HOME"] = home
|
||||
|
||||
# Clear TTS-related env so dispatch decisions are config-driven.
|
||||
for k in (
|
||||
"ELEVENLABS_API_KEY", "OPENAI_API_KEY", "VOICE_TOOLS_OPENAI_KEY",
|
||||
"MINIMAX_API_KEY", "XAI_API_KEY", "GEMINI_API_KEY",
|
||||
):
|
||||
os.environ.pop(k, None)
|
||||
|
||||
scenario_env = json.loads(sys.argv[2])
|
||||
os.environ.update(scenario_env)
|
||||
|
||||
config_yaml = sys.argv[3]
|
||||
plugin_register = sys.argv[4] # "yes" to register a fake plugin
|
||||
|
||||
config_path = os.path.join(home, "config.yaml")
|
||||
with open(config_path, "w") as f:
|
||||
f.write(config_yaml)
|
||||
|
||||
# Fresh import — must not have anything cached from prior runs.
|
||||
for name in list(sys.modules):
|
||||
if (name.startswith("tools.")
|
||||
or name.startswith("agent.")
|
||||
or name.startswith("plugins.")
|
||||
or name.startswith("hermes_cli.")):
|
||||
sys.modules.pop(name, None)
|
||||
|
||||
# Try importing tts_registry — only exists on PR side.
|
||||
have_plugin_hook = False
|
||||
try:
|
||||
from agent import tts_registry
|
||||
from agent.tts_provider import TTSProvider
|
||||
have_plugin_hook = True
|
||||
|
||||
if plugin_register == "yes":
|
||||
class _FakeProvider(TTSProvider):
|
||||
@property
|
||||
def name(self): return "cartesia"
|
||||
def synthesize(self, text, output_path, **kw):
|
||||
return output_path
|
||||
|
||||
tts_registry._reset_for_tests()
|
||||
tts_registry.register_provider(_FakeProvider())
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
import tools.tts_tool as tts_tool
|
||||
|
||||
# Read the config the same way text_to_speech_tool() does.
|
||||
tts_config = tts_tool._load_tts_config()
|
||||
provider = tts_tool._get_provider(tts_config)
|
||||
|
||||
dispatch_kind = None
|
||||
provider_name = provider
|
||||
voice_compat = False
|
||||
error_text = None
|
||||
|
||||
try:
|
||||
# Mistral is the one branch that returns a fatal error.
|
||||
if provider == "mistral":
|
||||
dispatch_kind = "error"
|
||||
error_text = "mistral quarantine"
|
||||
elif tts_tool._resolve_command_provider_config(provider, tts_config) is not None:
|
||||
dispatch_kind = "command"
|
||||
elif have_plugin_hook and provider not in tts_tool.BUILTIN_TTS_PROVIDERS:
|
||||
# On PR side: check plugin dispatch.
|
||||
plugin_path = tts_tool._dispatch_to_plugin_provider(
|
||||
"test", os.path.join(home, "out.mp3"), provider, tts_config,
|
||||
)
|
||||
if plugin_path is not None:
|
||||
dispatch_kind = "plugin"
|
||||
voice_compat = tts_tool._plugin_provider_is_voice_compatible(provider)
|
||||
else:
|
||||
# Falls through to Edge TTS default on the PR side too.
|
||||
dispatch_kind = "fallback_edge"
|
||||
elif provider in tts_tool.BUILTIN_TTS_PROVIDERS:
|
||||
dispatch_kind = "builtin_" + provider
|
||||
else:
|
||||
# On main side: unknown names fall through to Edge default.
|
||||
dispatch_kind = "fallback_edge"
|
||||
except Exception as exc:
|
||||
dispatch_kind = "exception"
|
||||
error_text = repr(exc)
|
||||
|
||||
shape = {
|
||||
"dispatch_kind": dispatch_kind,
|
||||
"provider_name": provider_name,
|
||||
"voice_compat": bool(voice_compat),
|
||||
"error_present": error_text is not None,
|
||||
}
|
||||
print(json.dumps(shape))
|
||||
"""
|
||||
|
||||
|
||||
SCENARIOS: list[tuple[str, str, dict[str, str], str]] = [
|
||||
# (label, config.yaml body, scenario_env, plugin_register)
|
||||
|
||||
# Scenario 1: unset tts.provider → both: Edge default
|
||||
("unset-defaults-to-edge", "", {}, "no"),
|
||||
|
||||
# Scenario 2: built-in name → both: that built-in
|
||||
("explicit-edge", "tts:\n provider: edge\n", {}, "no"),
|
||||
("explicit-openai", "tts:\n provider: openai\n", {}, "no"),
|
||||
("explicit-elevenlabs", "tts:\n provider: elevenlabs\n", {}, "no"),
|
||||
|
||||
# Scenario 3: command-type provider → both: command dispatch
|
||||
(
|
||||
"command-provider",
|
||||
"tts:\n provider: my-piper\n providers:\n my-piper:\n type: command\n command: 'piper -m model.onnx -f {output_path} < {input_path}'\n",
|
||||
{},
|
||||
"no",
|
||||
),
|
||||
|
||||
# Scenario 4: unknown name with NO plugin installed → both: fallback to Edge
|
||||
("unknown-no-plugin", "tts:\n provider: cartesia\n", {}, "no"),
|
||||
|
||||
# Scenario 5: unknown name WITH plugin installed
|
||||
# main: fallback_edge (no plugin hook exists)
|
||||
# PR: plugin (cartesia)
|
||||
# This is the ONLY acceptable diff in the harness.
|
||||
("plugin-installed", "tts:\n provider: cartesia\n", {}, "yes"),
|
||||
|
||||
# Scenario 6: built-in name + plugin tries to shadow → both: built-in
|
||||
# The plugin registers under name "cartesia", not "edge", so this is
|
||||
# effectively the same as scenario 2 — but we exercise the with-plugin
|
||||
# path to ensure the built-in branch still takes priority.
|
||||
("explicit-edge-with-plugin-registered", "tts:\n provider: edge\n", {}, "yes"),
|
||||
|
||||
# Scenario 7: mistral quarantine — both surface the explicit error
|
||||
("mistral-quarantine", "tts:\n provider: mistral\n", {}, "no"),
|
||||
]
|
||||
|
||||
|
||||
def _run_scenario(repo_path: Path, label: str, config_yaml: str, env: dict, plugin_register: str) -> dict:
|
||||
venv_python = repo_path / ".venv" / "bin" / "python"
|
||||
if not venv_python.exists():
|
||||
venv_python = MAIN_DIR / ".venv" / "bin" / "python"
|
||||
if not venv_python.exists():
|
||||
venv_python = MAIN_DIR / "venv" / "bin" / "python"
|
||||
if not venv_python.exists():
|
||||
venv_python = Path("python3")
|
||||
|
||||
out = subprocess.run(
|
||||
[
|
||||
str(venv_python),
|
||||
"-c",
|
||||
SUBPROCESS_SCRIPT,
|
||||
str(repo_path),
|
||||
json.dumps(env),
|
||||
config_yaml,
|
||||
plugin_register,
|
||||
],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=60,
|
||||
)
|
||||
if out.returncode != 0:
|
||||
return {
|
||||
"error": "subprocess failed",
|
||||
"stdout": out.stdout[-500:],
|
||||
"stderr": out.stderr[-500:],
|
||||
}
|
||||
try:
|
||||
return json.loads(out.stdout.strip().splitlines()[-1])
|
||||
except Exception as exc:
|
||||
return {"error": f"could not parse output: {exc}", "stdout": out.stdout}
|
||||
|
||||
|
||||
def _reduce(shape: dict) -> dict:
|
||||
"""Reduce to the parts that matter for user-visible parity."""
|
||||
return {
|
||||
"dispatch_kind": shape.get("dispatch_kind"),
|
||||
"provider_name": shape.get("provider_name"),
|
||||
"error_present": shape.get("error_present"),
|
||||
}
|
||||
|
||||
|
||||
def main() -> int:
|
||||
print(f"main: {MAIN_DIR}")
|
||||
print(f"pr: {PR_DIR}")
|
||||
print()
|
||||
|
||||
if MAIN_DIR == PR_DIR:
|
||||
print(
|
||||
"WARN: MAIN_DIR == PR_DIR — diffs will be trivially identical.\n"
|
||||
" Set up a sibling 'hermes-agent-main' checkout pinned to "
|
||||
"origin/main to get real parity coverage."
|
||||
)
|
||||
print()
|
||||
|
||||
failures: list[str] = []
|
||||
errors: list[str] = []
|
||||
intentional_diffs: list[tuple[str, dict, dict]] = []
|
||||
for label, config_yaml, env, plugin_register in SCENARIOS:
|
||||
main_shape = _run_scenario(MAIN_DIR, label, config_yaml, env, plugin_register)
|
||||
pr_shape = _run_scenario(PR_DIR, label, config_yaml, env, plugin_register)
|
||||
|
||||
if "error" in main_shape or "error" in pr_shape:
|
||||
print(f" [ERR ] {label}: subprocess failed")
|
||||
print(f" main: {main_shape}")
|
||||
print(f" pr: {pr_shape}")
|
||||
errors.append(label)
|
||||
continue
|
||||
|
||||
main_reduced = _reduce(main_shape)
|
||||
pr_reduced = _reduce(pr_shape)
|
||||
|
||||
if main_reduced == pr_reduced:
|
||||
print(f" [OK] {label}: {main_reduced}")
|
||||
continue
|
||||
|
||||
# On main, "plugin-installed" scenario returns fallback_edge
|
||||
# (no plugin hook); on PR, it routes to the plugin. That's the
|
||||
# only acceptable diff.
|
||||
fallback_to_plugin = (
|
||||
main_reduced.get("dispatch_kind") == "fallback_edge"
|
||||
and pr_reduced.get("dispatch_kind") == "plugin"
|
||||
and label == "plugin-installed"
|
||||
)
|
||||
if fallback_to_plugin:
|
||||
print(f" [DIFF] {label}: fallback_edge → plugin — expected")
|
||||
intentional_diffs.append((label, main_reduced, pr_reduced))
|
||||
else:
|
||||
print(f" [FAIL] {label}")
|
||||
print(f" main: {main_reduced}")
|
||||
print(f" pr: {pr_reduced}")
|
||||
failures.append(label)
|
||||
|
||||
print()
|
||||
if errors:
|
||||
print(f"SUBPROCESS ERRORS in {len(errors)} scenario(s):")
|
||||
for e in errors:
|
||||
print(f" - {e}")
|
||||
if failures:
|
||||
print(f"BEHAVIOUR REGRESSION in {len(failures)} scenario(s):")
|
||||
for f in failures:
|
||||
print(f" - {f}")
|
||||
if intentional_diffs:
|
||||
print(
|
||||
f"INTENTIONAL DIFFS ({len(intentional_diffs)}): "
|
||||
f"fallback_edge → plugin dispatch when a plugin is registered."
|
||||
)
|
||||
if failures or errors:
|
||||
return 1
|
||||
print(f"PARITY OK across {len(SCENARIOS)} scenarios.")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
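The resolution order that this harness probes (built-in first, then a command-type config entry, then a registered plugin, then the Edge fallback) can be condensed into a single pure function. The sketch below is illustrative only: `resolve_dispatch`, its parameters, and the three-name `BUILTINS` subset are hypothetical, and it does not model the mistral error branch or how an unset `tts.provider` is defaulted.

```python
from typing import Any, Dict, Optional, Set

BUILTINS: Set[str] = {"edge", "openai", "elevenlabs"}  # illustrative subset


def resolve_dispatch(
    provider: Optional[str],
    tts_config: Dict[str, Any],
    plugin_names: Set[str],
) -> str:
    """Return the dispatch kind for a configured provider name."""
    name = (provider or "").strip().lower()
    # 1. Built-in names always win, even if a plugin claims the same name.
    if name in BUILTINS:
        return "builtin_" + name
    # 2. A config entry with a command set beats a same-name plugin.
    entry = (tts_config.get("providers") or {}).get(name)
    if isinstance(entry, dict) and entry.get("command"):
        return "command"
    # 3. Plugin dispatch applies only when a plugin is registered.
    if name in plugin_names:
        return "plugin"
    # 4. Anything else falls through to the Edge TTS default.
    return "fallback_edge"
```

Writing the order as a chain of early returns makes the precedence explicit: reordering two branches changes observable behaviour, which is the kind of regression the parity check is meant to catch.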
323
tests/tools/test_tts_plugin_dispatch.py
Normal file
|
|
@@ -0,0 +1,323 @@
|
|||
"""Tests for TTS plugin dispatch in tools/tts_tool.py (issue #30398).
|
||||
|
||||
Covers the three core invariants of the plugin dispatcher:
|
||||
|
||||
1. Built-in provider names short-circuit — plugins NEVER win over a
|
||||
built-in. Even if a plugin somehow ended up in the registry with a
|
||||
built-in name (which the registry already blocks), the dispatcher
|
||||
re-checks defensively.
|
||||
2. Command-type providers declared under ``tts.providers.<name>: type:
|
||||
command`` (PR #17843) win over a plugin with the same name. Config
|
||||
is more local than plugin install.
|
||||
3. Plugin dispatch fires only when the configured provider is neither
|
||||
a built-in nor a command-type entry, AND a plugin is registered
|
||||
under that name. Unknown names fall through.
|
||||
|
||||
Also exercises:
|
||||
- Plugin exceptions surface to the outer error envelope (don't crash)
|
||||
- Plugin returning a different path is honored
|
||||
- voice_compatible: True triggers ffmpeg opus conversion path
|
||||
- voice_compatible: False keeps the file as-is
|
||||
|
||||
The dispatcher is exercised in isolation — we don't actually call
|
||||
``text_to_speech_tool`` because that would require real audio file
|
||||
writes. Each test directly calls
|
||||
``tools.tts_tool._dispatch_to_plugin_provider`` / the predicate
|
||||
helpers.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Optional
|
||||
|
||||
import pytest
|
||||
|
||||
from agent import tts_registry
|
||||
from agent.tts_provider import TTSProvider
|
||||
from tools import tts_tool
|
||||
|
||||
|
||||
class _FakeTTSProvider(TTSProvider):
|
||||
def __init__(
|
||||
self,
|
||||
name: str,
|
||||
voice_compat: bool = False,
|
||||
raise_exc: Optional[BaseException] = None,
|
||||
return_path: Optional[str] = None,
|
||||
):
|
||||
self._name = name
|
||||
self._voice_compat = voice_compat
|
||||
self._raise_exc = raise_exc
|
||||
self._return_path = return_path
|
||||
# Recorded for assertions
|
||||
self.last_call: Optional[dict] = None
|
||||
|
||||
@property
|
||||
def name(self) -> str:
|
||||
return self._name
|
||||
|
||||
@property
|
||||
def voice_compatible(self) -> bool:
|
||||
return self._voice_compat
|
||||
|
||||
def synthesize(self, text, output_path, **kw):
|
||||
self.last_call = {
|
||||
"text": text,
|
||||
"output_path": output_path,
|
||||
"kwargs": dict(kw),
|
||||
}
|
||||
if self._raise_exc is not None:
|
||||
raise self._raise_exc
|
||||
return self._return_path if self._return_path is not None else output_path
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _reset_registry():
|
||||
tts_registry._reset_for_tests()
|
||||
yield
|
||||
tts_registry._reset_for_tests()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Resolution invariants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestBuiltinAlwaysWins:
|
||||
"""Built-in TTS provider names short-circuit the dispatcher.
|
||||
|
||||
Even with a plugin registered (which the registry would reject —
|
||||
but the dispatcher is defensive), built-in names return None so
|
||||
the caller's elif chain handles them natively.
|
||||
"""
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"builtin",
|
||||
["edge", "openai", "elevenlabs", "minimax", "gemini",
|
||||
"mistral", "xai", "piper", "kittentts", "neutts"],
|
||||
)
|
||||
def test_dispatcher_short_circuits_builtin(self, builtin):
|
||||
result = tts_tool._dispatch_to_plugin_provider(
|
||||
text="hello",
|
||||
output_path="/tmp/out.mp3",
|
||||
provider=builtin,
|
||||
tts_config={},
|
||||
)
|
||||
assert result is None, (
|
||||
f"Built-in {builtin!r} must short-circuit plugin dispatch. "
|
||||
"If this test fails, the dispatcher would silently let a "
|
||||
"plugin with a built-in name shadow the native handler — "
|
||||
"violating the precedence rule from PR #17843."
|
||||
)
|
||||
|
||||
def test_dispatcher_short_circuits_builtin_case_insensitive(self):
|
||||
for variant in ("EDGE", "Edge", " edge ", "eDgE"):
|
||||
assert (
|
||||
tts_tool._dispatch_to_plugin_provider(
|
||||
text="hello", output_path="/tmp/x.mp3",
|
||||
provider=variant, tts_config={},
|
||||
) is None
|
||||
)
|
||||
|
||||
|
||||
class TestCommandProviderWins:
|
||||
"""A same-name ``tts.providers.<name>: type: command`` config beats a plugin.
|
||||
|
||||
Locality: a user's command-provider config is more specific than
|
||||
whichever plugin happens to be installed.
|
||||
"""
|
||||
|
||||
def test_command_config_beats_plugin(self):
|
||||
tts_registry.register_provider(_FakeTTSProvider(name="my-tts"))
|
||||
|
||||
result = tts_tool._dispatch_to_plugin_provider(
|
||||
text="hello",
|
||||
output_path="/tmp/out.mp3",
|
||||
provider="my-tts",
|
||||
tts_config={
|
||||
"providers": {
|
||||
"my-tts": {
|
||||
"type": "command",
|
||||
"command": "echo 'hi' > {output_path}",
|
||||
},
|
||||
},
|
||||
},
|
||||
)
|
||||
# Plugin path returns None → caller falls back to command
|
||||
# provider dispatch (handled by the outer text_to_speech_tool
|
||||
# via _resolve_command_provider_config).
|
||||
assert result is None
|
||||
|
||||
|
||||
class TestPluginDispatch:
|
||||
"""Happy path: configured name matches a registered plugin, dispatcher fires."""
|
||||
|
||||
def test_registered_plugin_called(self):
|
||||
provider = _FakeTTSProvider(name="cartesia")
|
||||
tts_registry.register_provider(provider)
|
||||
|
||||
result = tts_tool._dispatch_to_plugin_provider(
|
||||
text="hello world",
|
||||
output_path="/tmp/out.mp3",
|
||||
provider="cartesia",
|
||||
tts_config={},
|
||||
)
|
||||
assert result == "/tmp/out.mp3"
|
||||
assert provider.last_call is not None
|
||||
assert provider.last_call["text"] == "hello world"
|
||||
assert provider.last_call["output_path"] == "/tmp/out.mp3"
|
||||
|
||||
def test_unregistered_name_returns_none(self):
|
||||
result = tts_tool._dispatch_to_plugin_provider(
|
||||
text="hello",
|
||||
output_path="/tmp/out.mp3",
|
||||
provider="unknown-tts",
|
||||
tts_config={},
|
||||
)
|
||||
assert result is None
|
||||
|
||||
def test_voice_model_speed_format_forwarded(self):
|
||||
provider = _FakeTTSProvider(name="cartesia")
|
||||
tts_registry.register_provider(provider)
|
||||
|
||||
result = tts_tool._dispatch_to_plugin_provider(
|
||||
text="hello",
|
||||
output_path="/tmp/out.opus",
|
||||
provider="cartesia",
|
||||
tts_config={
|
||||
"voice": "voice-aria",
|
||||
"model": "sonic-2",
|
||||
"speed": 1.2,
|
||||
"output_format": "opus",
|
||||
},
|
||||
)
|
||||
assert result == "/tmp/out.opus"
|
||||
kwargs = provider.last_call["kwargs"]
|
||||
assert kwargs["voice"] == "voice-aria"
|
||||
assert kwargs["model"] == "sonic-2"
|
||||
assert kwargs["speed"] == 1.2
|
||||
assert kwargs["format"] == "opus"
|
||||
|
||||
def test_empty_string_voice_passed_as_none(self):
|
||||
"""Empty-string config values are normalized to None so providers can
|
||||
fall back to their own defaults (matches the ABC contract)."""
|
||||
provider = _FakeTTSProvider(name="cartesia")
|
||||
tts_registry.register_provider(provider)
|
||||
|
||||
tts_tool._dispatch_to_plugin_provider(
|
||||
text="hello",
|
||||
output_path="/tmp/out.mp3",
|
||||
provider="cartesia",
|
||||
tts_config={"voice": "", "model": ""},
|
||||
)
|
||||
kwargs = provider.last_call["kwargs"]
|
||||
assert kwargs["voice"] is None
|
||||
assert kwargs["model"] is None
|
||||
|
||||
def test_provider_returning_different_path_honored(self):
|
||||
"""If a provider rewrites the output path (e.g. format-driven extension
|
||||
change), the dispatcher returns the new path."""
|
||||
provider = _FakeTTSProvider(name="cartesia", return_path="/tmp/rewritten.opus")
|
||||
tts_registry.register_provider(provider)
|
||||
|
||||
result = tts_tool._dispatch_to_plugin_provider(
|
||||
text="hi",
|
||||
output_path="/tmp/out.mp3",
|
||||
provider="cartesia",
|
||||
tts_config={},
|
||||
)
|
||||
assert result == "/tmp/rewritten.opus"
|
||||
|
||||
def test_provider_returning_none_falls_back_to_output_path(self):
|
||||
"""Defensive: a provider returning None means the dispatcher should
|
||||
report the caller-supplied output_path (matches the ABC contract — the
|
||||
provider is supposed to write to output_path)."""
|
||||
# _FakeTTSProvider.synthesize() substitutes output_path for a None
|
||||
# return_path, so a subclass is needed to return None explicitly.
|
||||
|
||||
class _ReturnsNone(_FakeTTSProvider):
|
||||
def synthesize(self, text, output_path, **kw):
|
||||
return None # type: ignore[return-value]
|
||||
|
||||
provider2 = _ReturnsNone(name="weird")
|
||||
tts_registry.register_provider(provider2)
|
||||
|
||||
result = tts_tool._dispatch_to_plugin_provider(
|
||||
text="hi",
|
||||
output_path="/tmp/out.mp3",
|
||||
provider="weird",
|
||||
tts_config={},
|
||||
)
|
||||
assert result == "/tmp/out.mp3"
|
||||
|
||||
def test_provider_exception_bubbles_up(self):
|
||||
"""Plugin exceptions are NOT swallowed by the dispatcher — they bubble
|
||||
up so the outer ``text_to_speech_tool`` try/except converts them to
|
||||
the standard error envelope. Matches command-provider failure
|
||||
behavior."""
|
||||
provider = _FakeTTSProvider(
|
||||
name="cartesia",
|
||||
raise_exc=RuntimeError("network down"),
|
||||
)
|
||||
tts_registry.register_provider(provider)
|
||||
|
||||
with pytest.raises(RuntimeError, match="network down"):
|
||||
tts_tool._dispatch_to_plugin_provider(
|
||||
text="hi",
|
||||
output_path="/tmp/out.mp3",
|
||||
provider="cartesia",
|
||||
tts_config={},
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# voice_compatible flag
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestVoiceCompatibleHelper:
|
||||
def test_voice_compatible_true(self):
|
||||
tts_registry.register_provider(
|
||||
_FakeTTSProvider(name="cartesia", voice_compat=True)
|
||||
)
|
||||
assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is True
|
||||
|
||||
def test_voice_compatible_false_by_default(self):
|
||||
tts_registry.register_provider(_FakeTTSProvider(name="cartesia"))
|
||||
assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is False
|
||||
|
||||
def test_unregistered_provider_returns_false(self):
|
||||
assert tts_tool._plugin_provider_is_voice_compatible("unknown") is False
|
||||
|
||||
def test_empty_provider_name_returns_false(self):
|
||||
assert tts_tool._plugin_provider_is_voice_compatible("") is False
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"builtin",
|
||||
["edge", "openai", "elevenlabs", "minimax", "gemini",
|
||||
"mistral", "xai", "piper", "kittentts", "neutts"],
|
||||
)
|
||||
def test_builtin_names_return_false(self, builtin):
|
||||
"""voice_compatible helper short-circuits built-ins so they go
|
||||
through the legacy code path that handles their format quirks."""
|
||||
assert tts_tool._plugin_provider_is_voice_compatible(builtin) is False
|
||||
|
||||
def test_voice_compatible_case_insensitive(self):
|
||||
tts_registry.register_provider(
|
||||
_FakeTTSProvider(name="cartesia", voice_compat=True)
|
||||
)
|
||||
assert tts_tool._plugin_provider_is_voice_compatible("CARTESIA") is True
|
||||
assert tts_tool._plugin_provider_is_voice_compatible(" cartesia ") is True
|
||||
|
||||
def test_provider_property_exception_returns_false(self):
|
||||
"""A buggy ``voice_compatible`` property raising must not crash the
|
||||
TTS pipeline."""
|
||||
|
||||
class _ExplodingProvider(_FakeTTSProvider):
|
||||
@property
|
||||
def voice_compatible(self) -> bool:
|
||||
raise RuntimeError("boom")
|
||||
|
||||
tts_registry.register_provider(_ExplodingProvider(name="cartesia"))
|
||||
assert tts_tool._plugin_provider_is_voice_compatible("cartesia") is False
|
||||
|
|
@ -419,6 +419,123 @@ def _resolve_command_provider_config(
|
|||
return None
|
||||
|
||||
|
||||
def _dispatch_to_plugin_provider(
|
||||
text: str,
|
||||
output_path: str,
|
||||
provider: str,
|
||||
tts_config: Dict[str, Any],
|
||||
) -> Optional[str]:
|
||||
"""Route the call to a plugin-registered TTS provider, or return None.
|
||||
|
||||
Returns the path to the written audio file on dispatch, or ``None``
|
||||
to fall through to the next resolution layer (built-in dispatch or
|
||||
Edge TTS default).
|
||||
|
||||
Resolution invariants enforced here (matches issue #30398):
|
||||
|
||||
1. Built-in provider names short-circuit — never reach the plugin
|
||||
registry. The caller is responsible for the elif chain that
|
||||
handles ``edge``/``openai``/etc.; this function explicitly
|
||||
rejects those names defensively.
|
||||
2. Command-type providers declared under
|
||||
``tts.providers.<name>: type: command`` (PR #17843) win over a
|
||||
plugin with the same name. The caller passes us only when its
|
||||
own command-provider check returned None — we re-verify here so
|
||||
a refactor of the caller can't silently break the invariant.
|
||||
3. Plugin dispatch fires only when ``provider`` matches a registered
|
||||
:class:`TTSProvider` whose ``name`` equals the configured value.
|
||||
Unknown names return None (caller falls through to Edge default).
|
||||
|
||||
Plugin exceptions are not caught here; they propagate so the outer
|
||||
``text_to_speech_tool`` try/except converts them to the standard
|
||||
error envelope, matching how command-provider failures surface.
|
||||
"""
|
||||
if not provider:
|
||||
return None
|
||||
key = provider.lower().strip()
|
||||
if key in BUILTIN_TTS_PROVIDERS:
|
||||
return None
|
||||
# Defense in depth: command-provider check should already have
|
||||
# short-circuited the caller. If a same-name command config exists,
|
||||
# bail so the command path wins.
|
||||
if _is_command_provider_config(_get_named_provider_config(tts_config, key)):
|
||||
return None
|
||||
try:
|
||||
from agent.tts_registry import get_provider
|
||||
from hermes_cli.plugins import _ensure_plugins_discovered
|
||||
|
||||
_ensure_plugins_discovered()
|
||||
plugin_provider = get_provider(key)
|
||||
if plugin_provider is None:
|
||||
# Long-lived sessions may have discovered plugins before the
|
||||
# bundled backend was patched in or before config changed.
|
||||
# Retry once with a forced refresh before surfacing fall-
|
||||
# through. Mirrors the image_gen / browser dispatcher
|
||||
# recovery pattern.
|
||||
_ensure_plugins_discovered(force=True)
|
||||
plugin_provider = get_provider(key)
|
||||
except Exception as exc: # noqa: BLE001 — discovery failure is non-fatal
|
||||
logger.debug("tts plugin dispatch skipped (discovery failed): %s", exc)
|
||||
return None
|
||||
if plugin_provider is None:
|
||||
return None
|
||||
|
||||
# Resolve voice / model / speed / format from tts_config; providers should
|
||||
# treat all of these as optional and fall back to their own defaults
|
||||
# when None is passed (matches the ABC contract documented on
|
||||
# ``TTSProvider.synthesize``).
|
||||
voice = tts_config.get("voice") if isinstance(tts_config, dict) else None
|
||||
model = tts_config.get("model") if isinstance(tts_config, dict) else None
|
||||
speed = tts_config.get("speed") if isinstance(tts_config, dict) else None
|
||||
fmt = (
|
||||
tts_config.get("output_format", DEFAULT_COMMAND_TTS_OUTPUT_FORMAT)
|
||||
if isinstance(tts_config, dict)
|
||||
else DEFAULT_COMMAND_TTS_OUTPUT_FORMAT
|
||||
)
|
||||
|
||||
logger.info(
|
||||
"Generating speech with plugin TTS provider '%s'...", key,
|
||||
)
|
||||
written = plugin_provider.synthesize(
|
||||
text,
|
||||
output_path,
|
||||
voice=voice if isinstance(voice, str) and voice else None,
|
||||
model=model if isinstance(model, str) and model else None,
|
||||
speed=float(speed) if isinstance(speed, (int, float)) else None,
|
||||
format=str(fmt).lower() if fmt else "mp3",
|
||||
)
|
||||
# Provider contract: returns the (possibly rewritten) output path.
|
||||
# Defensive against a provider returning None or a non-string —
|
||||
# fall back to the caller's expected output_path.
|
||||
return written if isinstance(written, str) and written else output_path
|
||||
|
||||
|
||||
def _plugin_provider_is_voice_compatible(provider: str) -> bool:
|
||||
"""Return True when the registered plugin provider opts into voice
|
||||
bubble delivery via its ``voice_compatible`` property.
|
||||
|
||||
Defensive: any registry or property access failure means False
|
||||
(matches the safe default for the command-provider path).
|
||||
"""
|
||||
if not provider:
|
||||
return False
|
||||
key = provider.lower().strip()
|
||||
if key in BUILTIN_TTS_PROVIDERS:
|
||||
return False
|
||||
try:
|
||||
from agent.tts_registry import get_provider
|
||||
|
||||
plugin_provider = get_provider(key)
|
||||
if plugin_provider is None:
|
||||
return False
|
||||
return bool(plugin_provider.voice_compatible)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.debug(
|
||||
"tts plugin voice_compatible check failed for '%s': %s", key, exc,
|
||||
)
|
||||
return False
|
||||
|
||||
|
||||
def _iter_command_providers(tts_config: Dict[str, Any]):
|
||||
"""Yield (name, config) pairs for every declared command-type provider."""
|
||||
if not isinstance(tts_config, dict):
|
||||
|
|
@ -1787,6 +1904,21 @@ def text_to_speech_tool(
|
|||
text, file_str, provider, command_provider_config, tts_config,
|
||||
)
|
||||
|
||||
# Plugin-registered TTS backend (issue #30398). Fires when the
|
||||
# configured provider is neither a built-in nor a command-type
|
||||
# entry, AND a plugin is registered under that name. The walrus
|
||||
# stores the dispatcher's result in `_plugin_path`; the branch runs
|
||||
# only when it is a path (plugin found); a None return falls
|
||||
# through to the built-in elif chain so unknown names hit the
|
||||
# Edge TTS default at the bottom. The dispatcher itself enforces
|
||||
# built-ins-always-win + command-wins-over-plugin defensively.
|
||||
elif provider not in BUILTIN_TTS_PROVIDERS and (
|
||||
_plugin_path := _dispatch_to_plugin_provider(
|
||||
text, file_str, provider, tts_config,
|
||||
)
|
||||
) is not None:
|
||||
file_str = _plugin_path
|
||||
|
||||
elif provider == "elevenlabs":
|
||||
try:
|
||||
_import_elevenlabs()
|
||||
|
|
@ -1925,6 +2057,18 @@ def text_to_speech_tool(
|
|||
if opus_path:
|
||||
file_str = opus_path
|
||||
voice_compatible = file_str.endswith(".ogg")
|
||||
elif provider not in BUILTIN_TTS_PROVIDERS:
|
||||
# Plugin-registered provider (issue #30398). Voice-bubble
|
||||
# delivery opts in via ``TTSProvider.voice_compatible``
|
||||
# (mirrors the command-provider opt-in). Plugins that
|
||||
# already write Opus skip the ffmpeg conversion.
|
||||
plugin_voice_compatible = _plugin_provider_is_voice_compatible(provider)
|
||||
if plugin_voice_compatible:
|
||||
if not file_str.endswith(".ogg"):
|
||||
opus_path = _convert_to_opus(file_str)
|
||||
if opus_path:
|
||||
file_str = opus_path
|
||||
voice_compatible = file_str.endswith(".ogg")
|
||||
elif (
|
||||
want_opus
|
||||
and provider in {"edge", "neutts", "minimax", "xai", "kittentts", "piper"}
|
||||
|
|
|
|||
|
|
@ -234,7 +234,7 @@ The table above shows the four plugin categories, but within "General plugins" t
|
|||
| A **context-compression strategy** | Context-engine plugin — `ctx.register_context_engine()` | [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) |
|
||||
| An **image-generation backend** (DALL·E, SDXL, …) | Backend plugin — `ctx.register_image_gen_provider()` | [Image Generation Provider Plugins](/docs/developer-guide/image-gen-provider-plugin) |
|
||||
| A **video-generation backend** (Veo, Kling, Pixverse, Grok-Imagine, Runway, …) | Backend plugin — `ctx.register_video_gen_provider()` | [Video Generation Provider Plugins](/docs/developer-guide/video-gen-provider-plugin) |
|
||||
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven — declare under `tts.providers.<name>` with `type: command` in `config.yaml` | [TTS setup](/docs/user-guide/features/tts#custom-command-providers) |
|
||||
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven (recommended) — declare under `tts.providers.<name>` with `type: command` in `config.yaml`. OR Python backend plugin — `ctx.register_tts_provider()` for Python-SDK / streaming engines that need more than a shell template. | [TTS Setup](/docs/user-guide/features/tts#custom-command-providers) · [Python plugin guide](/docs/user-guide/features/tts#python-plugin-providers) |
|
||||
| An **STT backend** (custom whisper binary, local ASR CLI) | Config-driven — set `HERMES_LOCAL_STT_COMMAND` env var to a shell template | [Voice Message Transcription (STT)](/docs/user-guide/features/tts#voice-message-transcription-stt) |
|
||||
| **External tools via MCP** (filesystem, GitHub, Linear, Notion, any MCP server) | Config-driven — declare `mcp_servers.<name>` with `command:` / `url:` in `config.yaml`. Hermes auto-discovers the server's tools and registers them alongside built-ins. | [MCP](/docs/user-guide/features/mcp) |
|
||||
| **Additional skill sources** (custom GitHub repos, private skill indexes) | CLI — `hermes skills tap add <repo>` | [Skills Hub](/docs/user-guide/features/skills#skills-hub) · [Publishing a custom tap](/docs/user-guide/features/skills#publishing-a-custom-skill-tap) |
|
||||
|
|
|
|||
|
|
@ -297,6 +297,85 @@ Use `{{` and `}}` for literal braces.
|
|||
|
||||
Command-type providers run whatever shell command you configure, with your user's permissions. Hermes quotes placeholder values and enforces the configured timeout, but the command template itself is trusted local input — treat it the same way you would a shell script on your PATH.
|
||||
|
||||
### Python plugin providers
|
||||
|
||||
For TTS engines that can't be expressed as a single shell command — Python SDKs without a CLI, streaming engines, voice-listing APIs, OAuth-refreshing auth — register a Python plugin via `ctx.register_tts_provider()`. The plugin **coexists with** (does not replace) the [Custom command providers](#custom-command-providers) registry; pick the surface that fits your engine.
|
||||
|
||||
#### When to pick which
|
||||
|
||||
| Your backend has… | Use |
|
||||
|---|---|
|
||||
| A single CLI reading text from a file/stdin and writing audio to a file/stdout | **Command provider** (no Python needed) |
|
||||
| Two or three CLIs chained with shell pipes | **Command provider** |
|
||||
| A Python SDK only — no CLI | **Plugin** |
|
||||
| Streaming bytes you want to deliver chunked (mid-generation voice bubbles) | **Plugin** (override `stream()`) |
|
||||
| A voice-listing API used by the `hermes tools` picker | **Plugin** (override `list_voices()`) |
|
||||
| OAuth refresh flow (not a static bearer token) | **Plugin** |
|
||||
|
||||
Built-ins always win, and command providers win over a same-name plugin — so plugins are safe to register against any non-built-in name without worrying about shadowing your existing config.
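
The precedence rule above can be sketched as a small pure function. This is an illustrative model, not the actual dispatcher: `resolve_kind`, its `plugin_names` argument, and the returned labels are hypothetical names chosen for this example. Only the ordering (built-in, then command provider, then plugin, then the Edge TTS fallback) and the list of built-in names are taken from this PR.

```python
from typing import Any, Dict, Set

# The ten inline built-in provider names listed in this PR.
BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_kind(
    provider: str,
    tts_config: Dict[str, Any],
    plugin_names: Set[str],
) -> str:
    """Hypothetical helper: report which layer would handle ``provider``."""
    # Names are compared case-insensitively with surrounding spaces ignored.
    key = (provider or "").lower().strip()

    # 1. Built-ins always win, even if a plugin claims the same name.
    if key in BUILTIN_NAMES:
        return "builtin"

    # 2. A same-name entry with ``command:`` set beats a plugin.
    #    (Simplified check; the real config validation may be stricter.)
    providers = tts_config.get("providers")
    entry = providers.get(key) if isinstance(providers, dict) else None
    if isinstance(entry, dict) and entry.get("command"):
        return "command"

    # 3. Otherwise a registered plugin handles the call.
    if key in plugin_names:
        return "plugin"

    # 4. No match: legacy fall-through to the Edge TTS default.
    return "fallback_edge"
```

With a config that declares `my-tts` as a command provider, `resolve_kind("my-tts", cfg, {"my-tts"})` reports the command layer even though a plugin of the same name is registered, which is the "no shadowing" guarantee described above.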
|
||||
|
||||
#### Minimal plugin
|
||||
|
||||
Drop this in `~/.hermes/plugins/my-tts/`:
|
||||
|
||||
`plugin.yaml`:
|
||||
```yaml
|
||||
name: my-tts
|
||||
version: 0.1.0
|
||||
description: "My custom Python TTS backend"
|
||||
```
|
||||
|
||||
`__init__.py`:
|
||||
```python
|
||||
from agent.tts_provider import TTSProvider
|
||||
|
||||
|
||||
class MyTTSProvider(TTSProvider):
|
||||
@property
|
||||
def name(self) -> str:
|
||||
return "my-tts" # what tts.provider matches against
|
||||
|
||||
@property
|
||||
def display_name(self) -> str:
|
||||
return "My Custom TTS"
|
||||
|
||||
def is_available(self) -> bool:
|
||||
# Return False when credentials/deps are missing — picker skips
|
||||
# this row but the dispatcher still routes here on explicit config.
|
||||
import os
|
||||
return bool(os.environ.get("MY_TTS_API_KEY"))
|
||||
|
||||
def synthesize(self, text, output_path, *, voice=None, model=None,
|
||||
speed=None, format="mp3", **extra) -> str:
|
||||
# Write audio bytes to output_path, return the path.
|
||||
# Raise on failure; the text_to_speech tool converts exceptions
|
||||
# to the standard error envelope.
|
||||
import my_tts_sdk
|
||||
client = my_tts_sdk.Client()
|
||||
audio_bytes = client.synthesize(text=text, voice=voice or "default")
|
||||
with open(output_path, "wb") as f:
|
||||
f.write(audio_bytes)
|
||||
return output_path
|
||||
|
||||
|
||||
def register(ctx):
|
||||
ctx.register_tts_provider(MyTTSProvider())
|
||||
```
|
||||
|
||||
Enable it (`hermes plugins enable my-tts`), point `tts.provider` at it (`tts.provider: my-tts` in `config.yaml`), and the `text_to_speech` tool will route through your plugin.
|
||||
|
||||
#### Optional hooks
|
||||
|
||||
Override these on your provider class for richer integration:
|
||||
|
||||
- `list_voices()` → list of `{id, display, language, gender, preview_url}` dicts shown in `hermes tools`.
|
||||
- `list_models()` → list of `{id, display, languages, max_text_length}` dicts.
|
||||
- `get_setup_schema()` → return `{name, badge, tag, env_vars: [{key, prompt, url}]}` to power the picker row in `hermes tools` / `hermes setup`. Without this, the plugin still works but its row in the picker is minimal.
|
||||
- `stream(text, *, voice, model, format, **extra)` → iterator yielding audio bytes for streaming delivery (default raises `NotImplementedError`).
|
||||
- `voice_compatible` property → set `True` if your output is Opus-compatible and the gateway should deliver it as a voice bubble (default `False` = regular audio attachment).
|
||||
|
||||
See `agent/tts_provider.py` for the full ABC including docstrings.
|
||||
|
||||
## Voice Message Transcription (STT)
|
||||
|
||||
Voice messages sent on Telegram, Discord, WhatsApp, Slack, or Signal are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.
|
||||
|
|
|
|||