mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
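A minimal sketch of the registration-time guard described above. The built-in names come from this message; the `_providers` dict, the return values, and the log wording are assumptions for illustration, not the actual contents of agent/transcription_registry.py.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)

# Built-in names as listed in this commit message.
BUILTIN_STT_PROVIDERS = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"}
)

# Hypothetical module-level registry: provider name -> provider object.
_providers: Dict[str, object] = {}


def register_provider(provider: object) -> bool:
    """Register a plugin provider; return False when the name is rejected."""
    name = getattr(provider, "name", None)
    if not isinstance(name, str) or not name:
        logger.warning("Ignoring STT provider without a usable name: %r", provider)
        return False
    if name in BUILTIN_STT_PROVIDERS:
        # Built-ins always win: refuse to shadow them, but do not raise.
        logger.warning("STT plugin %r collides with a built-in; rejected", name)
        return False
    _providers[name] = provider
    return True


def get_provider(name: str) -> Optional[object]:
    """Look up a registered plugin provider by name."""
    return _providers.get(name)
```

Rejecting with a warning rather than an exception keeps a misnamed plugin from breaking startup for everything else.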
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
   a. if plugin.is_available() returns False → unavailability envelope
      identifying the plugin (not the generic "No STT provider"
      message — the user explicitly opted into this plugin)
   b. otherwise plugin.transcribe() with model + language forwarded
      from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
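The resolution order can be sketched as a small routing function. This is an illustration only: `resolve_route`, the `plugins` mapping, and the returned labels are hypothetical names, and treating a raising `is_available()` as "unavailable" is an assumption rather than documented dispatcher behavior.

```python
from typing import Any, Mapping

BUILTIN_STT_PROVIDERS = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"}
)


def resolve_route(provider_name: str, plugins: Mapping[str, Any]) -> str:
    """Return which branch would handle a transcription request."""
    # Step 1: built-ins are checked first, so a plugin can never shadow them.
    if provider_name in BUILTIN_STT_PROVIDERS:
        return "builtin"

    # Step 2: a registered plugin, gated on its availability check.
    plugin = plugins.get(provider_name)
    if plugin is not None:
        try:
            available = bool(plugin.is_available())
        except Exception:
            # Assumption: a misbehaving check is treated as "unavailable".
            available = False
        return "plugin" if available else "plugin_unavailable"

    # Step 3: nothing matched, so the legacy error path applies.
    return "no_provider_error"
```

An unavailable plugin gets its own branch instead of falling through to step 3, which is what lets the error message name the plugin the user selected.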
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. A caller's explicit
`model=` argument overrides the model set in config.
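A hedged sketch of that precedence, with the parsed config.yaml represented as a plain dict. The helper name `resolve_model_and_language` is hypothetical, and treating only `model=None` as "not passed" is an assumption about the dispatcher.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_model_and_language(
    config: Dict[str, Any],
    provider: str,
    model: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Pick the model and language to forward to a plugin provider."""
    stt_cfg = config.get("stt") or {}
    section = stt_cfg.get(provider) or {}
    if not isinstance(section, dict):
        # Tolerate a malformed stt.<provider> entry instead of raising.
        section = {}
    # An explicit caller argument beats the config-set model.
    chosen_model = model if model is not None else section.get("model")
    return chosen_model, section.get("language")


# Equivalent of:
#   stt:
#     provider: openrouter
#     openrouter:
#       model: whisper-large-v3-turbo
#       language: ja
example_config = {
    "stt": {
        "provider": "openrouter",
        "openrouter": {"model": "whisper-large-v3-turbo", "language": "ja"},
    }
}
```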
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
  built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
  PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
  _dispatch_to_plugin_provider() with availability gate, wire-in
  after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
  (covering built-in short-circuit, plugin dispatch, exception
  envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
  subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
  no_provider_error → plugin (plugin-installed scenario)
  no_provider_error → plugin_unavailable (plugin-installed-unavailable
    scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
193 lines
6.7 KiB
Python
"""
Transcription Provider ABC
==========================

Defines the pluggable-backend interface for speech-to-text. Providers
register instances via
:meth:`PluginContext.register_transcription_provider`; the active one
(selected via ``stt.provider`` in ``config.yaml``) services every
:func:`tools.transcription_tools.transcribe_audio` call **when the
configured name is neither a built-in (``local``, ``local_command``,
``groq``, ``openai``, ``mistral``, ``xai``) nor disabled**.

Two coexisting STT extension surfaces — in resolution order:

1. **Built-in providers** (``BUILTIN_STT_PROVIDERS`` in
   :mod:`tools.transcription_tools`) — native Python implementations
   for the 6 backends shipped today (faster-whisper, local_command,
   Groq, OpenAI, Mistral, xAI). **Always win** — plugins cannot
   shadow them. The single-env-var shell escape hatch
   ``HERMES_LOCAL_STT_COMMAND`` is preserved via the built-in
   ``local_command`` path.
2. **Plugin-registered providers** (this ABC). For new STT backends —
   OpenRouter, SenseAudio, Gemini-STT, custom proprietary engines —
   that need a Python implementation without modifying
   ``tools/transcription_tools.py``.

Built-ins-always-win is enforced at registration time
(:func:`agent.transcription_registry.register_provider` rejects names
in ``BUILTIN_STT_PROVIDERS`` with a warning) AND at dispatch time
(:func:`tools.transcription_tools._dispatch_to_plugin_provider`
re-checks defensively).

Providers live in ``<repo>/plugins/transcription/<name>/`` (built-in
plugins, none shipped today) or
``~/.hermes/plugins/transcription/<name>/`` (user-installed).

Response contract
-----------------
:meth:`TranscriptionProvider.transcribe` returns a dict with keys::

    success     bool
    transcript  str   transcribed text (empty when success=False)
    provider    str   provider name (for diagnostics)
    error       str   only when success=False
"""

from __future__ import annotations

import abc
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------


class TranscriptionProvider(abc.ABC):
    """Abstract base class for a speech-to-text backend.

    Subclasses must implement :attr:`name` and :meth:`transcribe`.
    Everything else has sane defaults — override only what your provider
    needs.
    """

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """Stable short identifier used in ``stt.provider`` config.

        Lowercase, no spaces. Examples: ``openrouter``, ``sensaudio``,
        ``gemini``, ``deepgram``. Names that collide with a built-in STT
        provider (``local``, ``local_command``, ``groq``, ``openai``,
        ``mistral``, ``xai``) are rejected at registration time.
        """

    @property
    def display_name(self) -> str:
        """Human-readable label shown in ``hermes tools``.

        Defaults to ``name.title()``.
        """
        return self.name.title()

    def is_available(self) -> bool:
        """Return True when this provider can service calls.

        Typically checks for a required API key + that the SDK is
        importable. Default: True (providers with no external
        dependencies are always available).

        Must NOT raise — used by the picker and ``hermes setup`` for
        availability displays and should fail gracefully.
        """
        return True

    def list_models(self) -> List[Dict[str, Any]]:
        """Return model catalog entries.

        Each entry::

            {
                "id": "whisper-large-v3-turbo",       # required
                "display": "Whisper Large v3 Turbo",  # optional
                "languages": ["en", "es", "fr"],      # optional
                "max_audio_seconds": 1500,            # optional
            }

        Default: empty list (provider has a single fixed model or
        doesn't expose model selection).
        """
        return []

    def default_model(self) -> Optional[str]:
        """Return the default model id, or None if not applicable."""
        models = self.list_models()
        if models:
            return models[0].get("id")
        return None

    def get_setup_schema(self) -> Dict[str, Any]:
        """Return provider metadata for the ``hermes tools`` picker.

        Used by ``tools_config.py`` to inject this provider as a row in
        the Speech-to-Text provider list. Shape::

            {
                "name": "OpenRouter STT",             # picker label
                "badge": "paid",                      # optional short tag
                "tag": "Whisper via OpenRouter API",  # optional subtitle
                "env_vars": [                         # keys to prompt for
                    {"key": "OPENROUTER_API_KEY",
                     "prompt": "OpenRouter API key",
                     "url": "https://openrouter.ai/keys"},
                ],
            }

        Default: minimal entry derived from ``display_name`` with no
        env vars. Override to expose API key prompts and custom badges.
        """
        return {
            "name": self.display_name,
            "badge": "",
            "tag": "",
            "env_vars": [],
        }

    @abc.abstractmethod
    def transcribe(
        self,
        file_path: str,
        *,
        model: Optional[str] = None,
        language: Optional[str] = None,
        **extra: Any,
    ) -> Dict[str, Any]:
        """Transcribe the audio file at ``file_path``.

        Returns a dict with the standard envelope::

            {
                "success": True,
                "transcript": "the transcribed text",
                "provider": "<this provider's name>",
            }

        or on failure::

            {
                "success": False,
                "transcript": "",
                "error": "human-readable error message",
                "provider": "<this provider's name>",
            }

        Implementations should NOT raise — convert exceptions to the
        error envelope so the dispatcher can deliver a consistent shape
        to the gateway/CLI caller.

        Args:
            file_path: Absolute path to the audio file. The dispatcher
                has already validated existence + size before calling.
            model: Model identifier from :meth:`list_models`, or None
                to use :meth:`default_model`.
            language: Optional BCP-47 language hint (e.g. ``"en"``,
                ``"ja"``) — providers without language hints should
                ignore this argument.
            **extra: Forward-compat parameters future schema versions
                may expose. Implementations should ignore unknown keys.
        """
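As an illustration of the contract above, here is a sketch of a toy provider. To keep it runnable on its own, it redefines a trimmed stand-in for the ABC rather than importing agent.transcription_provider; `FileSizeProvider` and its `filesize_demo` name are hypothetical and do not correspond to any shipped plugin.

```python
import abc
import os
from typing import Any, Dict, Optional


class TranscriptionProvider(abc.ABC):
    """Trimmed stand-in for the real ABC so this sketch is self-contained."""

    @property
    @abc.abstractmethod
    def name(self) -> str:
        ...

    def is_available(self) -> bool:
        return True

    @abc.abstractmethod
    def transcribe(
        self,
        file_path: str,
        *,
        model: Optional[str] = None,
        language: Optional[str] = None,
        **extra: Any,
    ) -> Dict[str, Any]:
        ...


class FileSizeProvider(TranscriptionProvider):
    """Hypothetical toy backend: 'transcribes' a file by reporting its size."""

    @property
    def name(self) -> str:
        return "filesize_demo"

    def transcribe(
        self,
        file_path: str,
        *,
        model: Optional[str] = None,
        language: Optional[str] = None,
        **extra: Any,  # unknown forward-compat keys are ignored
    ) -> Dict[str, Any]:
        try:
            size = os.path.getsize(file_path)
        except OSError as exc:
            # Never raise: convert the failure into the error envelope.
            return {
                "success": False,
                "transcript": "",
                "error": f"could not read audio file: {exc}",
                "provider": self.name,
            }
        return {
            "success": True,
            "transcript": f"<{size} bytes of audio>",
            "provider": self.name,
        }
```

A real plugin would presumably pass an instance to `PluginContext.register_transcription_provider()` and be selected with `stt.provider: filesize_demo`; the plugin entry-point shape is not shown in this file, so that wiring is left out here.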