mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-08 08:11:38 +00:00
feat(stt): add register_transcription_provider() plugin hook
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
a. if plugin.is_available() returns False → unavailability envelope
identifying the plugin (not the generic "No STT provider"
message — the user explicitly opted into this plugin)
b. otherwise plugin.transcribe() with model + language forwarded
from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
no_provider_error → plugin (plugin-installed scenario)
no_provider_error → plugin_unavailable (plugin-installed-unavailable
scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
This commit is contained in:
parent
2e0ac31a72
commit
2cd952e110
11 changed files with 1831 additions and 1 deletions
|
|
@ -235,7 +235,7 @@ The table above shows the four plugin categories, but within "General plugins" t
|
|||
| An **image-generation backend** (DALL·E, SDXL, …) | Backend plugin — `ctx.register_image_gen_provider()` | [Image Generation Provider Plugins](/developer-guide/image-gen-provider-plugin) |
|
||||
| A **video-generation backend** (Veo, Kling, Pixverse, Grok-Imagine, Runway, …) | Backend plugin — `ctx.register_video_gen_provider()` | [Video Generation Provider Plugins](/developer-guide/video-gen-provider-plugin) |
|
||||
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven (recommended) — declare under `tts.providers.<name>` with `type: command` in `config.yaml`. OR Python backend plugin — `ctx.register_tts_provider()` for Python-SDK / streaming engines that need more than a shell template. | [TTS Setup](/user-guide/features/tts#custom-command-providers) · [Python plugin guide](/user-guide/features/tts#python-plugin-providers) |
|
||||
| An **STT backend** (custom whisper binary, local ASR CLI) | Config-driven — set `HERMES_LOCAL_STT_COMMAND` env var to a shell template | [Voice Message Transcription (STT)](/user-guide/features/tts#voice-message-transcription-stt) |
|
||||
| An **STT backend** (any CLI — whisper.cpp, custom whisper binary, local ASR CLI) | Config-driven (recommended) — declare under `stt.providers.<name>` with `type: command` in `config.yaml`, or set `HERMES_LOCAL_STT_COMMAND` for the legacy single-command escape hatch. OR Python backend plugin — `ctx.register_transcription_provider()` for Python-SDK engines (OpenRouter, SenseAudio, Gemini-STT, etc.). | [STT Setup](/user-guide/features/tts#stt-custom-command-providers) · [Python plugin guide](/user-guide/features/tts#python-plugin-providers-stt) |
|
||||
| **External tools via MCP** (filesystem, GitHub, Linear, Notion, any MCP server) | Config-driven — declare `mcp_servers.<name>` with `command:` / `url:` in `config.yaml`. Hermes auto-discovers the server's tools and registers them alongside built-ins. | [MCP](/user-guide/features/mcp) |
|
||||
| **Additional skill sources** (custom GitHub repos, private skill indexes) | CLI — `hermes skills tap add <repo>` | [Skills Hub](/user-guide/features/skills#skills-hub) · [Publishing a custom tap](/user-guide/features/skills#publishing-a-custom-skill-tap) |
|
||||
| **Gateway event hooks** (fire on `gateway:startup`, `session:start`, `agent:end`, `command:*`) | Drop `HOOK.yaml` + `handler.py` into `~/.hermes/hooks/<name>/` | [Event Hooks](/user-guide/features/hooks#gateway-event-hooks) |
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue