feat(stt): add register_transcription_provider() plugin hook

Add an opt-in Python plugin surface for speech-to-text backends, mirroring
the TTS hook pattern. New backends (OpenRouter, SenseAudio, Gemini-STT,
custom proprietary engines) can be implemented as plugins without modifying
tools/transcription_tools.py.

Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command, groq,
openai, mistral, xai) keep their native handlers. Plugins attempting to
register under a built-in name are rejected at registration time with a
warning and re-checked defensively at dispatch.

Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
   a. if plugin.is_available() returns False → unavailability envelope
      identifying the plugin (not the generic "No STT provider" message;
      the user explicitly opted into this plugin)
   b. otherwise plugin.transcribe() with model + language forwarded from
      stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)

Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring how
built-ins read stt.openai.model / stt.mistral.model. The dispatcher forwards
`model` and `language` from this section. Caller's explicit `model=`
argument overrides the config-set model.
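
Illustrative sketch
-------------------
The resolution order and config forwarding described above can be sketched
roughly as follows. This is a simplified stand-in, not the code in this
commit: the `dispatch()` function, the `_REGISTRY` dict, the `EchoProvider`
class, the `transcribe()` signature, the exact built-in name keys, and the
shape of the returned envelopes are all assumptions made for illustration.
Only the names `TranscriptionProvider`, `BUILTIN_STT_PROVIDERS`,
`register_transcription_provider`, `is_available()` and `transcribe()` come
from the commit message, and the real hook lives on PluginContext rather
than at module level.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

# Built-in names listed in the commit message; the exact keys are assumed.
BUILTIN_STT_PROVIDERS = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"}
)


class TranscriptionProvider(ABC):
    """Hypothetical stand-in for the ABC in agent/transcription_provider.py."""

    name: str = ""

    def is_available(self) -> bool:
        return True

    @abstractmethod
    def transcribe(self, file_path: str, *, model: Optional[str] = None,
                   language: Optional[str] = None) -> Dict[str, Any]:
        ...


# Hypothetical module-level registry keyed by provider name.
_REGISTRY: Dict[str, TranscriptionProvider] = {}


def register_transcription_provider(provider: TranscriptionProvider) -> bool:
    """Register a plugin; refuse names that would shadow a built-in."""
    if provider.name in BUILTIN_STT_PROVIDERS:
        return False  # the commit describes a warning being logged here
    _REGISTRY[provider.name] = provider
    return True


def dispatch(file_path: str, config: Dict[str, Any],
             model: Optional[str] = None) -> Dict[str, Any]:
    """Follow the three-step resolution order from the commit message."""
    stt = config.get("stt", {})
    name = stt.get("provider")

    # 1. Built-ins always win (placeholder result; native handlers omitted).
    if name in BUILTIN_STT_PROVIDERS:
        return {"success": True, "provider": name, "via": "builtin"}

    plugin = _REGISTRY.get(name)

    # 3. No match: legacy error.
    if plugin is None:
        return {"success": False, "error": "No STT provider available"}

    # 2a. Plugin selected but unavailable: name it in the error.
    if not plugin.is_available():
        return {"success": False,
                "error": f"STT plugin '{name}' is not available"}

    # 2b. Forward model + language from stt.<provider>; an explicit
    # model argument overrides the configured one.
    section = stt.get(name) or {}
    return plugin.transcribe(
        file_path,
        model=model or section.get("model"),
        language=section.get("language"),
    )


class EchoProvider(TranscriptionProvider):
    """Toy plugin that echoes its inputs instead of transcribing audio."""

    name = "echo"

    def __init__(self, available: bool = True) -> None:
        self._available = available

    def is_available(self) -> bool:
        return self._available

    def transcribe(self, file_path, *, model=None, language=None):
        return {"success": True, "provider": self.name,
                "transcript": f"{file_path}|{model}|{language}"}


if __name__ == "__main__":
    register_transcription_provider(EchoProvider())
    cfg = {"stt": {"provider": "echo",
                   "echo": {"model": "m1", "language": "en"}}}
    print(dispatch("voice.ogg", cfg))
    print(dispatch("voice.ogg", cfg, model="m2"))
```

Keeping the built-in check ahead of the registry lookup is what makes the
shadow guard hold even if a plugin were somehow registered under a built-in
name, which matches the "re-checked defensively at dispatch" note above.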

Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers, built-in
  shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
  _dispatch_to_plugin_provider() with availability gate, wire-in after xai
  branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests (covering
  built-in short-circuit, plugin dispatch, exception envelope, non-dict
  guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
  subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs

Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
  no_provider_error → plugin (plugin-installed scenario)
  no_provider_error → plugin_unavailable (plugin-installed-unavailable
    scenario; PR returns cleaner envelope)

Zero behavior change for users not opting into a plugin.

Issue follow-up to #30398.
2026-06-08 08:11:38 +00:00 · 2026-05-22 21:03:42 +05:30 · 2026-05-22 21:03:42 +05:30 · 2cd952e110
commit 2cd952e110
parent 2e0ac31a72
11 changed files with 1831 additions and 1 deletions
--- a/website/docs/user-guide/features/plugins.md
+++ b/website/docs/user-guide/features/plugins.md
@@ -235,7 +235,7 @@ The table above shows the four plugin categories, but within "General plugins" t
 | An **image-generation backend** (DALL·E, SDXL, …) | Backend plugin — `ctx.register_image_gen_provider()` | [Image Generation Provider Plugins](/developer-guide/image-gen-provider-plugin) |
 | A **video-generation backend** (Veo, Kling, Pixverse, Grok-Imagine, Runway, …) | Backend plugin — `ctx.register_video_gen_provider()` | [Video Generation Provider Plugins](/developer-guide/video-gen-provider-plugin) |
 | A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven (recommended) — declare under `tts.providers.<name>` with `type: command` in `config.yaml`. OR Python backend plugin — `ctx.register_tts_provider()` for Python-SDK / streaming engines that need more than a shell template. | [TTS Setup](/user-guide/features/tts#custom-command-providers) · [Python plugin guide](/user-guide/features/tts#python-plugin-providers) |
-| An **STT backend** (custom whisper binary, local ASR CLI) | Config-driven — set `HERMES_LOCAL_STT_COMMAND` env var to a shell template | [Voice Message Transcription (STT)](/user-guide/features/tts#voice-message-transcription-stt) |
+| An **STT backend** (any CLI — whisper.cpp, custom whisper binary, local ASR CLI) | Config-driven (recommended) — declare under `stt.providers.<name>` with `type: command` in `config.yaml`, or set `HERMES_LOCAL_STT_COMMAND` for the legacy single-command escape hatch. OR Python backend plugin — `ctx.register_transcription_provider()` for Python-SDK engines (OpenRouter, SenseAudio, Gemini-STT, etc.). | [STT Setup](/user-guide/features/tts#stt-custom-command-providers) · [Python plugin guide](/user-guide/features/tts#python-plugin-providers-stt) |
 | **External tools via MCP** (filesystem, GitHub, Linear, Notion, any MCP server) | Config-driven — declare `mcp_servers.<name>` with `command:` / `url:` in `config.yaml`. Hermes auto-discovers the server's tools and registers them alongside built-ins. | [MCP](/user-guide/features/mcp) |
 | **Additional skill sources** (custom GitHub repos, private skill indexes) | CLI — `hermes skills tap add <repo>` | [Skills Hub](/user-guide/features/skills#skills-hub) · [Publishing a custom tap](/user-guide/features/skills#publishing-a-custom-skill-tap) |
 | **Gateway event hooks** (fire on `gateway:startup`, `session:start`, `agent:end`, `command:*`) | Drop `HOOK.yaml` + `handler.py` into `~/.hermes/hooks/<name>/` | [Event Hooks](/user-guide/features/hooks#gateway-event-hooks) |