mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-25 00:51:20 +00:00)
feat(tools): add Voxtral Transcribe STT provider (Mistral AI)
parent 5d2fc6d928
commit 5f4b93c20f
7 changed files with 331 additions and 34 deletions
@@ -102,11 +102,13 @@ Local transcription works out of the box when `faster-whisper` is installed. If
 ```yaml
 # In ~/.hermes/config.yaml
 stt:
-  provider: "local"  # "local" | "groq" | "openai"
+  provider: "local"  # "local" | "groq" | "openai" | "mistral"
   local:
     model: "base"  # tiny, base, small, medium, large-v3
   openai:
     model: "whisper-1"  # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
+  mistral:
+    model: "voxtral-mini-latest"  # voxtral-mini-latest, voxtral-mini-2602
 ```

 ### Provider Details
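To make the shape of the `stt` block in the hunk above concrete, here is a minimal Python sketch of how a provider and model might be resolved from an already-parsed config dictionary. `resolve_stt_settings`, `KNOWN_PROVIDERS`, and `EXAMPLE_MODELS` are hypothetical names for illustration, not Hermes APIs, and treating the example values from the YAML as defaults is an assumption.

```python
from typing import Any, Optional

# Provider names as listed in the config comment in the diff.
KNOWN_PROVIDERS = ("local", "groq", "openai", "mistral")

# Example model values copied from the YAML in the diff.
# ASSUMPTION: using them as defaults is illustrative only.
EXAMPLE_MODELS = {
    "local": "base",
    "openai": "whisper-1",
    "mistral": "voxtral-mini-latest",
}


def resolve_stt_settings(config: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Return (provider, model) from a parsed ~/.hermes/config.yaml dict.

    Hypothetical helper: the real loader in Hermes may behave differently.
    """
    stt = config.get("stt") or {}
    provider = stt.get("provider", "local")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown STT provider: {provider!r}")
    # Each provider may have its own sub-section holding a `model` key.
    section = stt.get(provider) or {}
    model = section.get("model", EXAMPLE_MODELS.get(provider))
    return provider, model


if __name__ == "__main__":
    parsed = {"stt": {"provider": "mistral", "mistral": {"model": "voxtral-mini-2602"}}}
    print(resolve_stt_settings(parsed))
```

The diff shows no `model` key for `groq`, so this sketch returns `None` for it rather than guessing a model name.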
@@ -125,6 +127,8 @@ stt:

 **OpenAI API** — Accepts `VOICE_TOOLS_OPENAI_KEY` first and falls back to `OPENAI_API_KEY`. Supports `whisper-1`, `gpt-4o-mini-transcribe`, and `gpt-4o-transcribe`.

+**Mistral API (Voxtral Transcribe)** — Requires `MISTRAL_API_KEY`. Uses Mistral's [Voxtral Transcribe](https://docs.mistral.ai/capabilities/audio/speech_to_text/) models. Supports 13 languages, speaker diarization, and word-level timestamps. Install with `pip install hermes-agent[mistral]`.
+
 **Custom local CLI fallback** — Set `HERMES_LOCAL_STT_COMMAND` if you want Hermes to call a local transcription command directly. The command template supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders.

 ### Fallback Behavior
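The `HERMES_LOCAL_STT_COMMAND` placeholders described in the hunk above can be illustrated with a short Python sketch. How Hermes actually expands the template is not shown in this diff, so `render_stt_command` is a hypothetical helper, and the choice to shell-quote each value before substitution is an assumption made here for safety with paths that contain spaces.

```python
import shlex


def render_stt_command(
    template: str,
    input_path: str,
    output_dir: str,
    language: str = "en",
    model: str = "base",
) -> list[str]:
    """Expand the four documented placeholders and return an argv list.

    Hypothetical helper: Hermes may expand the template differently.
    """
    rendered = template.format(
        # Quote every value so spaces or shell metacharacters stay inside one argument.
        input_path=shlex.quote(input_path),
        output_dir=shlex.quote(output_dir),
        language=shlex.quote(language),
        model=shlex.quote(model),
    )
    # Split back into argv so the command can be run without a shell.
    return shlex.split(rendered)


if __name__ == "__main__":
    # Illustrative template only; any local transcription CLI could be used.
    template = "whisper {input_path} --model {model} --language {language} --output_dir {output_dir}"
    print(render_stt_command(template, "/tmp/voice note.ogg", "/tmp/out"))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids invoking a shell at all.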
@@ -133,4 +137,5 @@ If your configured provider isn't available, Hermes automatically falls back:
 - **Local faster-whisper unavailable** → Tries a local `whisper` CLI or `HERMES_LOCAL_STT_COMMAND` before cloud providers
 - **Groq key not set** → Falls back to local transcription, then OpenAI
 - **OpenAI key not set** → Falls back to local transcription, then Groq
+- **Mistral key/SDK not set** → Skipped in auto-detect; falls through to next available provider
 - **Nothing available** → Voice messages pass through with an accurate note to the user
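The fallback rules listed in the hunk above can be sketched as an ordered chain per configured provider. This is a minimal illustration, not the Hermes implementation: `FALLBACK_CHAINS` and `pick_provider` are hypothetical names, and the diff does not state the order of cloud providers after a local failure or the chain used when Mistral is configured, so those orderings are assumptions.

```python
from typing import Optional

# Ordered candidates per configured provider.
# "groq" and "openai" chains follow the bullet list in the diff.
# ASSUMPTION: the cloud ordering for "local" and the whole "mistral" chain
# are guesses; the diff only says Mistral "falls through to next available provider".
FALLBACK_CHAINS = {
    "local": ["local", "local_cli", "groq", "openai", "mistral"],
    "groq": ["groq", "local", "openai"],
    "openai": ["openai", "local", "groq"],
    "mistral": ["mistral", "local", "groq", "openai"],
}


def pick_provider(configured: str, available: set[str]) -> Optional[str]:
    """Return the first usable provider, or None if nothing is available.

    `available` holds providers whose key, SDK, or binary is present.
    """
    for candidate in FALLBACK_CHAINS.get(configured, [configured]):
        if candidate in available:
            return candidate
    # Nothing available: the caller would pass the voice message through with a note.
    return None


if __name__ == "__main__":
    print(pick_provider("groq", {"local", "openai"}))
    print(pick_provider("mistral", set()))
```

Here `"local_cli"` stands in for both the local `whisper` CLI and `HERMES_LOCAL_STT_COMMAND`; a fuller version would distinguish them.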