feat(tools): add Voxtral TTS provider (Mistral AI)

2026-04-26 01:01:40 +00:00 · 2026-04-06 19:04:00 +01:00 · 2026-04-06 19:04:00 +01:00 · 640441b865
commit 640441b865
parent 5a55d54ee2
11 changed files with 379 additions and 12 deletions
--- a/website/docs/user-guide/features/tts.md
+++ b/website/docs/user-guide/features/tts.md
@ -10,7 +10,7 @@ Hermes Agent supports both text-to-speech output and voice message transcription

 ## Text-to-Speech

-Convert text to speech with five providers:
+Convert text to speech with six providers:

 | Provider | Quality | Cost | API Key |
 |----------|---------|------|---------|
@ -18,6 +18,7 @@ Convert text to speech with five providers:
 | **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
 | **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
 | **MiniMax TTS** | Excellent | Paid | `MINIMAX_API_KEY` |
+| **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` |
 | **NeuTTS** | Good | Free | None needed |

 ### Platform Delivery
@ -34,7 +35,7 @@ Convert text to speech with five providers:
 ```yaml
 # In ~/.hermes/config.yaml
 tts:
-  provider: "edge"              # "edge" | "elevenlabs" | "openai" | "minimax" | "neutts"
+  provider: "edge"              # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "neutts"
  edge:
    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
  elevenlabs:
@ -50,6 +51,9 @@ tts:
    speed: 1                    # 0.5 - 2.0
    vol: 1                      # 0 - 10
    pitch: 0                    # -12 - 12
+  mistral:
+    model: "voxtral-mini-tts-2603"
+    voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # Paul - Neutral (default)
  neutts:
    ref_audio: ''
    ref_text: ''
@ -61,7 +65,7 @@ tts:

 Telegram voice bubbles require Opus/OGG audio format:

- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
+- **OpenAI, ElevenLabs, and Mistral** produce Opus natively — no extra setup
 - **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
 - **MiniMax TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
 - **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
@ -80,7 +84,7 @@ sudo dnf install ffmpeg
 Without ffmpeg, Edge TTS, MiniMax TTS, and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).

 :::tip
-If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
+If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.
 :::

 ## Voice Message Transcription (STT)