mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
feat(tools): add Voxtral TTS provider (Mistral AI)
This commit is contained in:
parent
5a55d54ee2
commit
640441b865
11 changed files with 379 additions and 12 deletions
|
|
@ -10,7 +10,7 @@ Hermes Agent supports both text-to-speech output and voice message transcription
|
|||
|
||||
## Text-to-Speech
|
||||
|
||||
Convert text to speech with five providers:
|
||||
Convert text to speech with six providers:
|
||||
|
||||
| Provider | Quality | Cost | API Key |
|
||||
|----------|---------|------|---------|
|
||||
|
|
@ -18,6 +18,7 @@ Convert text to speech with five providers:
|
|||
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
|
||||
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
|
||||
| **MiniMax TTS** | Excellent | Paid | `MINIMAX_API_KEY` |
|
||||
| **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` |
|
||||
| **NeuTTS** | Good | Free | None needed |
|
||||
|
||||
### Platform Delivery
|
||||
|
|
@ -34,7 +35,7 @@ Convert text to speech with five providers:
|
|||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
tts:
|
||||
provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "neutts"
|
||||
provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "neutts"
|
||||
edge:
|
||||
voice: "en-US-AriaNeural" # 322 voices, 74 languages
|
||||
elevenlabs:
|
||||
|
|
@ -50,6 +51,9 @@ tts:
|
|||
speed: 1 # 0.5 - 2.0
|
||||
vol: 1 # 0 - 10
|
||||
pitch: 0 # -12 - 12
|
||||
mistral:
|
||||
model: "voxtral-mini-tts-2603"
|
||||
voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8" # Paul - Neutral (default)
|
||||
neutts:
|
||||
ref_audio: ''
|
||||
ref_text: ''
|
||||
|
|
@ -61,7 +65,7 @@ tts:
|
|||
|
||||
Telegram voice bubbles require Opus/OGG audio format:
|
||||
|
||||
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
|
||||
- **OpenAI, ElevenLabs, and Mistral** produce Opus natively — no extra setup
|
||||
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
|
||||
- **MiniMax TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
|
||||
- **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
|
||||
|
|
@ -80,7 +84,7 @@ sudo dnf install ffmpeg
|
|||
Without ffmpeg, Edge TTS, MiniMax TTS, and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
|
||||
|
||||
:::tip
|
||||
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
|
||||
If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.
|
||||
:::
|
||||
|
||||
## Voice Message Transcription (STT)
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue