Merge origin/main, resolve conflicts (self._base_url_lower)

This commit is contained in:
Test 2026-03-18 04:09:00 -07:00
commit e7844e9c8d
54 changed files with 2281 additions and 179 deletions

View file

@ -186,11 +186,11 @@ hermes chat --provider kimi-coding --model moonshot-v1-auto
# Requires: KIMI_API_KEY in ~/.hermes/.env
# MiniMax (global endpoint)
hermes chat --provider minimax --model MiniMax-Text-01
hermes chat --provider minimax --model MiniMax-M2.7
# Requires: MINIMAX_API_KEY in ~/.hermes/.env
# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-Text-01
hermes chat --provider minimax-cn --model MiniMax-M2.7
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env
# Alibaba Cloud / DashScope (Qwen models)
@ -984,7 +984,7 @@ You can also change the reasoning effort at runtime with the `/reasoning` comman
```yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai"
provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
@ -993,6 +993,11 @@ tts:
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
neutts:
ref_audio: ''
ref_text: ''
model: neuphonic/neutts-air-q4-gguf
device: cpu
```
This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
@ -1140,6 +1145,21 @@ group_sessions_per_user: true # true = per-user isolation in groups/channels, f
For the behavior details and examples, see [Sessions](/docs/user-guide/sessions) and the [Discord guide](/docs/user-guide/messaging/discord).
## Unauthorized DM Behavior
Control what Hermes does when an unknown user sends a direct message:
```yaml
unauthorized_dm_behavior: pair
whatsapp:
unauthorized_dm_behavior: ignore
```
- `pair` is the default. Hermes denies access, but replies with a one-time pairing code in DMs.
- `ignore` silently drops unauthorized DMs.
- Platform sections override the global default, so you can keep pairing enabled broadly while making one platform quieter.
## Quick Commands
Define custom commands that run shell commands without invoking the LLM — zero token usage, instant execution. Especially useful from messaging platforms (Telegram, Discord, etc.) for quick server checks or utility scripts.

View file

@ -20,10 +20,11 @@ If you have ever wanted Hermes to use a tool that already exists somewhere else,
## Quick start
1. Install MCP support:
1. Install MCP support (already included if you used the standard install script):
```bash
pip install hermes-agent[mcp]
cd ~/.hermes/hermes-agent
uv pip install -e ".[mcp]"
```
2. Add an MCP server to `~/.hermes/config.yaml`:
@ -374,7 +375,9 @@ Inspect the project root and explain the directory layout.
Check:
```bash
pip install hermes-agent[mcp]
# Verify MCP deps are installed (already included in standard install)
cd ~/.hermes/hermes-agent && uv pip install -e ".[mcp]"
node --version
npx --version
```

View file

@ -147,7 +147,7 @@ Default configuration:
- Tests 3 models at different scales for robustness:
- `qwen/qwen3-8b` (small)
- `z-ai/glm-4.7-flash` (medium)
- `minimax/minimax-m2.5` (large)
- `minimax/minimax-m2.7` (large)
- Total: ~144 rollouts
This validates:

View file

@ -10,13 +10,14 @@ Hermes Agent supports both text-to-speech output and voice message transcription
## Text-to-Speech
Convert text to speech with three providers:
Convert text to speech with four providers:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
| **NeuTTS** | Good | Free | None needed |
### Platform Delivery
@ -32,7 +33,7 @@ Convert text to speech with three providers:
```yaml
# In ~/.hermes/config.yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai"
provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
@ -41,6 +42,11 @@ tts:
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
neutts:
ref_audio: ''
ref_text: ''
model: neuphonic/neutts-air-q4-gguf
device: cpu
```
### Telegram Voice Bubbles & ffmpeg
@ -49,6 +55,7 @@ Telegram voice bubbles require Opus/OGG audio format:
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
- **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
```bash
# Ubuntu/Debian
@ -61,7 +68,7 @@ brew install ffmpeg
sudo dnf install ffmpeg
```
Without ffmpeg, Edge TTS audio is sent as a regular audio file (playable, but shows as a rectangular player instead of a voice bubble).
Without ffmpeg, Edge TTS and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.

View file

@ -44,6 +44,9 @@ pip install hermes-agent[messaging]
# Premium TTS (ElevenLabs)
pip install hermes-agent[tts-premium]
# Local TTS (NeuTTS, optional)
python -m pip install -U neutts[all]
# Everything at once
pip install hermes-agent[all]
```
@ -54,6 +57,8 @@ pip install hermes-agent[all]
| `messaging` | `discord.py[voice]`, `python-telegram-bot`, `aiohttp` | Discord & Telegram bots |
| `tts-premium` | `elevenlabs` | ElevenLabs TTS provider |
Optional local TTS provider: install `neutts` separately with `python -m pip install -U neutts[all]`. On first use it downloads the model automatically.
:::info
`discord.py[voice]` installs **PyNaCl** (for voice encryption) and **opus bindings** automatically. This is required for Discord voice channel support.
:::
@ -63,9 +68,11 @@ pip install hermes-agent[all]
```bash
# macOS
brew install portaudio ffmpeg opus
brew install espeak-ng # for NeuTTS
# Ubuntu/Debian
sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng # for NeuTTS
```
| Dependency | Purpose | Required For |
@ -73,6 +80,7 @@ sudo apt install portaudio19-dev ffmpeg libopus0
| **PortAudio** | Microphone input and audio playback | CLI voice mode |
| **ffmpeg** | Audio format conversion (MP3 → Opus, PCM → WAV) | All platforms |
| **Opus** | Discord voice codec | Discord voice channels |
| **espeak-ng** | Phonemizer backend | Local NeuTTS provider |
### API Keys
@ -84,8 +92,9 @@ Add to `~/.hermes/.env`:
GROQ_API_KEY=your-key # Groq Whisper — fast, free tier (cloud)
VOICE_TOOLS_OPENAI_KEY=your-key # OpenAI Whisper — paid (cloud)
# Text-to-Speech (optional — Edge TTS works without any key)
ELEVENLABS_API_KEY=your-key # ElevenLabs — premium quality
# Text-to-Speech (optional — Edge TTS and NeuTTS work without any key)
ELEVENLABS_API_KEY=*** # ElevenLabs — premium quality
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
```
:::tip
@ -303,8 +312,9 @@ DISCORD_ALLOWED_USERS=your-user-id
# STT — local provider needs no key (pip install faster-whisper)
# GROQ_API_KEY=your-key # Alternative: cloud-based, fast, free tier
# TTS — optional, Edge TTS (free) is the default
# ELEVENLABS_API_KEY=your-key # Premium quality
# TTS — optional. Edge TTS and NeuTTS need no key.
# ELEVENLABS_API_KEY=*** # Premium quality
# VOICE_TOOLS_OPENAI_KEY=*** # OpenAI TTS / Whisper
```
### Start the Gateway
@ -385,7 +395,7 @@ stt:
# Text-to-Speech
tts:
provider: "edge" # "edge" (free) | "elevenlabs" | "openai"
provider: "edge" # "edge" (free) | "elevenlabs" | "openai" | "neutts"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
@ -394,6 +404,11 @@ tts:
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
neutts:
ref_audio: ''
ref_text: ''
model: neuphonic/neutts-air-q4-gguf
device: cpu
```
### Environment Variables
@ -410,9 +425,9 @@ STT_OPENAI_MODEL=whisper-1 # Override default OpenAI STT model
GROQ_BASE_URL=https://api.groq.com/openai/v1 # Custom Groq endpoint
STT_OPENAI_BASE_URL=https://api.openai.com/v1 # Custom OpenAI STT endpoint
# Text-to-Speech providers (Edge TTS needs no key)
ELEVENLABS_API_KEY=... # ElevenLabs (premium quality)
# OpenAI TTS uses VOICE_TOOLS_OPENAI_KEY
# Text-to-Speech providers (Edge TTS and NeuTTS need no key)
ELEVENLABS_API_KEY=*** # ElevenLabs (premium quality)
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
# Discord voice channel
DISCORD_BOT_TOKEN=...
@ -440,6 +455,9 @@ Provider priority (automatic fallback): **local** > **groq** > **openai**
| **Edge TTS** | Good | Free | ~1s | No |
| **ElevenLabs** | Excellent | Paid | ~2s | Yes |
| **OpenAI TTS** | Good | Paid | ~1.5s | Yes |
| **NeuTTS** | Good | Free | Depends on CPU/GPU | No |
NeuTTS uses the `tts.neutts` config block above.
---

View file

@ -97,6 +97,18 @@ WHATSAPP_MODE=bot # "bot" or "self-chat"
WHATSAPP_ALLOWED_USERS=15551234567 # Comma-separated phone numbers (with country code, no +)
```
Optional behavior settings in `~/.hermes/config.yaml`:
```yaml
unauthorized_dm_behavior: pair
whatsapp:
unauthorized_dm_behavior: ignore
```
- `unauthorized_dm_behavior: pair` is the global default. Unknown DM senders get a pairing code.
- `whatsapp.unauthorized_dm_behavior: ignore` makes WhatsApp stay silent for unauthorized DMs, which is usually the better choice for a private number.
Then start the gateway:
```bash
@ -162,6 +174,7 @@ whatsapp:
| **Bridge crashes or reconnect loops** | Restart the gateway, update Hermes, and re-pair if the session was invalidated by a WhatsApp protocol change. |
| **Bot stops working after WhatsApp update** | Update Hermes to get the latest bridge version, then re-pair. |
| **Messages not being received** | Verify `WHATSAPP_ALLOWED_USERS` includes the sender's number (with country code, no `+` or spaces). |
| **Bot replies to strangers with a pairing code** | Set `whatsapp.unauthorized_dm_behavior: ignore` in `~/.hermes/config.yaml` if you want unauthorized DMs to be silently ignored instead. |
---
@ -173,6 +186,13 @@ of authorized users. Without this setting, the gateway will **deny all incoming
safety measure.
:::
By default, unauthorized DMs still receive a pairing code reply. If you want a private WhatsApp number to stay completely silent to strangers, set:
```yaml
whatsapp:
unauthorized_dm_behavior: ignore
```
- The `~/.hermes/whatsapp/session` directory contains full session credentials — protect it like a password
- Set file permissions: `chmod 700 ~/.hermes/whatsapp/session`
- Use a **dedicated phone number** for the bot to isolate risk from your personal account

View file

@ -151,6 +151,19 @@ For more flexible authorization, Hermes includes a code-based pairing system. In
3. The bot owner runs `hermes pairing approve <platform> <code>` on the CLI
4. The user is permanently approved for that platform
Control how unauthorized direct messages are handled in `~/.hermes/config.yaml`:
```yaml
unauthorized_dm_behavior: pair
whatsapp:
unauthorized_dm_behavior: ignore
```
- `pair` is the default. Unauthorized DMs get a pairing code reply.
- `ignore` silently drops unauthorized DMs.
- Platform sections override the global default, so you can keep pairing on Telegram while keeping WhatsApp silent.
**Security features** (based on OWASP + NIST SP 800-63-4 guidance):
| Feature | Details |