Merge origin/main into hermes/hermes-5d160594

teknium1 2026-03-14 19:34:05 -07:00
commit 3229e434b8
78 changed files with 3762 additions and 395 deletions


@ -0,0 +1,424 @@
---
sidebar_position: 5
title: "Adding Providers"
description: "How to add a new inference provider to Hermes Agent — auth, runtime resolution, CLI flows, adapters, tests, and docs"
---
# Adding Providers
Hermes can already talk to any OpenAI-compatible endpoint through the custom provider path. Do not add a built-in provider unless you want first-class UX for that service:
- provider-specific auth or token refresh
- a curated model catalog
- setup / `hermes model` menu entries
- provider aliases for `provider:model` syntax
- a non-OpenAI API shape that needs an adapter
If the provider is just "another OpenAI-compatible base URL and API key", a named custom provider may be enough.
## The mental model
A built-in provider has to line up across a few layers:
1. `hermes_cli/auth.py` decides how credentials are found.
2. `hermes_cli/runtime_provider.py` turns that into runtime data:
- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
3. `run_agent.py` uses `api_mode` to decide how requests are built and sent.
4. `hermes_cli/models.py`, `hermes_cli/main.py`, and `hermes_cli/setup.py` make the provider show up in the CLI.
5. `agent/auxiliary_client.py` and `agent/model_metadata.py` keep side tasks and token budgeting working.
The important abstraction is `api_mode`.
- Most providers use `chat_completions`.
- Codex uses `codex_responses`.
- Anthropic uses `anthropic_messages`.
- A new non-OpenAI protocol usually means adding a new adapter and a new `api_mode` branch.
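In sketch form, each `api_mode` selects a request-building strategy. The helper names below are illustrative only; in-tree, `run_agent.py` branches on `api_mode` inline:

```python
# Illustrative dispatch; function and dict names are hypothetical, not the in-tree API.
def build_chat_completions(messages, model):
    # Standard OpenAI-style payload
    return {"model": model, "messages": messages}

def build_anthropic_messages(messages, model):
    # Anthropic requires max_tokens and has its own message shape
    return {"model": model, "messages": messages, "max_tokens": 4096}

_BUILDERS = {
    "chat_completions": build_chat_completions,
    "anthropic_messages": build_anthropic_messages,
}

def build_request(api_mode, messages, model):
    try:
        return _BUILDERS[api_mode](messages, model)
    except KeyError:
        raise ValueError(f"unsupported api_mode: {api_mode}") from None
```

Adding a native provider means adding one more entry to that conceptual table, plus the adapter that backs it.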
## Choose the implementation path first
### Path A — OpenAI-compatible provider
Use this when the provider accepts standard chat-completions style requests.
Typical work:
- add auth metadata
- add model catalog / aliases
- add runtime resolution
- add CLI menu wiring
- add aux-model defaults
- add tests and user docs
You usually do not need a new adapter or a new `api_mode`.
### Path B — Native provider
Use this when the provider does not behave like OpenAI chat completions.
Examples in-tree today:
- `codex_responses`
- `anthropic_messages`
This path includes everything from Path A plus:
- a provider adapter in `agent/`
- `run_agent.py` branches for request building, dispatch, usage extraction, interrupt handling, and response normalization
- adapter tests
## File checklist
### Required for every built-in provider
1. `hermes_cli/auth.py`
2. `hermes_cli/models.py`
3. `hermes_cli/runtime_provider.py`
4. `hermes_cli/main.py`
5. `hermes_cli/setup.py`
6. `agent/auxiliary_client.py`
7. `agent/model_metadata.py`
8. tests
9. user-facing docs under `website/docs/`
### Additional for native / non-OpenAI providers
10. `agent/<provider>_adapter.py`
11. `run_agent.py`
12. `pyproject.toml` if a provider SDK is required
## Step 1: Pick one canonical provider id
Choose a single provider id and use it everywhere.
Examples from the repo:
- `openai-codex`
- `kimi-coding`
- `minimax-cn`
That same id should appear in:
- `PROVIDER_REGISTRY` in `hermes_cli/auth.py`
- `_PROVIDER_LABELS` in `hermes_cli/models.py`
- `_PROVIDER_ALIASES` in both `hermes_cli/auth.py` and `hermes_cli/models.py`
- CLI `--provider` choices in `hermes_cli/main.py`
- setup / model selection branches
- auxiliary-model defaults
- tests
If the id differs between those files, the provider will feel half-wired: auth may work while `/model`, setup, or runtime resolution silently misses it.
## Step 2: Add auth metadata in `hermes_cli/auth.py`
For API-key providers, add a `ProviderConfig` entry to `PROVIDER_REGISTRY` with:
- `id`
- `name`
- `auth_type="api_key"`
- `inference_base_url`
- `api_key_env_vars`
- optional `base_url_env_var`
Also add aliases to `_PROVIDER_ALIASES`.
Use the existing providers as templates:
- simple API-key path: Z.AI, MiniMax
- API-key path with endpoint detection: Kimi, Z.AI
- native token resolution: Anthropic
- OAuth / auth-store path: Nous, OpenAI Codex
Questions to answer here:
- What env vars should Hermes check, and in what priority order?
- Does the provider need base-URL overrides?
- Does it need endpoint probing or token refresh?
- What should the auth error say when credentials are missing?
If the provider needs something more than "look up an API key", add a dedicated credential resolver instead of shoving logic into unrelated branches.
## Step 3: Add model catalog and aliases in `hermes_cli/models.py`
Update the provider catalog so the provider works in menus and in `provider:model` syntax.
Typical edits:
- `_PROVIDER_MODELS`
- `_PROVIDER_LABELS`
- `_PROVIDER_ALIASES`
- provider display order inside `list_available_providers()`
- `provider_model_ids()` if the provider supports a live `/models` fetch
If the provider exposes a live model list, prefer that first and keep `_PROVIDER_MODELS` as the static fallback.
This file is also what makes inputs like these work:
```text
anthropic:claude-sonnet-4-6
kimi:model-name
```
If aliases are missing here, the provider may authenticate correctly but still fail in `/model` parsing.
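Conceptually, that parsing needs the alias map and the catalog together. A minimal sketch with hypothetical data, not the in-tree parser:

```python
_PROVIDER_ALIASES = {"kimi": "kimi-coding"}        # hypothetical subset
_PROVIDER_MODELS = {"kimi-coding": ["model-name"]}

def parse_model_ref(ref):
    """Split 'provider:model', resolving aliases to the canonical provider id."""
    provider, _, model = ref.partition(":")
    canonical = _PROVIDER_ALIASES.get(provider, provider)
    if canonical not in _PROVIDER_MODELS:
        raise ValueError(f"unknown provider: {provider}")
    return canonical, model

print(parse_model_ref("kimi:model-name"))  # -> ('kimi-coding', 'model-name')
```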
## Step 4: Resolve runtime data in `hermes_cli/runtime_provider.py`
`resolve_runtime_provider()` is the shared path used by CLI, gateway, cron, ACP, and helper clients.
Add a branch that returns a dict with at least:
```python
{
"provider": "your-provider",
"api_mode": "chat_completions", # or your native mode
"base_url": "https://...",
"api_key": "...",
"source": "env|portal|auth-store|explicit",
"requested_provider": requested_provider,
}
```
If the provider is OpenAI-compatible, `api_mode` should usually stay `chat_completions`.
Be careful with API-key precedence. Hermes already contains logic to avoid leaking an OpenRouter key to unrelated endpoints. A new provider should be equally explicit about which key goes to which base URL.
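A resolver-branch sketch that stays explicit about which key is allowed to reach which endpoint (the env-var names are hypothetical):

```python
import os

def resolve_acme_runtime(requested_provider="acme"):
    """Hypothetical branch: only ACME_API_KEY may be sent to the Acme endpoint."""
    api_key = os.environ.get("ACME_API_KEY")
    if not api_key:
        raise RuntimeError("ACME_API_KEY is not set; export it or run setup")
    return {
        "provider": "acme",
        "api_mode": "chat_completions",
        "base_url": os.environ.get("ACME_BASE_URL", "https://api.acme.example/v1"),
        "api_key": api_key,  # never substitute an unrelated key (e.g. OpenRouter) here
        "source": "env",
        "requested_provider": requested_provider,
    }
```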
## Step 5: Wire the CLI in `hermes_cli/main.py` and `hermes_cli/setup.py`
A provider is not discoverable until it shows up in the interactive flows.
Update:
### `hermes_cli/main.py`
- `provider_labels`
- provider dispatch inside the `model` command
- `--provider` argument choices
- login/logout choices if the provider supports those flows
- a `_model_flow_<provider>()` function, or reuse `_model_flow_api_key_provider()` if it fits
### `hermes_cli/setup.py`
- `provider_choices`
- auth branch for the provider
- model-selection branch
- any provider-specific explanatory text
- any place where a provider should be excluded from OpenRouter-only prompts or routing settings
If you only update one of these files, `hermes model` and `hermes setup` will drift.
## Step 6: Keep auxiliary calls working
Two files matter here:
### `agent/auxiliary_client.py`
Add a cheap / fast default aux model to `_API_KEY_PROVIDER_AUX_MODELS` if this is a direct API-key provider.
Auxiliary tasks include things like:
- vision summarization
- web extraction summarization
- context compression summaries
- session-search summaries
- memory flushes
If the provider has no sensible aux default, side tasks may fall back badly or use an expensive main model unexpectedly.
### `agent/model_metadata.py`
Add context lengths for the provider's models so token budgeting, compression thresholds, and limits stay sane.
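As a rough picture of why this matters, compression decisions reduce to comparing usage against a fraction of the model's window. The numbers here are illustrative, not Hermes defaults:

```python
MODEL_CONTEXT_LENGTHS = {"acme-large": 128_000}  # hypothetical metadata entry

def should_compress(model, used_tokens, threshold=0.8):
    # Without a metadata entry, budgeting must fall back to a conservative window.
    window = MODEL_CONTEXT_LENGTHS.get(model, 8_192)
    return used_tokens > window * threshold
```

A missing entry means the fallback window applies, which can trigger compression far too early for a long-context model.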
## Step 7: If the provider is native, add an adapter and `run_agent.py` support
If the provider is not plain chat completions, isolate the provider-specific logic in `agent/<provider>_adapter.py`.
Keep `run_agent.py` focused on orchestration. It should call adapter helpers, not hand-build provider payloads inline all over the file.
A native provider usually needs work in these places:
### New adapter file
Typical responsibilities:
- build the SDK / HTTP client
- resolve tokens
- convert OpenAI-style conversation messages to the provider's request format
- convert tool schemas if needed
- normalize provider responses back into what `run_agent.py` expects
- extract usage and finish-reason data
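The heart of most adapters is message conversion. A simplified sketch of hoisting system prompts out of an OpenAI-style message list (real adapters also handle tool calls, images, and structured content blocks):

```python
def to_native_messages(messages):
    """Split OpenAI-style messages into (system_prompt, native_messages)."""
    system_parts, native = [], []
    for m in messages:
        if m["role"] == "system":
            # Providers like Anthropic take the system prompt as a separate field
            system_parts.append(m["content"])
        else:
            native.append({"role": m["role"], "content": m["content"]})
    return "\n".join(system_parts), native
```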
### `run_agent.py`
Search for `api_mode` and audit every switch point. At minimum, verify:
- `__init__` chooses the new `api_mode`
- client construction works for the provider
- `_build_api_kwargs()` knows how to format requests
- `_api_call_with_interrupt()` dispatches to the right client call
- interrupt / client rebuild paths work
- response validation accepts the provider's shape
- finish-reason extraction is correct
- token-usage extraction is correct
- fallback-model activation can switch into the new provider cleanly
- summary-generation and memory-flush paths still work
Also search `run_agent.py` for `self.client.`. Any code path that assumes the standard OpenAI client exists can break when a native provider uses a different client object or `self.client = None`.
### Prompt caching and provider-specific request fields
Prompt caching and provider-specific knobs are easy to regress.
Examples already in-tree:
- Anthropic has a native prompt-caching path
- OpenRouter gets provider-routing fields
- not every provider should receive every request-side option
When you add a native provider, double-check that Hermes is only sending fields that provider actually understands.
## Step 8: Tests
At minimum, touch the tests that guard provider wiring.
Common places:
- `tests/test_runtime_provider_resolution.py`
- `tests/test_cli_provider_resolution.py`
- `tests/test_cli_model_command.py`
- `tests/test_setup_model_selection.py`
- `tests/test_provider_parity.py`
- `tests/test_run_agent.py`
- `tests/test_<provider>_adapter.py` for a native provider
These file names are illustrative; the exact set may differ as the suite evolves. The point is to cover:
- auth resolution
- CLI menu / provider selection
- runtime provider resolution
- agent execution path
- provider:model parsing
- any adapter-specific message conversion
Run tests with xdist disabled:
```bash
source .venv/bin/activate
python -m pytest tests/test_runtime_provider_resolution.py tests/test_cli_provider_resolution.py tests/test_cli_model_command.py tests/test_setup_model_selection.py -n0 -q
```
For deeper changes, run the full suite before pushing:
```bash
source .venv/bin/activate
python -m pytest tests/ -n0 -q
```
## Step 9: Live verification
After tests, run a real smoke test.
```bash
source .venv/bin/activate
python -m hermes_cli.main chat -q "Say hello" --provider your-provider --model your-model
```
Also test the interactive flows if you changed menus:
```bash
source .venv/bin/activate
python -m hermes_cli.main model
python -m hermes_cli.main setup
```
For native providers, verify at least one tool call too, not just a plain text response.
## Step 10: Update user-facing docs
If the provider is meant to ship as a first-class option, update the user docs too:
- `website/docs/getting-started/quickstart.md`
- `website/docs/user-guide/configuration.md`
- `website/docs/reference/environment-variables.md`
A developer can wire the provider perfectly and still leave users unable to discover the required env vars or setup flow.
## OpenAI-compatible provider checklist
Use this if the provider is standard chat completions.
- [ ] `ProviderConfig` added in `hermes_cli/auth.py`
- [ ] aliases added in `hermes_cli/auth.py` and `hermes_cli/models.py`
- [ ] model catalog added in `hermes_cli/models.py`
- [ ] runtime branch added in `hermes_cli/runtime_provider.py`
- [ ] CLI wiring added in `hermes_cli/main.py`
- [ ] setup wiring added in `hermes_cli/setup.py`
- [ ] aux model added in `agent/auxiliary_client.py`
- [ ] context lengths added in `agent/model_metadata.py`
- [ ] runtime / CLI tests updated
- [ ] user docs updated
## Native provider checklist
Use this when the provider needs a new protocol path.
- [ ] everything in the OpenAI-compatible checklist
- [ ] adapter added in `agent/<provider>_adapter.py`
- [ ] new `api_mode` supported in `run_agent.py`
- [ ] interrupt / rebuild path works
- [ ] usage and finish-reason extraction works
- [ ] fallback path works
- [ ] adapter tests added
- [ ] live smoke test passes
## Common pitfalls
### 1. Adding the provider to auth but not to model parsing
That makes credentials resolve correctly while `/model` and `provider:model` inputs fail.
### 2. Forgetting that `config["model"]` can be a string or a dict
A lot of provider-selection code has to normalize both forms.
### 3. Assuming a built-in provider is required
If the service is just OpenAI-compatible, a custom provider may already solve the user problem with less maintenance.
### 4. Forgetting auxiliary paths
The main chat path can work while summarization, memory flushes, or vision helpers fail because aux routing was never updated.
### 5. Native-provider branches hiding in `run_agent.py`
Search for `api_mode` and `self.client.`. Do not assume the obvious request path is the only one.
### 6. Sending OpenRouter-only knobs to other providers
Fields like provider routing belong only on the providers that support them.
### 7. Updating `hermes model` but not `hermes setup`
Both flows need to know about the provider.
## Good search targets while implementing
If you are hunting for all the places a provider touches, search these symbols:
- `PROVIDER_REGISTRY`
- `_PROVIDER_ALIASES`
- `_PROVIDER_MODELS`
- `resolve_runtime_provider`
- `_model_flow_`
- `provider_choices`
- `api_mode`
- `_API_KEY_PROVIDER_AUX_MODELS`
- `self.client.`
## Related docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Architecture](./architecture.md)
- [Contributing](./contributing.md)


@ -41,12 +41,13 @@ If you are new to the codebase, read in this order:
2. [Agent Loop Internals](./agent-loop.md)
3. [Prompt Assembly](./prompt-assembly.md)
4. [Provider Runtime Resolution](./provider-runtime.md)
5. [Adding Providers](./adding-providers.md)
6. [Tools Runtime](./tools-runtime.md)
7. [Session Storage](./session-storage.md)
8. [Gateway Internals](./gateway-internals.md)
9. [Context Compression & Prompt Caching](./context-compression-and-caching.md)
10. [ACP Internals](./acp-internals.md)
11. [Environments, Benchmarks & Data Generation](./environments.md)
## Major subsystems


@ -20,6 +20,12 @@ We value contributions in this order:
6. **New tools** — rarely needed; most capabilities should be skills
7. **Documentation** — fixes, clarifications, new examples
## Common contribution paths
- Building a new tool? Start with [Adding Tools](./adding-tools.md)
- Building a new skill? Start with [Creating Skills](./creating-skills.md)
- Building a new inference provider? Start with [Adding Providers](./adding-providers.md)
## Development Setup
### Prerequisites


@ -20,6 +20,8 @@ Primary implementation:
- `hermes_cli/auth.py`
- `agent/auxiliary_client.py`
If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) alongside this page.
## Resolution precedence
At a high level, provider resolution uses:


@ -119,6 +119,7 @@ uv pip install -e "."
| `cli` | Terminal menu UI for setup wizard | `uv pip install -e ".[cli]"` |
| `modal` | Modal cloud execution backend | `uv pip install -e ".[modal]"` |
| `tts-premium` | ElevenLabs premium voices | `uv pip install -e ".[tts-premium]"` |
| `voice` | CLI microphone input + audio playback | `uv pip install -e ".[voice]"` |
| `pty` | PTY terminal support | `uv pip install -e ".[pty]"` |
| `honcho` | AI-native memory (Honcho integration) | `uv pip install -e ".[honcho]"` |
| `mcp` | Model Context Protocol support | `uv pip install -e ".[mcp]"` |


@ -54,7 +54,9 @@ Deploy Hermes Agent as a bot on your favorite messaging platform.
3. [Messaging Overview](/docs/user-guide/messaging)
4. [Telegram Setup](/docs/user-guide/messaging/telegram)
5. [Discord Setup](/docs/user-guide/messaging/discord)
6. [Voice Mode](/docs/user-guide/features/voice-mode)
7. [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
8. [Security](/docs/user-guide/security)
For full project examples, see:
- [Daily Briefing Bot](/docs/guides/daily-briefing-bot)


@ -129,6 +129,25 @@ Chat with Hermes from your phone or other surfaces via Telegram, Discord, Slack,
hermes gateway setup # Interactive platform configuration
```
### Add voice mode
Want microphone input in the CLI or spoken replies in messaging?
```bash
pip install hermes-agent[voice]
# Optional but recommended for free local speech-to-text
pip install faster-whisper
```
Then start Hermes and enable it inside the CLI:
```text
/voice on
```
Press `Ctrl+B` to record, or use `/voice tts` to have Hermes speak its replies. See [Voice Mode](../user-guide/features/voice-mode.md) for the full setup across CLI, Telegram, Discord, and Discord voice channels.
### Schedule automated tasks
```


@ -0,0 +1,422 @@
---
sidebar_position: 7
title: "Use Voice Mode with Hermes"
description: "A practical guide to setting up and using Hermes voice mode across CLI, Telegram, Discord, and Discord voice channels"
---
# Use Voice Mode with Hermes
This guide is the practical companion to the [Voice Mode feature reference](/docs/user-guide/features/voice-mode).
If the feature page explains what voice mode can do, this guide shows how to actually use it well.
## What voice mode is good for
Voice mode is especially useful when:
- you want a hands-free CLI workflow
- you want spoken responses in Telegram or Discord
- you want Hermes sitting in a Discord voice channel for live conversation
- you want quick idea capture, debugging, or back-and-forth while walking around instead of typing
## Choose your voice mode setup
There are three distinct voice experiences in Hermes.
| Mode | Best for | Platform |
|---|---|---|
| Interactive microphone loop | Personal hands-free use while coding or researching | CLI |
| Voice replies in chat | Spoken responses alongside normal messaging | Telegram, Discord |
| Live voice channel bot | Group or personal live conversation in a VC | Discord voice channels |
A good path is:
1. get text working first
2. enable voice replies second
3. move to Discord voice channels last if you want the full experience
## Step 1: make sure normal Hermes works first
Before touching voice mode, verify that:
- Hermes starts
- your provider is configured
- the agent can answer text prompts normally
```bash
hermes
```
Ask something simple:
```text
What tools do you have available?
```
If that is not solid yet, fix text mode first.
## Step 2: install the right extras
### CLI microphone + playback
```bash
pip install hermes-agent[voice]
```
### Messaging platforms
```bash
pip install hermes-agent[messaging]
```
### Premium ElevenLabs TTS
```bash
pip install hermes-agent[tts-premium]
```
### Everything
```bash
pip install hermes-agent[all]
```
## Step 3: install system dependencies
### macOS
```bash
brew install portaudio ffmpeg opus
```
### Ubuntu / Debian
```bash
sudo apt install portaudio19-dev ffmpeg libopus0
```
Why these matter:
- `portaudio` → microphone input / playback for CLI voice mode
- `ffmpeg` → audio conversion for TTS and messaging delivery
- `opus` → Discord voice codec support
## Step 4: choose STT and TTS providers
Hermes supports both local and cloud speech stacks.
### Easiest / cheapest setup
Use local STT and free Edge TTS:
- STT provider: `local`
- TTS provider: `edge`
This is usually the best place to start.
### Environment file example
Add to `~/.hermes/.env`:
```bash
# Cloud STT options (local needs no key)
GROQ_API_KEY=***
VOICE_TOOLS_OPENAI_KEY=***
# Premium TTS (optional)
ELEVENLABS_API_KEY=***
```
### Provider recommendations
#### Speech-to-text
- `local` → best default for privacy and zero-cost use
- `groq` → very fast cloud transcription
- `openai` → good paid fallback
#### Text-to-speech
- `edge` → free and good enough for most users
- `elevenlabs` → best quality
- `openai` → good middle ground
## Step 5: recommended config
```yaml
voice:
record_key: "ctrl+b"
max_recording_seconds: 120
auto_tts: false
silence_threshold: 200
silence_duration: 3.0
stt:
provider: "local"
local:
model: "base"
tts:
provider: "edge"
edge:
voice: "en-US-AriaNeural"
```
This is a good conservative default for most people.
## Use case 1: CLI voice mode
### Turn it on
Start Hermes:
```bash
hermes
```
Inside the CLI:
```text
/voice on
```
### Recording flow
Default key:
- `Ctrl+B`
Workflow:
1. press `Ctrl+B`
2. speak
3. wait for silence detection to stop recording automatically
4. Hermes transcribes and responds
5. if TTS is on, it speaks the answer
6. the loop can automatically restart for continuous use
### Useful commands
```text
/voice
/voice on
/voice off
/voice tts
/voice status
```
### Good CLI workflows
#### Walk-up debugging
Say:
```text
I keep getting a docker permission error. Help me debug it.
```
Then continue hands-free:
- "Read the last error again"
- "Explain the root cause in simpler terms"
- "Now give me the exact fix"
#### Research / brainstorming
Great for:
- walking around while thinking
- dictating half-formed ideas
- asking Hermes to structure your thoughts in real time
#### Accessibility / low-typing sessions
If typing is inconvenient, voice mode is one of the fastest ways to stay in the full Hermes loop.
## Tuning CLI behavior
### Silence threshold
If Hermes starts/stops too aggressively, tune:
```yaml
voice:
silence_threshold: 250
```
Higher threshold = less sensitive.
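Silence detection of this kind typically compares each audio frame's RMS amplitude to the threshold. A rough sketch of the assumed math, not the exact in-tree code:

```python
import math

def frame_rms(samples):
    """RMS amplitude of one frame of 16-bit PCM samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silent(samples, silence_threshold=200):
    return frame_rms(samples) < silence_threshold

print(is_silent([50, -40, 30, -20]))   # quiet room -> True
print(is_silent([4000, -3500, 3800]))  # speech -> False
```

If background noise alone already crosses the threshold, recording never auto-stops; raising the value filters it out.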
### Silence duration
If you pause a lot between sentences, increase:
```yaml
voice:
silence_duration: 4.0
```
### Record key
If `Ctrl+B` conflicts with your terminal or tmux habits:
```yaml
voice:
record_key: "ctrl+space"
```
## Use case 2: voice replies in Telegram or Discord
This mode is simpler than full voice channels.
Hermes stays a normal chat bot, but can speak replies.
### Start the gateway
```bash
hermes gateway
```
### Turn on voice replies
Inside Telegram or Discord:
```text
/voice on
```
or
```text
/voice tts
```
### Modes
| Mode | Meaning |
|---|---|
| `off` | text only |
| `voice_only` | speak only when the user sent voice |
| `all` | speak every reply |
### When to use which mode
- `/voice on` if you want spoken replies only for voice-originating messages
- `/voice tts` if you want a full spoken assistant all the time
### Good messaging workflows
#### Telegram assistant on your phone
Use when:
- you are away from your machine
- you want to send voice notes and get quick spoken replies
- you want Hermes to function like a portable research or ops assistant
#### Discord DMs with spoken output
Useful when you want private interaction without server-channel mention behavior.
## Use case 3: Discord voice channels
This is the most advanced mode.
Hermes joins a Discord VC, listens to user speech, transcribes it, runs the normal agent pipeline, and speaks replies back into the channel.
### Required Discord permissions
In addition to the normal text-bot setup, make sure the bot has:
- Connect
- Speak
- preferably Use Voice Activity
Also enable privileged intents in the Developer Portal:
- Presence Intent
- Server Members Intent
- Message Content Intent
### Join and leave
In a Discord text channel where the bot is present:
```text
/voice join
/voice leave
/voice status
```
### What happens when joined
- users speak in the VC
- Hermes detects speech boundaries
- transcripts are posted in the associated text channel
- Hermes responds in text and audio
- the text channel is the one where `/voice join` was issued
### Best practices for Discord VC use
- keep `DISCORD_ALLOWED_USERS` tight
- use a dedicated bot/testing channel at first
- verify STT and TTS work in ordinary text-chat voice mode before trying VC mode
## Voice quality recommendations
### Best quality setup
- STT: local `large-v3` or Groq `whisper-large-v3`
- TTS: ElevenLabs
### Best speed / convenience setup
- STT: local `base` or Groq
- TTS: Edge
### Best zero-cost setup
- STT: local
- TTS: Edge
## Common failure modes
### "No audio device found"
Install `portaudio`.
### "Bot joins but hears nothing"
Check:
- your Discord user ID is in `DISCORD_ALLOWED_USERS`
- you are not muted
- privileged intents are enabled
- the bot has Connect/Speak permissions
### "It transcribes but does not speak"
Check:
- TTS provider config
- API key / quota for ElevenLabs or OpenAI
- `ffmpeg` install for Edge conversion paths
### "Whisper outputs garbage"
Try:
- quieter environment
- higher `silence_threshold`
- different STT provider/model
- shorter, clearer utterances
### "It works in DMs but not in server channels"
That is usually the mention policy: by default, the bot only responds in Discord server text channels when it is `@mention`ed, unless configured otherwise.
## Suggested first-week setup
If you want the shortest path to success:
1. get text Hermes working
2. install `hermes-agent[voice]`
3. use CLI voice mode with local STT + Edge TTS
4. then enable `/voice on` in Telegram or Discord
5. only after that, try Discord VC mode
That progression keeps the debugging surface small.
## Where to read next
- [Voice Mode feature reference](/docs/user-guide/features/voice-mode)
- [Messaging Gateway](/docs/user-guide/messaging)
- [Discord setup](/docs/user-guide/messaging/discord)
- [Telegram setup](/docs/user-guide/messaging/telegram)
- [Configuration](/docs/user-guide/configuration)


@ -33,6 +33,8 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl
| 📚 **[Skills System](/docs/user-guide/features/skills)** | Procedural memory the agent creates and reuses |
| 🔌 **[MCP Integration](/docs/user-guide/features/mcp)** | Connect to MCP servers, filter their tools, and extend Hermes safely |
| 🧭 **[Use MCP with Hermes](/docs/guides/use-mcp-with-hermes)** | Practical MCP setup patterns, examples, and tutorials |
| 🎙️ **[Voice Mode](/docs/user-guide/features/voice-mode)** | Real-time voice interaction in CLI, Telegram, Discord, and Discord VC |
| 🗣️ **[Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)** | Hands-on setup and usage patterns for Hermes voice workflows |
| 🎭 **[Personality & SOUL.md](/docs/user-guide/features/personality)** | Define Hermes' default voice with a global SOUL.md |
| 📄 **[Context Files](/docs/user-guide/features/context-files)** | Project context files that shape every conversation |
| 🔒 **[Security](/docs/user-guide/security)** | Command approval, authorization, container isolation |


@ -31,7 +31,7 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
| `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code setup-token (same as `ANTHROPIC_TOKEN`) |
| `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
| `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
| `VOICE_TOOLS_OPENAI_KEY` | OpenAI key for OpenAI speech-to-text and text-to-speech providers |
| `HERMES_HOME` | Override Hermes config directory (default: `~/.hermes`) |
## Provider Auth (OAuth)
@ -57,7 +57,12 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
| `BROWSERBASE_PROJECT_ID` | Browserbase project ID |
| `BROWSER_INACTIVITY_TIMEOUT` | Browser session inactivity timeout in seconds |
| `FAL_KEY` | Image generation ([fal.ai](https://fal.ai/)) |
| `GROQ_API_KEY` | Groq Whisper STT API key ([groq.com](https://groq.com/)) |
| `ELEVENLABS_API_KEY` | ElevenLabs premium TTS voices ([elevenlabs.io](https://elevenlabs.io/)) |
| `STT_GROQ_MODEL` | Override the Groq STT model (default: `whisper-large-v3-turbo`) |
| `GROQ_BASE_URL` | Override the Groq OpenAI-compatible STT endpoint |
| `STT_OPENAI_MODEL` | Override the OpenAI STT model (default: `whisper-1`) |
| `STT_OPENAI_BASE_URL` | Override the OpenAI-compatible STT endpoint |
| `HONCHO_API_KEY` | Cross-session user modeling ([honcho.dev](https://honcho.dev/)) |
| `TINKER_API_KEY` | RL training ([tinker-console.thinkingmachines.ai](https://tinker-console.thinkingmachines.ai/)) |
| `WANDB_API_KEY` | RL training metrics ([wandb.ai](https://wandb.ai/)) |


@ -45,6 +45,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
| `/verbose` | Cycle tool progress display: off → new → all → verbose |
| `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) |
| `/skin` | Show or change the display skin/theme |
| `/voice [on\|off\|tts\|status]` | Toggle CLI voice mode and spoken playback. Recording uses `voice.record_key` (default: `Ctrl+B`). |
### Tools & Skills
@ -105,6 +106,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
| `/usage` | Show token usage for the current session. |
| `/insights [days]` | Show usage analytics. |
| `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display. |
| `/voice [on\|off\|tts\|join\|channel\|leave\|status]` | Control spoken replies in chat. `join`/`channel`/`leave` manage Discord voice-channel mode. |
| `/rollback [number]` | List or restore filesystem checkpoints. |
| `/background <prompt>` | Run a prompt in a separate background session. |
| `/reload-mcp` | Reload MCP servers from config. |
@ -116,4 +118,5 @@ The messaging gateway supports the following built-in commands inside Telegram,
- `/skin`, `/tools`, `/toolsets`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, and `/verbose` are **CLI-only** commands.
- `/status`, `/stop`, `/sethome`, `/resume`, `/background`, and `/update` are **messaging-only** commands.
- `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway.
- `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.


@ -77,6 +77,7 @@ When resuming a previous session (`hermes -c` or `hermes --resume <id>`), a "Pre
| `Alt+Enter` or `Ctrl+J` | New line (multi-line input) |
| `Alt+V` | Paste an image from the clipboard when supported by the terminal |
| `Ctrl+V` | Paste text and opportunistically attach clipboard images |
| `Ctrl+B` | Start/stop voice recording when voice mode is enabled (`voice.record_key`, default: `ctrl+b`) |
| `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) |
| `Ctrl+D` | Exit |
| `Tab` | Autocomplete slash commands |
@ -95,11 +96,15 @@ Common examples:
| `/skills browse` | Browse the skills hub and official optional skills |
| `/background <prompt>` | Run a prompt in a separate background session |
| `/skin` | Show or switch the active CLI skin |
| `/voice on` | Enable CLI voice mode (press `Ctrl+B` to record) |
| `/voice tts` | Toggle spoken playback for Hermes replies |
| `/reasoning high` | Increase reasoning effort |
| `/title My Session` | Name the current session |
For the full built-in CLI and messaging lists, see [Slash Commands Reference](../reference/slash-commands.md).
For setup, providers, silence tuning, and messaging/Discord voice usage, see [Voice Mode](features/voice-mode.md).
:::tip
Commands are case-insensitive — `/HELP` works the same as `/help`. Installed skills also become slash commands automatically.
:::


@ -695,6 +695,8 @@ tts:
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
```
This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
## Display Settings
```yaml
@ -719,10 +721,43 @@ display:
```yaml
stt:
provider: "local" # "local" | "groq" | "openai"
local:
model: "base" # tiny, base, small, medium, large-v3
openai:
model: "whisper-1" # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
# model: "whisper-1" # Legacy fallback key still respected
```
Requires `VOICE_TOOLS_OPENAI_KEY` in `.env` for OpenAI STT.
Provider behavior:
- `local` uses `faster-whisper` running on your machine. Install it separately with `pip install faster-whisper`.
- `groq` uses Groq's Whisper-compatible endpoint and reads `GROQ_API_KEY`.
- `openai` uses the OpenAI speech API and reads `VOICE_TOOLS_OPENAI_KEY`.
If the requested provider is unavailable, Hermes falls back automatically in this order: `local``groq``openai`.
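To pin a preferred provider and treat the fallback chain purely as a safety net, an explicit config entry is enough (values here are illustrative):

```yaml
stt:
  provider: "groq"     # preferred provider; Hermes falls back only if it is unavailable
  local:
    model: "small"     # used only when the local fallback path is taken
```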
Groq and OpenAI model overrides are environment-driven:
```bash
STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1
```
## Voice Mode (CLI)
```yaml
voice:
record_key: "ctrl+b" # Push-to-talk key inside the CLI
max_recording_seconds: 120 # Hard stop for long recordings
auto_tts: false # Enable spoken replies automatically when /voice on
silence_threshold: 200 # RMS threshold for speech detection
silence_duration: 3.0 # Seconds of silence before auto-stop
```
Use `/voice on` in the CLI to enable microphone mode, press the configured `record_key` to start and stop recording, and use `/voice tts` to toggle spoken replies. See [Voice Mode](/docs/user-guide/features/voice-mode) for end-to-end setup and platform-specific behavior.
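For a quiet microphone or a noisy room, the silence settings are usually the only knobs that need tuning (illustrative starting points, not recommended defaults):

```yaml
voice:
  silence_threshold: 120   # lower = more sensitive to quiet speech
  silence_duration: 1.5    # stop recording sooner after you finish talking
```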
## Quick Commands


@ -194,6 +194,8 @@ The agent's final response is automatically delivered. You do not need to call `
## Schedule formats
The agent's final response is automatically delivered — you do **not** need to include `send_message` in the cron prompt for that same destination. If a cron run calls `send_message` with the exact target that the scheduler will already deliver to, Hermes skips the duplicate send and tells the model to put the user-facing content in the final response instead. Use `send_message` only for additional or different targets.
### Relative delays (one-shot)
```text


@ -8,12 +8,14 @@ description: "Real-time voice conversations with Hermes Agent — CLI, Telegram,
Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
## Prerequisites
Before using voice features, make sure you have:
1. **Hermes Agent installed**`pip install hermes-agent` (see [Getting Started](../../getting-started.md))
2. **An LLM provider configured**set `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `LLM_MODEL` in `~/.hermes/.env`
1. **Hermes Agent installed**`pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
2. **An LLM provider configured**run `hermes model` or set your preferred provider credentials in `~/.hermes/.env`
3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
:::tip


@ -210,8 +210,13 @@ Replace the ID with the actual channel ID (right-click → Copy Channel ID with
Hermes Agent supports Discord voice messages:
- **Incoming voice messages** are automatically transcribed using Whisper (requires `GROQ_API_KEY` or `VOICE_TOOLS_OPENAI_KEY` to be set in your environment).
- **Incoming voice messages** are automatically transcribed using the configured STT provider: local `faster-whisper` (no key), Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`).
- **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
- **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
For the full setup and operational guide, see:
- [Voice Mode](/docs/user-guide/features/voice-mode)
- [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
## Troubleshooting


@ -8,6 +8,8 @@ description: "Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal,
Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.
For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
## Architecture
```text
@ -77,6 +79,7 @@ hermes gateway status # Check service status
| `/usage` | Show token usage for this session |
| `/insights [days]` | Show usage insights and analytics |
| `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display |
| `/voice [on\|off\|tts\|join\|leave\|status]` | Control messaging voice replies and Discord voice-channel behavior |
| `/rollback [number]` | List or restore filesystem checkpoints |
| `/background <prompt>` | Run a prompt in a separate background session |
| `/reload-mcp` | Reload MCP servers from config |

View file

@ -224,7 +224,7 @@ Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).
Hermes supports voice on Slack:
- **Incoming:** Voice/audio messages are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
- **Incoming:** Voice/audio messages are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
- **Outgoing:** TTS responses are sent as audio file attachments
---


@ -131,7 +131,11 @@ Group chat IDs are negative numbers (e.g., `-1001234567890`). Your personal DM c
### Incoming Voice (Speech-to-Text)
Voice messages you send on Telegram are automatically transcribed using OpenAI's Whisper API and injected as text into the conversation. This requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
Voice messages you send on Telegram are automatically transcribed by Hermes's configured STT provider and injected as text into the conversation.
- `local` uses `faster-whisper` on the machine running Hermes — no API key required
- `groq` uses Groq Whisper and requires `GROQ_API_KEY`
- `openai` uses OpenAI Whisper and requires `VOICE_TOOLS_OPENAI_KEY`
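For the keyed providers, the corresponding entries in `~/.hermes/.env` look like this (placeholder values — substitute your real keys; only the key for your configured provider is required):

```bash
GROQ_API_KEY=gsk_your_key_here
VOICE_TOOLS_OPENAI_KEY=sk-your_key_here
```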
### Outgoing Voice (Text-to-Speech)
@ -173,7 +177,7 @@ Hermes Agent works in Telegram group chats with a few considerations:
| Bot not responding at all | Verify `TELEGRAM_BOT_TOKEN` is correct. Check `hermes gateway` logs for errors. |
| Bot responds with "unauthorized" | Your user ID is not in `TELEGRAM_ALLOWED_USERS`. Double-check with @userinfobot. |
| Bot ignores group messages | Privacy mode is likely on. Disable it (Step 3) or make the bot a group admin. **Remember to remove and re-add the bot after changing privacy.** |
| Voice messages not transcribed | Check that `VOICE_TOOLS_OPENAI_KEY` is set and valid in `~/.hermes/.env`. |
| Voice messages not transcribed | Verify STT is available: install `faster-whisper` for local transcription, or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`. |
| Voice replies are files, not bubbles | Install `ffmpeg` (needed for Edge TTS Opus conversion). |
| Bot token revoked/invalid | Generate a new token via `/revoke` then `/newbot` or `/token` in BotFather. Update your `.env` file. |


@ -137,7 +137,7 @@ with reconnection logic.
Hermes supports voice on WhatsApp:
- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
- **Outgoing:** TTS responses are sent as MP3 audio file attachments
- Agent responses are prefixed with "⚕ **Hermes Agent**" for easy identification