Merge origin/main into hermes/hermes-5d160594

teknium1 2026-03-14 19:34:05 -07:00
commit 3229e434b8
78 changed files with 3762 additions and 395 deletions


@ -0,0 +1,424 @@
---
sidebar_position: 5
title: "Adding Providers"
description: "How to add a new inference provider to Hermes Agent — auth, runtime resolution, CLI flows, adapters, tests, and docs"
---
# Adding Providers
Hermes can already talk to any OpenAI-compatible endpoint through the custom provider path. Do not add a built-in provider unless you want first-class UX for that service:
- provider-specific auth or token refresh
- a curated model catalog
- setup / `hermes model` menu entries
- provider aliases for `provider:model` syntax
- a non-OpenAI API shape that needs an adapter
If the provider is just "another OpenAI-compatible base URL and API key", a named custom provider may be enough.
## The mental model
A built-in provider has to line up across a few layers:
1. `hermes_cli/auth.py` decides how credentials are found.
2. `hermes_cli/runtime_provider.py` turns that into runtime data:
- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
3. `run_agent.py` uses `api_mode` to decide how requests are built and sent.
4. `hermes_cli/models.py`, `hermes_cli/main.py`, and `hermes_cli/setup.py` make the provider show up in the CLI.
5. `agent/auxiliary_client.py` and `agent/model_metadata.py` keep side tasks and token budgeting working.
The important abstraction is `api_mode`.
- Most providers use `chat_completions`.
- Codex uses `codex_responses`.
- Anthropic uses `anthropic_messages`.
- A new non-OpenAI protocol usually means adding a new adapter and a new `api_mode` branch.
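In sketch form, each `api_mode` selects a request-building strategy. The helper names below are illustrative only; in-tree, `run_agent.py` branches on `api_mode` inline:

```python
# Illustrative dispatch; function and dict names are hypothetical, not the in-tree API.
def build_chat_completions(messages, model):
    # Standard OpenAI-style payload
    return {"model": model, "messages": messages}

def build_anthropic_messages(messages, model):
    # Anthropic requires max_tokens and has its own message shape
    return {"model": model, "messages": messages, "max_tokens": 4096}

_BUILDERS = {
    "chat_completions": build_chat_completions,
    "anthropic_messages": build_anthropic_messages,
}

def build_request(api_mode, messages, model):
    try:
        return _BUILDERS[api_mode](messages, model)
    except KeyError:
        raise ValueError(f"unsupported api_mode: {api_mode}") from None
```

Adding a native provider means adding one more entry to that conceptual table, plus the adapter that backs it.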
## Choose the implementation path first
### Path A — OpenAI-compatible provider
Use this when the provider accepts standard chat-completions style requests.
Typical work:
- add auth metadata
- add model catalog / aliases
- add runtime resolution
- add CLI menu wiring
- add aux-model defaults
- add tests and user docs
You usually do not need a new adapter or a new `api_mode`.
### Path B — Native provider
Use this when the provider does not behave like OpenAI chat completions.
Examples in-tree today:
- `codex_responses`
- `anthropic_messages`
This path includes everything from Path A plus:
- a provider adapter in `agent/`
- `run_agent.py` branches for request building, dispatch, usage extraction, interrupt handling, and response normalization
- adapter tests
## File checklist
### Required for every built-in provider
1. `hermes_cli/auth.py`
2. `hermes_cli/models.py`
3. `hermes_cli/runtime_provider.py`
4. `hermes_cli/main.py`
5. `hermes_cli/setup.py`
6. `agent/auxiliary_client.py`
7. `agent/model_metadata.py`
8. tests
9. user-facing docs under `website/docs/`
### Additional for native / non-OpenAI providers
10. `agent/<provider>_adapter.py`
11. `run_agent.py`
12. `pyproject.toml` if a provider SDK is required
## Step 1: Pick one canonical provider id
Choose a single provider id and use it everywhere.
Examples from the repo:
- `openai-codex`
- `kimi-coding`
- `minimax-cn`
That same id should appear in:
- `PROVIDER_REGISTRY` in `hermes_cli/auth.py`
- `_PROVIDER_LABELS` in `hermes_cli/models.py`
- `_PROVIDER_ALIASES` in both `hermes_cli/auth.py` and `hermes_cli/models.py`
- CLI `--provider` choices in `hermes_cli/main.py`
- setup / model selection branches
- auxiliary-model defaults
- tests
If the id differs between those files, the provider will feel half-wired: auth may work while `/model`, setup, or runtime resolution silently misses it.
## Step 2: Add auth metadata in `hermes_cli/auth.py`
For API-key providers, add a `ProviderConfig` entry to `PROVIDER_REGISTRY` with:
- `id`
- `name`
- `auth_type="api_key"`
- `inference_base_url`
- `api_key_env_vars`
- optional `base_url_env_var`
Also add aliases to `_PROVIDER_ALIASES`.
Use the existing providers as templates:
- simple API-key path: Z.AI, MiniMax
- API-key path with endpoint detection: Kimi, Z.AI
- native token resolution: Anthropic
- OAuth / auth-store path: Nous, OpenAI Codex
Questions to answer here:
- What env vars should Hermes check, and in what priority order?
- Does the provider need base-URL overrides?
- Does it need endpoint probing or token refresh?
- What should the auth error say when credentials are missing?
If the provider needs something more than "look up an API key", add a dedicated credential resolver instead of shoving logic into unrelated branches.
## Step 3: Add model catalog and aliases in `hermes_cli/models.py`
Update the provider catalog so the provider works in menus and in `provider:model` syntax.
Typical edits:
- `_PROVIDER_MODELS`
- `_PROVIDER_LABELS`
- `_PROVIDER_ALIASES`
- provider display order inside `list_available_providers()`
- `provider_model_ids()` if the provider supports a live `/models` fetch
If the provider exposes a live model list, prefer that first and keep `_PROVIDER_MODELS` as the static fallback.
This file is also what makes inputs like these work:
```text
anthropic:claude-sonnet-4-6
kimi:model-name
```
If aliases are missing here, the provider may authenticate correctly but still fail in `/model` parsing.
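Conceptually, that parsing needs the alias map and the catalog together. A minimal sketch with hypothetical data, not the in-tree parser:

```python
_PROVIDER_ALIASES = {"kimi": "kimi-coding"}        # hypothetical subset
_PROVIDER_MODELS = {"kimi-coding": ["model-name"]}

def parse_model_ref(ref):
    """Split 'provider:model', resolving aliases to the canonical provider id."""
    provider, _, model = ref.partition(":")
    canonical = _PROVIDER_ALIASES.get(provider, provider)
    if canonical not in _PROVIDER_MODELS:
        raise ValueError(f"unknown provider: {provider}")
    return canonical, model

print(parse_model_ref("kimi:model-name"))  # -> ('kimi-coding', 'model-name')
```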
## Step 4: Resolve runtime data in `hermes_cli/runtime_provider.py`
`resolve_runtime_provider()` is the shared path used by CLI, gateway, cron, ACP, and helper clients.
Add a branch that returns a dict with at least:
```python
{
"provider": "your-provider",
"api_mode": "chat_completions", # or your native mode
"base_url": "https://...",
"api_key": "...",
"source": "env|portal|auth-store|explicit",
"requested_provider": requested_provider,
}
```
If the provider is OpenAI-compatible, `api_mode` should usually stay `chat_completions`.
Be careful with API-key precedence. Hermes already contains logic to avoid leaking an OpenRouter key to unrelated endpoints. A new provider should be equally explicit about which key goes to which base URL.
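A resolver-branch sketch that stays explicit about which key is allowed to reach which endpoint (the env-var names are hypothetical):

```python
import os

def resolve_acme_runtime(requested_provider="acme"):
    """Hypothetical branch: only ACME_API_KEY may be sent to the Acme endpoint."""
    api_key = os.environ.get("ACME_API_KEY")
    if not api_key:
        raise RuntimeError("ACME_API_KEY is not set; export it or run setup")
    return {
        "provider": "acme",
        "api_mode": "chat_completions",
        "base_url": os.environ.get("ACME_BASE_URL", "https://api.acme.example/v1"),
        "api_key": api_key,  # never substitute an unrelated key (e.g. OpenRouter) here
        "source": "env",
        "requested_provider": requested_provider,
    }
```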
## Step 5: Wire the CLI in `hermes_cli/main.py` and `hermes_cli/setup.py`
A provider is not discoverable until it shows up in the interactive flows.
Update:
### `hermes_cli/main.py`
- `provider_labels`
- provider dispatch inside the `model` command
- `--provider` argument choices
- login/logout choices if the provider supports those flows
- a `_model_flow_<provider>()` function, or reuse `_model_flow_api_key_provider()` if it fits
### `hermes_cli/setup.py`
- `provider_choices`
- auth branch for the provider
- model-selection branch
- any provider-specific explanatory text
- any place where a provider should be excluded from OpenRouter-only prompts or routing settings
If you only update one of these files, `hermes model` and `hermes setup` will drift.
## Step 6: Keep auxiliary calls working
Two files matter here:
### `agent/auxiliary_client.py`
Add a cheap / fast default aux model to `_API_KEY_PROVIDER_AUX_MODELS` if this is a direct API-key provider.
Auxiliary tasks include things like:
- vision summarization
- web extraction summarization
- context compression summaries
- session-search summaries
- memory flushes
If the provider has no sensible aux default, side tasks may fall back badly or use an expensive main model unexpectedly.
### `agent/model_metadata.py`
Add context lengths for the provider's models so token budgeting, compression thresholds, and limits stay sane.
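As a rough picture of why this matters, compression decisions reduce to comparing usage against a fraction of the model's window. The numbers here are illustrative, not Hermes defaults:

```python
MODEL_CONTEXT_LENGTHS = {"acme-large": 128_000}  # hypothetical metadata entry

def should_compress(model, used_tokens, threshold=0.8):
    # Without a metadata entry, budgeting must fall back to a conservative window.
    window = MODEL_CONTEXT_LENGTHS.get(model, 8_192)
    return used_tokens > window * threshold
```

A missing entry means the fallback window applies, which can trigger compression far too early for a long-context model.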
## Step 7: If the provider is native, add an adapter and `run_agent.py` support
If the provider is not plain chat completions, isolate the provider-specific logic in `agent/<provider>_adapter.py`.
Keep `run_agent.py` focused on orchestration. It should call adapter helpers, not hand-build provider payloads inline all over the file.
A native provider usually needs work in these places:
### New adapter file
Typical responsibilities:
- build the SDK / HTTP client
- resolve tokens
- convert OpenAI-style conversation messages to the provider's request format
- convert tool schemas if needed
- normalize provider responses back into what `run_agent.py` expects
- extract usage and finish-reason data
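The heart of most adapters is message conversion. A simplified sketch of hoisting system prompts out of an OpenAI-style message list (real adapters also handle tool calls, images, and structured content blocks):

```python
def to_native_messages(messages):
    """Split OpenAI-style messages into (system_prompt, native_messages)."""
    system_parts, native = [], []
    for m in messages:
        if m["role"] == "system":
            # Providers like Anthropic take the system prompt as a separate field
            system_parts.append(m["content"])
        else:
            native.append({"role": m["role"], "content": m["content"]})
    return "\n".join(system_parts), native
```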
### `run_agent.py`
Search for `api_mode` and audit every switch point. At minimum, verify:
- `__init__` chooses the new `api_mode`
- client construction works for the provider
- `_build_api_kwargs()` knows how to format requests
- `_api_call_with_interrupt()` dispatches to the right client call
- interrupt / client rebuild paths work
- response validation accepts the provider's shape
- finish-reason extraction is correct
- token-usage extraction is correct
- fallback-model activation can switch into the new provider cleanly
- summary-generation and memory-flush paths still work
Also search `run_agent.py` for `self.client.`. Any code path that assumes the standard OpenAI client exists can break when a native provider uses a different client object or `self.client = None`.
### Prompt caching and provider-specific request fields
Prompt caching and provider-specific knobs are easy to regress.
Examples already in-tree:
- Anthropic has a native prompt-caching path
- OpenRouter gets provider-routing fields
- not every provider should receive every request-side option
When you add a native provider, double-check that Hermes is only sending fields that provider actually understands.
## Step 8: Tests
At minimum, touch the tests that guard provider wiring.
Common places:
- `tests/test_runtime_provider_resolution.py`
- `tests/test_cli_provider_resolution.py`
- `tests/test_cli_model_command.py`
- `tests/test_setup_model_selection.py`
- `tests/test_provider_parity.py`
- `tests/test_run_agent.py`
- `tests/test_<provider>_adapter.py` for a native provider
These file names are illustrative; the exact set may differ as the suite evolves. The point is to cover:
- auth resolution
- CLI menu / provider selection
- runtime provider resolution
- agent execution path
- provider:model parsing
- any adapter-specific message conversion
Run tests with xdist disabled:
```bash
source .venv/bin/activate
python -m pytest tests/test_runtime_provider_resolution.py tests/test_cli_provider_resolution.py tests/test_cli_model_command.py tests/test_setup_model_selection.py -n0 -q
```
For deeper changes, run the full suite before pushing:
```bash
source .venv/bin/activate
python -m pytest tests/ -n0 -q
```
## Step 9: Live verification
After tests, run a real smoke test.
```bash
source .venv/bin/activate
python -m hermes_cli.main chat -q "Say hello" --provider your-provider --model your-model
```
Also test the interactive flows if you changed menus:
```bash
source .venv/bin/activate
python -m hermes_cli.main model
python -m hermes_cli.main setup
```
For native providers, verify at least one tool call too, not just a plain text response.
## Step 10: Update user-facing docs
If the provider is meant to ship as a first-class option, update the user docs too:
- `website/docs/getting-started/quickstart.md`
- `website/docs/user-guide/configuration.md`
- `website/docs/reference/environment-variables.md`
A developer can wire the provider perfectly and still leave users unable to discover the required env vars or setup flow.
## OpenAI-compatible provider checklist
Use this if the provider is standard chat completions.
- [ ] `ProviderConfig` added in `hermes_cli/auth.py`
- [ ] aliases added in `hermes_cli/auth.py` and `hermes_cli/models.py`
- [ ] model catalog added in `hermes_cli/models.py`
- [ ] runtime branch added in `hermes_cli/runtime_provider.py`
- [ ] CLI wiring added in `hermes_cli/main.py`
- [ ] setup wiring added in `hermes_cli/setup.py`
- [ ] aux model added in `agent/auxiliary_client.py`
- [ ] context lengths added in `agent/model_metadata.py`
- [ ] runtime / CLI tests updated
- [ ] user docs updated
## Native provider checklist
Use this when the provider needs a new protocol path.
- [ ] everything in the OpenAI-compatible checklist
- [ ] adapter added in `agent/<provider>_adapter.py`
- [ ] new `api_mode` supported in `run_agent.py`
- [ ] interrupt / rebuild path works
- [ ] usage and finish-reason extraction works
- [ ] fallback path works
- [ ] adapter tests added
- [ ] live smoke test passes
## Common pitfalls
### 1. Adding the provider to auth but not to model parsing
That makes credentials resolve correctly while `/model` and `provider:model` inputs fail.
### 2. Forgetting that `config["model"]` can be a string or a dict
A lot of provider-selection code has to normalize both forms.
### 3. Assuming a built-in provider is required
If the service is just OpenAI-compatible, a custom provider may already solve the user problem with less maintenance.
### 4. Forgetting auxiliary paths
The main chat path can work while summarization, memory flushes, or vision helpers fail because aux routing was never updated.
### 5. Native-provider branches hiding in `run_agent.py`
Search for `api_mode` and `self.client.`. Do not assume the obvious request path is the only one.
### 6. Sending OpenRouter-only knobs to other providers
Fields like provider routing belong only on the providers that support them.
### 7. Updating `hermes model` but not `hermes setup`
Both flows need to know about the provider.
## Good search targets while implementing
If you are hunting for all the places a provider touches, search these symbols:
- `PROVIDER_REGISTRY`
- `_PROVIDER_ALIASES`
- `_PROVIDER_MODELS`
- `resolve_runtime_provider`
- `_model_flow_`
- `provider_choices`
- `api_mode`
- `_API_KEY_PROVIDER_AUX_MODELS`
- `self.client.`
## Related docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Architecture](./architecture.md)
- [Contributing](./contributing.md)


@ -41,12 +41,13 @@ If you are new to the codebase, read in this order:
2. [Agent Loop Internals](./agent-loop.md)
3. [Prompt Assembly](./prompt-assembly.md)
4. [Provider Runtime Resolution](./provider-runtime.md)
5. [Adding Providers](./adding-providers.md)
6. [Tools Runtime](./tools-runtime.md)
7. [Session Storage](./session-storage.md)
8. [Gateway Internals](./gateway-internals.md)
9. [Context Compression & Prompt Caching](./context-compression-and-caching.md)
10. [ACP Internals](./acp-internals.md)
11. [Environments, Benchmarks & Data Generation](./environments.md)
## Major subsystems


@ -20,6 +20,12 @@ We value contributions in this order:
6. **New tools** — rarely needed; most capabilities should be skills
7. **Documentation** — fixes, clarifications, new examples
## Common contribution paths
- Building a new tool? Start with [Adding Tools](./adding-tools.md)
- Building a new skill? Start with [Creating Skills](./creating-skills.md)
- Building a new inference provider? Start with [Adding Providers](./adding-providers.md)
## Development Setup
### Prerequisites


@ -20,6 +20,8 @@ Primary implementation:
- `hermes_cli/auth.py`
- `agent/auxiliary_client.py`
If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) alongside this page.
## Resolution precedence
At a high level, provider resolution uses:


@ -119,6 +119,7 @@ uv pip install -e "."
| `cli` | Terminal menu UI for setup wizard | `uv pip install -e ".[cli]"` |
| `modal` | Modal cloud execution backend | `uv pip install -e ".[modal]"` |
| `tts-premium` | ElevenLabs premium voices | `uv pip install -e ".[tts-premium]"` |
| `voice` | CLI microphone input + audio playback | `uv pip install -e ".[voice]"` |
| `pty` | PTY terminal support | `uv pip install -e ".[pty]"` |
| `honcho` | AI-native memory (Honcho integration) | `uv pip install -e ".[honcho]"` |
| `mcp` | Model Context Protocol support | `uv pip install -e ".[mcp]"` |


@ -54,7 +54,9 @@ Deploy Hermes Agent as a bot on your favorite messaging platform.
3. [Messaging Overview](/docs/user-guide/messaging)
4. [Telegram Setup](/docs/user-guide/messaging/telegram)
5. [Discord Setup](/docs/user-guide/messaging/discord)
6. [Voice Mode](/docs/user-guide/features/voice-mode)
7. [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
8. [Security](/docs/user-guide/security)
For full project examples, see:
- [Daily Briefing Bot](/docs/guides/daily-briefing-bot)


@ -129,6 +129,25 @@ Chat with Hermes from your phone or other surfaces via Telegram, Discord, Slack,
hermes gateway setup # Interactive platform configuration
```
### Add voice mode
Want microphone input in the CLI or spoken replies in messaging?
```bash
pip install hermes-agent[voice]
# Optional but recommended for free local speech-to-text
pip install faster-whisper
```
Then start Hermes and enable it inside the CLI:
```text
/voice on
```
Press `Ctrl+B` to record, or use `/voice tts` to have Hermes speak its replies. See [Voice Mode](../user-guide/features/voice-mode.md) for the full setup across CLI, Telegram, Discord, and Discord voice channels.
### Schedule automated tasks
```


@ -0,0 +1,422 @@
---
sidebar_position: 7
title: "Use Voice Mode with Hermes"
description: "A practical guide to setting up and using Hermes voice mode across CLI, Telegram, Discord, and Discord voice channels"
---
# Use Voice Mode with Hermes
This guide is the practical companion to the [Voice Mode feature reference](/docs/user-guide/features/voice-mode).
If the feature page explains what voice mode can do, this guide shows how to actually use it well.
## What voice mode is good for
Voice mode is especially useful when:
- you want a hands-free CLI workflow
- you want spoken responses in Telegram or Discord
- you want Hermes sitting in a Discord voice channel for live conversation
- you want quick idea capture, debugging, or back-and-forth while walking around instead of typing
## Choose your voice mode setup
There are three distinct voice experiences in Hermes.
| Mode | Best for | Platform |
|---|---|---|
| Interactive microphone loop | Personal hands-free use while coding or researching | CLI |
| Voice replies in chat | Spoken responses alongside normal messaging | Telegram, Discord |
| Live voice channel bot | Group or personal live conversation in a VC | Discord voice channels |
A good path is:
1. get text working first
2. enable voice replies second
3. move to Discord voice channels last if you want the full experience
## Step 1: make sure normal Hermes works first
Before touching voice mode, verify that:
- Hermes starts
- your provider is configured
- the agent can answer text prompts normally
```bash
hermes
```
Ask something simple:
```text
What tools do you have available?
```
If that is not solid yet, fix text mode first.
## Step 2: install the right extras
### CLI microphone + playback
```bash
pip install hermes-agent[voice]
```
### Messaging platforms
```bash
pip install hermes-agent[messaging]
```
### Premium ElevenLabs TTS
```bash
pip install hermes-agent[tts-premium]
```
### Everything
```bash
pip install hermes-agent[all]
```
## Step 3: install system dependencies
### macOS
```bash
brew install portaudio ffmpeg opus
```
### Ubuntu / Debian
```bash
sudo apt install portaudio19-dev ffmpeg libopus0
```
Why these matter:
- `portaudio` → microphone input / playback for CLI voice mode
- `ffmpeg` → audio conversion for TTS and messaging delivery
- `opus` → Discord voice codec support
## Step 4: choose STT and TTS providers
Hermes supports both local and cloud speech stacks.
### Easiest / cheapest setup
Use local STT and free Edge TTS:
- STT provider: `local`
- TTS provider: `edge`
This is usually the best place to start.
### Environment file example
Add to `~/.hermes/.env`:
```bash
# Cloud STT options (local needs no key)
GROQ_API_KEY=***
VOICE_TOOLS_OPENAI_KEY=***
# Premium TTS (optional)
ELEVENLABS_API_KEY=***
```
### Provider recommendations
#### Speech-to-text
- `local` → best default for privacy and zero-cost use
- `groq` → very fast cloud transcription
- `openai` → good paid fallback
#### Text-to-speech
- `edge` → free and good enough for most users
- `elevenlabs` → best quality
- `openai` → good middle ground
## Step 5: recommended config
```yaml
voice:
record_key: "ctrl+b"
max_recording_seconds: 120
auto_tts: false
silence_threshold: 200
silence_duration: 3.0
stt:
provider: "local"
local:
model: "base"
tts:
provider: "edge"
edge:
voice: "en-US-AriaNeural"
```
This is a good conservative default for most people.
## Use case 1: CLI voice mode
### Turn it on
Start Hermes:
```bash
hermes
```
Inside the CLI:
```text
/voice on
```
### Recording flow
Default key:
- `Ctrl+B`
Workflow:
1. press `Ctrl+B`
2. speak
3. wait for silence detection to stop recording automatically
4. Hermes transcribes and responds
5. if TTS is on, it speaks the answer
6. the loop can automatically restart for continuous use
### Useful commands
```text
/voice
/voice on
/voice off
/voice tts
/voice status
```
### Good CLI workflows
#### Walk-up debugging
Say:
```text
I keep getting a docker permission error. Help me debug it.
```
Then continue hands-free:
- "Read the last error again"
- "Explain the root cause in simpler terms"
- "Now give me the exact fix"
#### Research / brainstorming
Great for:
- walking around while thinking
- dictating half-formed ideas
- asking Hermes to structure your thoughts in real time
#### Accessibility / low-typing sessions
If typing is inconvenient, voice mode is one of the fastest ways to stay in the full Hermes loop.
## Tuning CLI behavior
### Silence threshold
If Hermes starts/stops too aggressively, tune:
```yaml
voice:
silence_threshold: 250
```
Higher threshold = less sensitive.
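Silence detection of this kind typically compares each audio frame's RMS amplitude to the threshold. A rough sketch of the assumed math, not the exact in-tree code:

```python
import math

def frame_rms(samples):
    """RMS amplitude of one frame of 16-bit PCM samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silent(samples, silence_threshold=200):
    return frame_rms(samples) < silence_threshold

print(is_silent([50, -40, 30, -20]))   # quiet room -> True
print(is_silent([4000, -3500, 3800]))  # speech -> False
```

If background noise alone already crosses the threshold, recording never auto-stops; raising the value filters it out.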
### Silence duration
If you pause a lot between sentences, increase:
```yaml
voice:
silence_duration: 4.0
```
### Record key
If `Ctrl+B` conflicts with your terminal or tmux habits:
```yaml
voice:
record_key: "ctrl+space"
```
## Use case 2: voice replies in Telegram or Discord
This mode is simpler than full voice channels.
Hermes stays a normal chat bot, but can speak replies.
### Start the gateway
```bash
hermes gateway
```
### Turn on voice replies
Inside Telegram or Discord:
```text
/voice on
```
or
```text
/voice tts
```
### Modes
| Mode | Meaning |
|---|---|
| `off` | text only |
| `voice_only` | speak only when the user sent voice |
| `all` | speak every reply |
### When to use which mode
- `/voice on` if you want spoken replies only for voice-originating messages
- `/voice tts` if you want a full spoken assistant all the time
### Good messaging workflows
#### Telegram assistant on your phone
Use when:
- you are away from your machine
- you want to send voice notes and get quick spoken replies
- you want Hermes to function like a portable research or ops assistant
#### Discord DMs with spoken output
Useful when you want private interaction without server-channel mention behavior.
## Use case 3: Discord voice channels
This is the most advanced mode.
Hermes joins a Discord VC, listens to user speech, transcribes it, runs the normal agent pipeline, and speaks replies back into the channel.
### Required Discord permissions
In addition to the normal text-bot setup, make sure the bot has:
- Connect
- Speak
- preferably Use Voice Activity
Also enable privileged intents in the Developer Portal:
- Presence Intent
- Server Members Intent
- Message Content Intent
### Join and leave
In a Discord text channel where the bot is present:
```text
/voice join
/voice leave
/voice status
```
### What happens when joined
- users speak in the VC
- Hermes detects speech boundaries
- transcripts are posted in the associated text channel
- Hermes responds in text and audio
- the text channel is the one where `/voice join` was issued
### Best practices for Discord VC use
- keep `DISCORD_ALLOWED_USERS` tight
- use a dedicated bot/testing channel at first
- verify STT and TTS work in ordinary text-chat voice mode before trying VC mode
## Voice quality recommendations
### Best quality setup
- STT: local `large-v3` or Groq `whisper-large-v3`
- TTS: ElevenLabs
### Best speed / convenience setup
- STT: local `base` or Groq
- TTS: Edge
### Best zero-cost setup
- STT: local
- TTS: Edge
## Common failure modes
### "No audio device found"
Install `portaudio`.
### "Bot joins but hears nothing"
Check:
- your Discord user ID is in `DISCORD_ALLOWED_USERS`
- you are not muted
- privileged intents are enabled
- the bot has Connect/Speak permissions
### "It transcribes but does not speak"
Check:
- TTS provider config
- API key / quota for ElevenLabs or OpenAI
- `ffmpeg` install for Edge conversion paths
### "Whisper outputs garbage"
Try:
- quieter environment
- higher `silence_threshold`
- different STT provider/model
- shorter, clearer utterances
### "It works in DMs but not in server channels"
That is usually the mention policy: by default, the bot only responds in Discord server text channels when it is `@mention`ed, unless configured otherwise.
## Suggested first-week setup
If you want the shortest path to success:
1. get text Hermes working
2. install `hermes-agent[voice]`
3. use CLI voice mode with local STT + Edge TTS
4. then enable `/voice on` in Telegram or Discord
5. only after that, try Discord VC mode
That progression keeps the debugging surface small.
## Where to read next
- [Voice Mode feature reference](/docs/user-guide/features/voice-mode)
- [Messaging Gateway](/docs/user-guide/messaging)
- [Discord setup](/docs/user-guide/messaging/discord)
- [Telegram setup](/docs/user-guide/messaging/telegram)
- [Configuration](/docs/user-guide/configuration)


@ -33,6 +33,8 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl
| 📚 **[Skills System](/docs/user-guide/features/skills)** | Procedural memory the agent creates and reuses |
| 🔌 **[MCP Integration](/docs/user-guide/features/mcp)** | Connect to MCP servers, filter their tools, and extend Hermes safely |
| 🧭 **[Use MCP with Hermes](/docs/guides/use-mcp-with-hermes)** | Practical MCP setup patterns, examples, and tutorials |
| 🎙️ **[Voice Mode](/docs/user-guide/features/voice-mode)** | Real-time voice interaction in CLI, Telegram, Discord, and Discord VC |
| 🗣️ **[Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)** | Hands-on setup and usage patterns for Hermes voice workflows |
| 🎭 **[Personality & SOUL.md](/docs/user-guide/features/personality)** | Define Hermes' default voice with a global SOUL.md |
| 📄 **[Context Files](/docs/user-guide/features/context-files)** | Project context files that shape every conversation |
| 🔒 **[Security](/docs/user-guide/security)** | Command approval, authorization, container isolation |


@ -31,7 +31,7 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
| `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code setup-token (same as `ANTHROPIC_TOKEN`) |
| `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
| `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
| `VOICE_TOOLS_OPENAI_KEY` | OpenAI key for OpenAI speech-to-text and text-to-speech providers |
| `HERMES_HOME` | Override Hermes config directory (default: `~/.hermes`) |
## Provider Auth (OAuth)
@ -57,7 +57,12 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
| `BROWSERBASE_PROJECT_ID` | Browserbase project ID |
| `BROWSER_INACTIVITY_TIMEOUT` | Browser session inactivity timeout in seconds |
| `FAL_KEY` | Image generation ([fal.ai](https://fal.ai/)) |
| `GROQ_API_KEY` | Groq Whisper STT API key ([groq.com](https://groq.com/)) |
| `ELEVENLABS_API_KEY` | ElevenLabs premium TTS voices ([elevenlabs.io](https://elevenlabs.io/)) |
| `STT_GROQ_MODEL` | Override the Groq STT model (default: `whisper-large-v3-turbo`) |
| `GROQ_BASE_URL` | Override the Groq OpenAI-compatible STT endpoint |
| `STT_OPENAI_MODEL` | Override the OpenAI STT model (default: `whisper-1`) |
| `STT_OPENAI_BASE_URL` | Override the OpenAI-compatible STT endpoint |
| `HONCHO_API_KEY` | Cross-session user modeling ([honcho.dev](https://honcho.dev/)) |
| `TINKER_API_KEY` | RL training ([tinker-console.thinkingmachines.ai](https://tinker-console.thinkingmachines.ai/)) |
| `WANDB_API_KEY` | RL training metrics ([wandb.ai](https://wandb.ai/)) |


@ -45,6 +45,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
| `/verbose` | Cycle tool progress display: off → new → all → verbose |
| `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) |
| `/skin` | Show or change the display skin/theme |
| `/voice [on\|off\|tts\|status]` | Toggle CLI voice mode and spoken playback. Recording uses `voice.record_key` (default: `Ctrl+B`). |
### Tools & Skills
@ -105,6 +106,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
| `/usage` | Show token usage for the current session. |
| `/insights [days]` | Show usage analytics. |
| `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display. |
| `/voice [on\|off\|tts\|join\|channel\|leave\|status]` | Control spoken replies in chat. `join`/`channel`/`leave` manage Discord voice-channel mode. |
| `/rollback [number]` | List or restore filesystem checkpoints. |
| `/background <prompt>` | Run a prompt in a separate background session. |
| `/reload-mcp` | Reload MCP servers from config. |
@ -116,4 +118,5 @@ The messaging gateway supports the following built-in commands inside Telegram,
- `/skin`, `/tools`, `/toolsets`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, and `/verbose` are **CLI-only** commands.
- `/status`, `/stop`, `/sethome`, `/resume`, `/background`, and `/update` are **messaging-only** commands.
- `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway.
- `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.


@ -77,6 +77,7 @@ When resuming a previous session (`hermes -c` or `hermes --resume <id>`), a "Pre
| `Alt+Enter` or `Ctrl+J` | New line (multi-line input) |
| `Alt+V` | Paste an image from the clipboard when supported by the terminal |
| `Ctrl+V` | Paste text and opportunistically attach clipboard images |
| `Ctrl+B` | Start/stop voice recording when voice mode is enabled (`voice.record_key`, default: `ctrl+b`) |
| `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) |
| `Ctrl+D` | Exit |
| `Tab` | Autocomplete slash commands |
@ -95,11 +96,15 @@ Common examples:
| `/skills browse` | Browse the skills hub and official optional skills |
| `/background <prompt>` | Run a prompt in a separate background session |
| `/skin` | Show or switch the active CLI skin |
| `/voice on` | Enable CLI voice mode (press `Ctrl+B` to record) |
| `/voice tts` | Toggle spoken playback for Hermes replies |
| `/reasoning high` | Increase reasoning effort |
| `/title My Session` | Name the current session |
For the full built-in CLI and messaging lists, see [Slash Commands Reference](../reference/slash-commands.md).
For setup, providers, silence tuning, and messaging/Discord voice usage, see [Voice Mode](features/voice-mode.md).
:::tip
Commands are case-insensitive — `/HELP` works the same as `/help`. Installed skills also become slash commands automatically.
:::


@ -695,6 +695,8 @@ tts:
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
```
This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
## Display Settings
```yaml
@ -719,10 +721,43 @@ display:
```yaml
stt:
provider: "local" # "local" | "groq" | "openai"
local:
model: "base" # tiny, base, small, medium, large-v3
openai:
model: "whisper-1" # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
# model: "whisper-1" # Legacy fallback key still respected
```
Requires `VOICE_TOOLS_OPENAI_KEY` in `.env` for OpenAI STT.
Provider behavior:
- `local` uses `faster-whisper` running on your machine. Install it separately with `pip install faster-whisper`.
- `groq` uses Groq's Whisper-compatible endpoint and reads `GROQ_API_KEY`.
- `openai` uses the OpenAI speech API and reads `VOICE_TOOLS_OPENAI_KEY`.
If the requested provider is unavailable, Hermes falls back automatically in this order: `local``groq``openai`.
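To pin a preferred provider and treat the fallback chain purely as a safety net, an explicit config entry is enough (values here are illustrative):

```yaml
stt:
  provider: "groq"     # preferred provider; Hermes falls back only if it is unavailable
  local:
    model: "small"     # used only when the local fallback path is taken
```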
Groq and OpenAI model overrides are environment-driven:
```bash
STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1
```
## Voice Mode (CLI)
```yaml
voice:
record_key: "ctrl+b" # Push-to-talk key inside the CLI
max_recording_seconds: 120 # Hard stop for long recordings
auto_tts: false # Enable spoken replies automatically when /voice on
silence_threshold: 200 # RMS threshold for speech detection
silence_duration: 3.0 # Seconds of silence before auto-stop
```
Use `/voice on` in the CLI to enable microphone mode, press the configured `record_key` to start and stop recording, and use `/voice tts` to toggle spoken replies. See [Voice Mode](/docs/user-guide/features/voice-mode) for end-to-end setup and platform-specific behavior.
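For a quiet microphone or a noisy room, the silence settings are usually the only knobs that need tuning (illustrative starting points, not recommended defaults):

```yaml
voice:
  silence_threshold: 120   # lower = more sensitive to quiet speech
  silence_duration: 1.5    # stop recording sooner after you finish talking
```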
## Quick Commands


@ -194,6 +194,8 @@ The agent's final response is automatically delivered. You do not need to call `
## Schedule formats
The agent's final response is automatically delivered — you do **not** need to include `send_message` in the cron prompt for that same destination. If a cron run calls `send_message` with the exact target that the scheduler will already deliver to, Hermes skips the duplicate send and tells the model to put the user-facing content in the final response instead. Use `send_message` only for additional or different targets.
### Relative delays (one-shot)
```text


@ -8,12 +8,14 @@ description: "Real-time voice conversations with Hermes Agent — CLI, Telegram,
Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
## Prerequisites
Before using voice features, make sure you have:
1. **Hermes Agent installed**`pip install hermes-agent` (see [Getting Started](../../getting-started.md))
2. **An LLM provider configured**set `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `LLM_MODEL` in `~/.hermes/.env`
1. **Hermes Agent installed**`pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
2. **An LLM provider configured**run `hermes model` or set your preferred provider credentials in `~/.hermes/.env`
3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
:::tip


@ -210,8 +210,13 @@ Replace the ID with the actual channel ID (right-click → Copy Channel ID with
Hermes Agent supports Discord voice messages:
- **Incoming voice messages** are automatically transcribed using Whisper (requires `GROQ_API_KEY` or `VOICE_TOOLS_OPENAI_KEY` to be set in your environment).
- **Incoming voice messages** are automatically transcribed using the configured STT provider: local `faster-whisper` (no key), Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`).
- **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
- **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
For the full setup and operational guide, see:
- [Voice Mode](/docs/user-guide/features/voice-mode)
- [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
## Troubleshooting


@ -8,6 +8,8 @@ description: "Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal,
Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.
For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
## Architecture
```text
@ -77,6 +79,7 @@ hermes gateway status # Check service status
| `/usage` | Show token usage for this session |
| `/insights [days]` | Show usage insights and analytics |
| `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display |
| `/voice [on\|off\|tts\|join\|leave\|status]` | Control messaging voice replies and Discord voice-channel behavior |
| `/rollback [number]` | List or restore filesystem checkpoints |
| `/background <prompt>` | Run a prompt in a separate background session |
| `/reload-mcp` | Reload MCP servers from config |

View file

@ -224,7 +224,7 @@ Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).
Hermes supports voice on Slack:
- **Incoming:** Voice/audio messages are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
- **Incoming:** Voice/audio messages are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
- **Outgoing:** TTS responses are sent as audio file attachments
---


@ -131,7 +131,11 @@ Group chat IDs are negative numbers (e.g., `-1001234567890`). Your personal DM c
### Incoming Voice (Speech-to-Text)
Voice messages you send on Telegram are automatically transcribed using OpenAI's Whisper API and injected as text into the conversation. This requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
Voice messages you send on Telegram are automatically transcribed by Hermes's configured STT provider and injected as text into the conversation.
- `local` uses `faster-whisper` on the machine running Hermes — no API key required
- `groq` uses Groq Whisper and requires `GROQ_API_KEY`
- `openai` uses OpenAI Whisper and requires `VOICE_TOOLS_OPENAI_KEY`
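For the keyed providers, the corresponding entries in `~/.hermes/.env` look like this (placeholder values — substitute your real keys; only the key for your configured provider is required):

```bash
GROQ_API_KEY=gsk_your_key_here
VOICE_TOOLS_OPENAI_KEY=sk-your_key_here
```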
### Outgoing Voice (Text-to-Speech)
@ -173,7 +177,7 @@ Hermes Agent works in Telegram group chats with a few considerations:
| Bot not responding at all | Verify `TELEGRAM_BOT_TOKEN` is correct. Check `hermes gateway` logs for errors. |
| Bot responds with "unauthorized" | Your user ID is not in `TELEGRAM_ALLOWED_USERS`. Double-check with @userinfobot. |
| Bot ignores group messages | Privacy mode is likely on. Disable it (Step 3) or make the bot a group admin. **Remember to remove and re-add the bot after changing privacy.** |
| Voice messages not transcribed | Check that `VOICE_TOOLS_OPENAI_KEY` is set and valid in `~/.hermes/.env`. |
| Voice messages not transcribed | Verify STT is available: install `faster-whisper` for local transcription, or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`. |
| Voice replies are files, not bubbles | Install `ffmpeg` (needed for Edge TTS Opus conversion). |
| Bot token revoked/invalid | Generate a new token via `/revoke` then `/newbot` or `/token` in BotFather. Update your `.env` file. |


@ -137,7 +137,7 @@ with reconnection logic.
Hermes supports voice on WhatsApp:
- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
- **Outgoing:** TTS responses are sent as MP3 audio file attachments
- Agent responses are prefixed with "⚕ **Hermes Agent**" for easy identification