mirror of https://github.com/NousResearch/hermes-agent.git
synced 2026-05-01 01:51:44 +00:00

Merge origin/main into hermes/hermes-5d160594
commit 3229e434b8
78 changed files with 3762 additions and 395 deletions

424 website/docs/developer-guide/adding-providers.md (new file)

@@ -0,0 +1,424 @@
---
sidebar_position: 5
title: "Adding Providers"
description: "How to add a new inference provider to Hermes Agent — auth, runtime resolution, CLI flows, adapters, tests, and docs"
---

# Adding Providers

Hermes can already talk to any OpenAI-compatible endpoint through the custom provider path. Do not add a built-in provider unless you want first-class UX for that service:

- provider-specific auth or token refresh
- a curated model catalog
- setup / `hermes model` menu entries
- provider aliases for `provider:model` syntax
- a non-OpenAI API shape that needs an adapter

If the provider is just "another OpenAI-compatible base URL and API key", a named custom provider may be enough.

## The mental model

A built-in provider has to line up across a few layers:

1. `hermes_cli/auth.py` decides how credentials are found.
2. `hermes_cli/runtime_provider.py` turns that into runtime data:
   - `provider`
   - `api_mode`
   - `base_url`
   - `api_key`
   - `source`
3. `run_agent.py` uses `api_mode` to decide how requests are built and sent.
4. `hermes_cli/models.py`, `hermes_cli/main.py`, and `hermes_cli/setup.py` make the provider show up in the CLI.
5. `agent/auxiliary_client.py` and `agent/model_metadata.py` keep side tasks and token budgeting working.

The important abstraction is `api_mode`.

- Most providers use `chat_completions`.
- Codex uses `codex_responses`.
- Anthropic uses `anthropic_messages`.
- A new non-OpenAI protocol usually means adding a new adapter and a new `api_mode` branch.
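The mode-to-builder relationship can be pictured as a small dispatch table. This is a toy sketch of the idea, not the real `run_agent.py` code — the builder functions and request shapes here are illustrative:

```python
# Toy sketch of api_mode dispatch; the real branches in run_agent.py
# also handle tools, streaming, interrupts, and usage extraction.
def _build_chat_completions(messages: list[dict]) -> dict:
    # OpenAI-style: the system prompt rides along in the message list.
    return {"messages": messages}

def _build_anthropic_messages(messages: list[dict]) -> dict:
    # Anthropic-style: the system prompt is a separate top-level field.
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    chat = [m for m in messages if m["role"] != "system"]
    return {"system": system, "messages": chat}

_REQUEST_BUILDERS = {
    "chat_completions": _build_chat_completions,
    "anthropic_messages": _build_anthropic_messages,
}

def build_request(api_mode: str, messages: list[dict]) -> dict:
    if api_mode not in _REQUEST_BUILDERS:
        raise ValueError(f"no request builder for api_mode {api_mode!r}")
    return _REQUEST_BUILDERS[api_mode](messages)
```

Adding a protocol then means adding a builder (plus its adapter), not editing every call site.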
## Choose the implementation path first

### Path A — OpenAI-compatible provider

Use this when the provider accepts standard chat-completions style requests.

Typical work:

- add auth metadata
- add model catalog / aliases
- add runtime resolution
- add CLI menu wiring
- add aux-model defaults
- add tests and user docs

You usually do not need a new adapter or a new `api_mode`.

### Path B — Native provider

Use this when the provider does not behave like OpenAI chat completions.

Examples in-tree today:

- `codex_responses`
- `anthropic_messages`

This path includes everything from Path A plus:

- a provider adapter in `agent/`
- `run_agent.py` branches for request building, dispatch, usage extraction, interrupt handling, and response normalization
- adapter tests

## File checklist

### Required for every built-in provider

1. `hermes_cli/auth.py`
2. `hermes_cli/models.py`
3. `hermes_cli/runtime_provider.py`
4. `hermes_cli/main.py`
5. `hermes_cli/setup.py`
6. `agent/auxiliary_client.py`
7. `agent/model_metadata.py`
8. tests
9. user-facing docs under `website/docs/`

### Additional for native / non-OpenAI providers

10. `agent/<provider>_adapter.py`
11. `run_agent.py`
12. `pyproject.toml` if a provider SDK is required

## Step 1: Pick one canonical provider id

Choose a single provider id and use it everywhere.

Examples from the repo:

- `openai-codex`
- `kimi-coding`
- `minimax-cn`

That same id should appear in:

- `PROVIDER_REGISTRY` in `hermes_cli/auth.py`
- `_PROVIDER_LABELS` in `hermes_cli/models.py`
- `_PROVIDER_ALIASES` in both `hermes_cli/auth.py` and `hermes_cli/models.py`
- CLI `--provider` choices in `hermes_cli/main.py`
- setup / model selection branches
- auxiliary-model defaults
- tests

If the id differs between those files, the provider will feel half-wired: auth may work while `/model`, setup, or runtime resolution silently misses it.

## Step 2: Add auth metadata in `hermes_cli/auth.py`

For API-key providers, add a `ProviderConfig` entry to `PROVIDER_REGISTRY` with:

- `id`
- `name`
- `auth_type="api_key"`
- `inference_base_url`
- `api_key_env_vars`
- optional `base_url_env_var`

Also add aliases to `_PROVIDER_ALIASES`.
Use the existing providers as templates:

- simple API-key path: Z.AI, MiniMax
- API-key path with endpoint detection: Kimi, Z.AI
- native token resolution: Anthropic
- OAuth / auth-store path: Nous, OpenAI Codex

Questions to answer here:

- What env vars should Hermes check, and in what priority order?
- Does the provider need base-URL overrides?
- Does it need endpoint probing or token refresh?
- What should the auth error say when credentials are missing?

If the provider needs more than "look up an API key", add a dedicated credential resolver instead of shoving logic into unrelated branches.

## Step 3: Add model catalog and aliases in `hermes_cli/models.py`

Update the provider catalog so the provider works in menus and in `provider:model` syntax.

Typical edits:

- `_PROVIDER_MODELS`
- `_PROVIDER_LABELS`
- `_PROVIDER_ALIASES`
- provider display order inside `list_available_providers()`
- `provider_model_ids()` if the provider supports a live `/models` fetch

If the provider exposes a live model list, prefer it and keep `_PROVIDER_MODELS` as the static fallback.

This file is also what makes inputs like these work:

```text
anthropic:claude-sonnet-4-6
kimi:model-name
```

If aliases are missing here, the provider may authenticate correctly but still fail in `/model` parsing.
## Step 4: Resolve runtime data in `hermes_cli/runtime_provider.py`

`resolve_runtime_provider()` is the shared path used by CLI, gateway, cron, ACP, and helper clients.

Add a branch that returns a dict with at least:

```python
{
    "provider": "your-provider",
    "api_mode": "chat_completions",  # or your native mode
    "base_url": "https://...",
    "api_key": "...",
    "source": "env|portal|auth-store|explicit",
    "requested_provider": requested_provider,
}
```

If the provider is OpenAI-compatible, `api_mode` should usually stay `chat_completions`.

Be careful with API-key precedence. Hermes already contains logic to avoid leaking an OpenRouter key to unrelated endpoints. A new provider should be equally explicit about which key goes to which base URL.
## Step 5: Wire the CLI in `hermes_cli/main.py` and `hermes_cli/setup.py`

A provider is not discoverable until it shows up in the interactive flows.

Update:

### `hermes_cli/main.py`

- `provider_labels`
- provider dispatch inside the `model` command
- `--provider` argument choices
- login/logout choices if the provider supports those flows
- a `_model_flow_<provider>()` function, or reuse `_model_flow_api_key_provider()` if it fits

### `hermes_cli/setup.py`

- `provider_choices`
- auth branch for the provider
- model-selection branch
- any provider-specific explanatory text
- any place where a provider should be excluded from OpenRouter-only prompts or routing settings

If you only update one of these files, `hermes model` and `hermes setup` will drift.

## Step 6: Keep auxiliary calls working

Two files matter here:

### `agent/auxiliary_client.py`

Add a cheap, fast default aux model to `_API_KEY_PROVIDER_AUX_MODELS` if this is a direct API-key provider.

Auxiliary tasks include things like:

- vision summarization
- web extraction summarization
- context compression summaries
- session-search summaries
- memory flushes

If the provider has no sensible aux default, side tasks may fall back badly or use an expensive main model unexpectedly.
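The shape of that fallback can be sketched as follows — illustrative only ("acme" and "acme-mini" are made up; the real table lives in `agent/auxiliary_client.py`):

```python
# Illustrative stand-in for the aux-model default table.
_API_KEY_PROVIDER_AUX_MODELS = {
    "acme": "acme-mini",
}

def aux_model_for(provider: str, main_model: str) -> str:
    # Without a registered cheap default, side tasks fall back to the
    # (possibly expensive) main model.
    return _API_KEY_PROVIDER_AUX_MODELS.get(provider, main_model)
```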
### `agent/model_metadata.py`

Add context lengths for the provider's models so token budgeting, compression thresholds, and limits stay sane.
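A sketch of how a context-length table feeds that budgeting — model names, the default window, and the 0.8 ratio are all illustrative assumptions, not the real values:

```python
# Illustrative context-length table and the budgeting math it feeds.
_CONTEXT_LENGTHS = {
    "acme-large": 200_000,
    "acme-mini": 32_000,
}

def context_length(model: str, default: int = 8_192) -> int:
    return _CONTEXT_LENGTHS.get(model, default)

def compression_threshold(model: str, ratio: float = 0.8) -> int:
    # Start compressing once the conversation fills this share of the window.
    return int(context_length(model) * ratio)
```

An unlisted model silently gets the default window, which is why missing entries show up as premature compression or overrun limits rather than errors.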
## Step 7: If the provider is native, add an adapter and `run_agent.py` support

If the provider is not plain chat completions, isolate the provider-specific logic in `agent/<provider>_adapter.py`.

Keep `run_agent.py` focused on orchestration. It should call adapter helpers, not hand-build provider payloads inline all over the file.

A native provider usually needs work in these places:

### New adapter file

Typical responsibilities:

- build the SDK / HTTP client
- resolve tokens
- convert OpenAI-style conversation messages to the provider's request format
- convert tool schemas if needed
- normalize provider responses back into what `run_agent.py` expects
- extract usage and finish-reason data
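The normalization responsibility can be sketched like this. The input resembles an Anthropic-style payload, but the output field names are illustrative, not Hermes' real internal shape:

```python
# Sketch of response normalization: map a provider-native payload into
# one common shape for the agent loop. Output keys are illustrative.
def normalize_response(native: dict) -> dict:
    text = "".join(
        block.get("text", "")
        for block in native.get("content", [])
        if block.get("type") == "text"
    )
    usage = native.get("usage", {})
    return {
        "text": text,
        "finish_reason": native.get("stop_reason", "stop"),
        "prompt_tokens": usage.get("input_tokens", 0),
        "completion_tokens": usage.get("output_tokens", 0),
    }
```

Keeping this in the adapter is what lets `run_agent.py` treat every provider's reply the same way downstream.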
### `run_agent.py`

Search for `api_mode` and audit every switch point. At minimum, verify:

- `__init__` chooses the new `api_mode`
- client construction works for the provider
- `_build_api_kwargs()` knows how to format requests
- `_api_call_with_interrupt()` dispatches to the right client call
- interrupt / client rebuild paths work
- response validation accepts the provider's shape
- finish-reason extraction is correct
- token-usage extraction is correct
- fallback-model activation can switch into the new provider cleanly
- summary-generation and memory-flush paths still work

Also search `run_agent.py` for `self.client.`. Any code path that assumes the standard OpenAI client exists can break when a native provider uses a different client object or sets `self.client = None`.

### Prompt caching and provider-specific request fields

Prompt caching and provider-specific knobs are easy to regress.

Examples already in-tree:

- Anthropic has a native prompt-caching path
- OpenRouter gets provider-routing fields
- not every provider should receive every request-side option

When you add a native provider, double-check that Hermes only sends fields that the provider actually understands.

## Step 8: Tests

At minimum, touch the tests that guard provider wiring.

Common places:

- `tests/test_runtime_provider_resolution.py`
- `tests/test_cli_provider_resolution.py`
- `tests/test_cli_model_command.py`
- `tests/test_setup_model_selection.py`
- `tests/test_provider_parity.py`
- `tests/test_run_agent.py`
- `tests/test_<provider>_adapter.py` for a native provider

The exact file set may differ; the point is to cover:

- auth resolution
- CLI menu / provider selection
- runtime provider resolution
- agent execution path
- `provider:model` parsing
- any adapter-specific message conversion
Run tests with xdist disabled:

```bash
source .venv/bin/activate
python -m pytest tests/test_runtime_provider_resolution.py tests/test_cli_provider_resolution.py tests/test_cli_model_command.py tests/test_setup_model_selection.py -n0 -q
```

For deeper changes, run the full suite before pushing:

```bash
source .venv/bin/activate
python -m pytest tests/ -n0 -q
```

## Step 9: Live verification

After tests, run a real smoke test.

```bash
source .venv/bin/activate
python -m hermes_cli.main chat -q "Say hello" --provider your-provider --model your-model
```

Also test the interactive flows if you changed menus:

```bash
source .venv/bin/activate
python -m hermes_cli.main model
python -m hermes_cli.main setup
```

For native providers, verify at least one tool call too, not just a plain text response.

## Step 10: Update user-facing docs

If the provider is meant to ship as a first-class option, update the user docs too:

- `website/docs/getting-started/quickstart.md`
- `website/docs/user-guide/configuration.md`
- `website/docs/reference/environment-variables.md`

A developer can wire the provider perfectly and still leave users unable to discover the required env vars or setup flow.

## OpenAI-compatible provider checklist

Use this if the provider is standard chat completions.

- [ ] `ProviderConfig` added in `hermes_cli/auth.py`
- [ ] aliases added in `hermes_cli/auth.py` and `hermes_cli/models.py`
- [ ] model catalog added in `hermes_cli/models.py`
- [ ] runtime branch added in `hermes_cli/runtime_provider.py`
- [ ] CLI wiring added in `hermes_cli/main.py`
- [ ] setup wiring added in `hermes_cli/setup.py`
- [ ] aux model added in `agent/auxiliary_client.py`
- [ ] context lengths added in `agent/model_metadata.py`
- [ ] runtime / CLI tests updated
- [ ] user docs updated

## Native provider checklist

Use this when the provider needs a new protocol path.

- [ ] everything in the OpenAI-compatible checklist
- [ ] adapter added in `agent/<provider>_adapter.py`
- [ ] new `api_mode` supported in `run_agent.py`
- [ ] interrupt / rebuild path works
- [ ] usage and finish-reason extraction works
- [ ] fallback path works
- [ ] adapter tests added
- [ ] live smoke test passes

## Common pitfalls

### 1. Adding the provider to auth but not to model parsing

That makes credentials resolve correctly while `/model` and `provider:model` inputs fail.

### 2. Forgetting that `config["model"]` can be a string or a dict

A lot of provider-selection code has to normalize both forms.

### 3. Assuming a built-in provider is required

If the service is just OpenAI-compatible, a custom provider may already solve the user problem with less maintenance.

### 4. Forgetting auxiliary paths

The main chat path can work while summarization, memory flushes, or vision helpers fail because aux routing was never updated.

### 5. Native-provider branches hiding in `run_agent.py`

Search for `api_mode` and `self.client.`. Do not assume the obvious request path is the only one.

### 6. Sending OpenRouter-only knobs to other providers

Fields like provider routing belong only on the providers that support them.

### 7. Updating `hermes model` but not `hermes setup`

Both flows need to know about the provider.

## Good search targets while implementing

If you are hunting for all the places a provider touches, search these symbols:

- `PROVIDER_REGISTRY`
- `_PROVIDER_ALIASES`
- `_PROVIDER_MODELS`
- `resolve_runtime_provider`
- `_model_flow_`
- `provider_choices`
- `api_mode`
- `_API_KEY_PROVIDER_AUX_MODELS`
- `self.client.`

## Related docs

- [Provider Runtime Resolution](./provider-runtime.md)
- [Architecture](./architecture.md)
- [Contributing](./contributing.md)
@@ -41,12 +41,13 @@ If you are new to the codebase, read in this order:
 2. [Agent Loop Internals](./agent-loop.md)
 3. [Prompt Assembly](./prompt-assembly.md)
 4. [Provider Runtime Resolution](./provider-runtime.md)
-5. [Tools Runtime](./tools-runtime.md)
-6. [Session Storage](./session-storage.md)
-7. [Gateway Internals](./gateway-internals.md)
-8. [Context Compression & Prompt Caching](./context-compression-and-caching.md)
-9. [ACP Internals](./acp-internals.md)
-10. [Environments, Benchmarks & Data Generation](./environments.md)
+5. [Adding Providers](./adding-providers.md)
+6. [Tools Runtime](./tools-runtime.md)
+7. [Session Storage](./session-storage.md)
+8. [Gateway Internals](./gateway-internals.md)
+9. [Context Compression & Prompt Caching](./context-compression-and-caching.md)
+10. [ACP Internals](./acp-internals.md)
+11. [Environments, Benchmarks & Data Generation](./environments.md)

 ## Major subsystems
@@ -20,6 +20,12 @@ We value contributions in this order:
6. **New tools** — rarely needed; most capabilities should be skills
7. **Documentation** — fixes, clarifications, new examples

## Common contribution paths

- Building a new tool? Start with [Adding Tools](./adding-tools.md)
- Building a new skill? Start with [Creating Skills](./creating-skills.md)
- Building a new inference provider? Start with [Adding Providers](./adding-providers.md)

## Development Setup

### Prerequisites
@@ -20,6 +20,8 @@ Primary implementation:
- `hermes_cli/auth.py`
- `agent/auxiliary_client.py`

If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) alongside this page.

## Resolution precedence

At a high level, provider resolution uses:
@@ -119,6 +119,7 @@ uv pip install -e "."
| `cli` | Terminal menu UI for setup wizard | `uv pip install -e ".[cli]"` |
| `modal` | Modal cloud execution backend | `uv pip install -e ".[modal]"` |
| `tts-premium` | ElevenLabs premium voices | `uv pip install -e ".[tts-premium]"` |
| `voice` | CLI microphone input + audio playback | `uv pip install -e ".[voice]"` |
| `pty` | PTY terminal support | `uv pip install -e ".[pty]"` |
| `honcho` | AI-native memory (Honcho integration) | `uv pip install -e ".[honcho]"` |
| `mcp` | Model Context Protocol support | `uv pip install -e ".[mcp]"` |
@@ -54,7 +54,9 @@ Deploy Hermes Agent as a bot on your favorite messaging platform.
 3. [Messaging Overview](/docs/user-guide/messaging)
 4. [Telegram Setup](/docs/user-guide/messaging/telegram)
 5. [Discord Setup](/docs/user-guide/messaging/discord)
-6. [Security](/docs/user-guide/security)
+6. [Voice Mode](/docs/user-guide/features/voice-mode)
+7. [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
+8. [Security](/docs/user-guide/security)

 For full project examples, see:
 - [Daily Briefing Bot](/docs/guides/daily-briefing-bot)
|||
|
|
@ -129,6 +129,25 @@ Chat with Hermes from your phone or other surfaces via Telegram, Discord, Slack,
|
|||
hermes gateway setup # Interactive platform configuration
|
||||
```
|
||||
|
||||
### Add voice mode
|
||||
|
||||
Want microphone input in the CLI or spoken replies in messaging?
|
||||
|
||||
```bash
|
||||
pip install hermes-agent[voice]
|
||||
|
||||
# Optional but recommended for free local speech-to-text
|
||||
pip install faster-whisper
|
||||
```
|
||||
|
||||
Then start Hermes and enable it inside the CLI:
|
||||
|
||||
```text
|
||||
/voice on
|
||||
```
|
||||
|
||||
Press `Ctrl+B` to record, or use `/voice tts` to have Hermes speak its replies. See [Voice Mode](../user-guide/features/voice-mode.md) for the full setup across CLI, Telegram, Discord, and Discord voice channels.
|
||||
|
||||
### Schedule automated tasks
|
||||
|
||||
```
|
||||
|
|
|
|||
422 website/docs/guides/use-voice-mode-with-hermes.md (new file)

@@ -0,0 +1,422 @@
---
sidebar_position: 7
title: "Use Voice Mode with Hermes"
description: "A practical guide to setting up and using Hermes voice mode across CLI, Telegram, Discord, and Discord voice channels"
---

# Use Voice Mode with Hermes

This guide is the practical companion to the [Voice Mode feature reference](/docs/user-guide/features/voice-mode).

If the feature page explains what voice mode can do, this guide shows how to actually use it well.

## What voice mode is good for

Voice mode is especially useful when:

- you want a hands-free CLI workflow
- you want spoken responses in Telegram or Discord
- you want Hermes sitting in a Discord voice channel for live conversation
- you want quick idea capture, debugging, or back-and-forth while walking around instead of typing

## Choose your voice mode setup

There are really three different voice experiences in Hermes.

| Mode | Best for | Platform |
|---|---|---|
| Interactive microphone loop | Personal hands-free use while coding or researching | CLI |
| Voice replies in chat | Spoken responses alongside normal messaging | Telegram, Discord |
| Live voice channel bot | Group or personal live conversation in a VC | Discord voice channels |

A good path is:

1. get text working first
2. enable voice replies second
3. move to Discord voice channels last if you want the full experience

## Step 1: make sure normal Hermes works first

Before touching voice mode, verify that:

- Hermes starts
- your provider is configured
- the agent can answer text prompts normally

```bash
hermes
```

Ask something simple:

```text
What tools do you have available?
```

If that is not solid yet, fix text mode first.
## Step 2: install the right extras

### CLI microphone + playback

```bash
pip install hermes-agent[voice]
```

### Messaging platforms

```bash
pip install hermes-agent[messaging]
```

### Premium ElevenLabs TTS

```bash
pip install hermes-agent[tts-premium]
```

### Everything

```bash
pip install hermes-agent[all]
```

## Step 3: install system dependencies

### macOS

```bash
brew install portaudio ffmpeg opus
```

### Ubuntu / Debian

```bash
sudo apt install portaudio19-dev ffmpeg libopus0
```

Why these matter:

- `portaudio` → microphone input / playback for CLI voice mode
- `ffmpeg` → audio conversion for TTS and messaging delivery
- `opus` → Discord voice codec support
## Step 4: choose STT and TTS providers

Hermes supports both local and cloud speech stacks.

### Easiest / cheapest setup

Use local STT and free Edge TTS:

- STT provider: `local`
- TTS provider: `edge`

This is usually the best place to start.

### Environment file example

Add to `~/.hermes/.env`:

```bash
# Cloud STT options (local needs no key)
GROQ_API_KEY=***
VOICE_TOOLS_OPENAI_KEY=***

# Premium TTS (optional)
ELEVENLABS_API_KEY=***
```

### Provider recommendations

#### Speech-to-text

- `local` → best default for privacy and zero-cost use
- `groq` → very fast cloud transcription
- `openai` → good paid fallback

#### Text-to-speech

- `edge` → free and good enough for most users
- `elevenlabs` → best quality
- `openai` → good middle ground
## Step 5: recommended config

```yaml
voice:
  record_key: "ctrl+b"
  max_recording_seconds: 120
  auto_tts: false
  silence_threshold: 200
  silence_duration: 3.0

stt:
  provider: "local"
  local:
    model: "base"

tts:
  provider: "edge"
  edge:
    voice: "en-US-AriaNeural"
```

This is a good conservative default for most people.
## Use case 1: CLI voice mode

### Turn it on

Start Hermes:

```bash
hermes
```

Inside the CLI:

```text
/voice on
```

### Recording flow

Default key:

- `Ctrl+B`

Workflow:

1. press `Ctrl+B`
2. speak
3. wait for silence detection to stop recording automatically
4. Hermes transcribes and responds
5. if TTS is on, it speaks the answer
6. the loop can automatically restart for continuous use

### Useful commands

```text
/voice
/voice on
/voice off
/voice tts
/voice status
```

### Good CLI workflows

#### Walk-up debugging

Say:

```text
I keep getting a docker permission error. Help me debug it.
```

Then continue hands-free:

- "Read the last error again"
- "Explain the root cause in simpler terms"
- "Now give me the exact fix"

#### Research / brainstorming

Great for:

- walking around while thinking
- dictating half-formed ideas
- asking Hermes to structure your thoughts in real time

#### Accessibility / low-typing sessions

If typing is inconvenient, voice mode is one of the fastest ways to stay in the full Hermes loop.
## Tuning CLI behavior

### Silence threshold

If Hermes starts/stops too aggressively, tune:

```yaml
voice:
  silence_threshold: 250
```

Higher threshold = less sensitive.

### Silence duration

If you pause a lot between sentences, increase:

```yaml
voice:
  silence_duration: 4.0
```
### Record key

If `Ctrl+B` conflicts with your terminal or tmux habits:

```yaml
voice:
  record_key: "ctrl+space"
```
## Use case 2: voice replies in Telegram or Discord

This mode is simpler than full voice channels.

Hermes stays a normal chat bot, but can speak replies.

### Start the gateway

```bash
hermes gateway
```

### Turn on voice replies

Inside Telegram or Discord:

```text
/voice on
```

or

```text
/voice tts
```

### Modes

| Mode | Meaning |
|---|---|
| `off` | text only |
| `voice_only` | speak only when the user sent voice |
| `all` | speak every reply |

### When to use which mode

- `/voice on` if you want spoken replies only for voice-originating messages
- `/voice tts` if you want a full spoken assistant all the time
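The decision the modes table describes boils down to a couple of branches. A sketch — the function and parameter names are illustrative, not the gateway's real code:

```python
# Sketch of the reply-voicing decision from the modes table.
def should_speak(mode: str, user_sent_voice: bool) -> bool:
    if mode == "all":
        return True
    if mode == "voice_only":
        return user_sent_voice  # only echo voice with voice
    return False  # "off": text only
```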
### Good messaging workflows

#### Telegram assistant on your phone

Use when:

- you are away from your machine
- you want to send voice notes and get quick spoken replies
- you want Hermes to function like a portable research or ops assistant

#### Discord DMs with spoken output

Useful when you want private interaction without server-channel mention behavior.

## Use case 3: Discord voice channels

This is the most advanced mode.

Hermes joins a Discord VC, listens to user speech, transcribes it, runs the normal agent pipeline, and speaks replies back into the channel.

### Required Discord permissions

In addition to the normal text-bot setup, make sure the bot has:

- Connect
- Speak
- preferably Use Voice Activity

Also enable privileged intents in the Developer Portal:

- Presence Intent
- Server Members Intent
- Message Content Intent

### Join and leave

In a Discord text channel where the bot is present:

```text
/voice join
/voice leave
/voice status
```

### What happens when joined

- users speak in the VC
- Hermes detects speech boundaries
- transcripts are posted in the associated text channel
- Hermes responds in text and audio
- the text channel is the one where `/voice join` was issued

### Best practices for Discord VC use

- keep `DISCORD_ALLOWED_USERS` tight
- use a dedicated bot/testing channel at first
- verify STT and TTS work in ordinary text-chat voice mode before trying VC mode
## Voice quality recommendations
|
||||
|
||||
### Best quality setup
|
||||
|
||||
- STT: local `large-v3` or Groq `whisper-large-v3`
|
||||
- TTS: ElevenLabs
|
||||
|
||||
### Best speed / convenience setup
|
||||
|
||||
- STT: local `base` or Groq
|
||||
- TTS: Edge
|
||||
|
||||
### Best zero-cost setup
|
||||
|
||||
- STT: local
|
||||
- TTS: Edge
|
||||
|
||||
## Common failure modes
|
||||
|
||||
### "No audio device found"
|
||||
|
||||
Install `portaudio`.
|
||||
|
||||
### "Bot joins but hears nothing"
|
||||
|
||||
Check:
|
||||
- your Discord user ID is in `DISCORD_ALLOWED_USERS`
|
||||
- you are not muted
|
||||
- privileged intents are enabled
|
||||
- the bot has Connect/Speak permissions
|
||||
|
||||
### "It transcribes but does not speak"
|
||||
|
||||
Check:
|
||||
- TTS provider config
|
||||
- API key / quota for ElevenLabs or OpenAI
|
||||
- `ffmpeg` install for Edge conversion paths
|
||||
|
||||
### "Whisper outputs garbage"
|
||||
|
||||
Try:
|
||||
- quieter environment
|
||||
- higher `silence_threshold`
|
||||
- different STT provider/model
|
||||
- shorter, clearer utterances
|
||||
|
||||
### "It works in DMs but not in server channels"
|
||||
|
||||
That is often mention policy.
|
||||
|
||||
By default, the bot needs an `@mention` in Discord server text channels unless configured otherwise.
|
||||
|
||||
## Suggested first-week setup

If you want the shortest path to success:

1. get text Hermes working
2. install `hermes-agent[voice]`
3. use CLI voice mode with local STT + Edge TTS
4. then enable `/voice on` in Telegram or Discord
5. only after that, try Discord VC mode

That progression keeps the debugging surface small.

## Where to read next

- [Voice Mode feature reference](/docs/user-guide/features/voice-mode)
- [Messaging Gateway](/docs/user-guide/messaging)
- [Discord setup](/docs/user-guide/messaging/discord)
- [Telegram setup](/docs/user-guide/messaging/telegram)
- [Configuration](/docs/user-guide/configuration)

@@ -33,6 +33,8 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl
 | 📚 **[Skills System](/docs/user-guide/features/skills)** | Procedural memory the agent creates and reuses |
 | 🔌 **[MCP Integration](/docs/user-guide/features/mcp)** | Connect to MCP servers, filter their tools, and extend Hermes safely |
 | 🧭 **[Use MCP with Hermes](/docs/guides/use-mcp-with-hermes)** | Practical MCP setup patterns, examples, and tutorials |
+| 🎙️ **[Voice Mode](/docs/user-guide/features/voice-mode)** | Real-time voice interaction in CLI, Telegram, Discord, and Discord VC |
+| 🗣️ **[Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)** | Hands-on setup and usage patterns for Hermes voice workflows |
 | 🎭 **[Personality & SOUL.md](/docs/user-guide/features/personality)** | Define Hermes' default voice with a global SOUL.md |
 | 📄 **[Context Files](/docs/user-guide/features/context-files)** | Project context files that shape every conversation |
 | 🔒 **[Security](/docs/user-guide/security)** | Command approval, authorization, container isolation |

@@ -31,7 +31,7 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
 | `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code setup-token (same as `ANTHROPIC_TOKEN`) |
 | `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
 | `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
-| `VOICE_TOOLS_OPENAI_KEY` | OpenAI key for TTS and voice transcription (separate from custom endpoint) |
+| `VOICE_TOOLS_OPENAI_KEY` | OpenAI key for OpenAI speech-to-text and text-to-speech providers |
 | `HERMES_HOME` | Override Hermes config directory (default: `~/.hermes`) |

 ## Provider Auth (OAuth)

@@ -57,7 +57,12 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
 | `BROWSERBASE_PROJECT_ID` | Browserbase project ID |
 | `BROWSER_INACTIVITY_TIMEOUT` | Browser session inactivity timeout in seconds |
 | `FAL_KEY` | Image generation ([fal.ai](https://fal.ai/)) |
-| `ELEVENLABS_API_KEY` | Premium TTS voices ([elevenlabs.io](https://elevenlabs.io/)) |
+| `GROQ_API_KEY` | Groq Whisper STT API key ([groq.com](https://groq.com/)) |
+| `ELEVENLABS_API_KEY` | ElevenLabs premium TTS voices ([elevenlabs.io](https://elevenlabs.io/)) |
+| `STT_GROQ_MODEL` | Override the Groq STT model (default: `whisper-large-v3-turbo`) |
+| `GROQ_BASE_URL` | Override the Groq OpenAI-compatible STT endpoint |
+| `STT_OPENAI_MODEL` | Override the OpenAI STT model (default: `whisper-1`) |
+| `STT_OPENAI_BASE_URL` | Override the OpenAI-compatible STT endpoint |
 | `HONCHO_API_KEY` | Cross-session user modeling ([honcho.dev](https://honcho.dev/)) |
 | `TINKER_API_KEY` | RL training ([tinker-console.thinkingmachines.ai](https://tinker-console.thinkingmachines.ai/)) |
 | `WANDB_API_KEY` | RL training metrics ([wandb.ai](https://wandb.ai/)) |

@@ -45,6 +45,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/verbose` | Cycle tool progress display: off → new → all → verbose |
 | `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) |
 | `/skin` | Show or change the display skin/theme |
+| `/voice [on\|off\|tts\|status]` | Toggle CLI voice mode and spoken playback. Recording uses `voice.record_key` (default: `Ctrl+B`). |

 ### Tools & Skills

@@ -105,6 +106,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
 | `/usage` | Show token usage for the current session. |
 | `/insights [days]` | Show usage analytics. |
 | `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display. |
+| `/voice [on\|off\|tts\|join\|channel\|leave\|status]` | Control spoken replies in chat. `join`/`channel`/`leave` manage Discord voice-channel mode. |
 | `/rollback [number]` | List or restore filesystem checkpoints. |
 | `/background <prompt>` | Run a prompt in a separate background session. |
 | `/reload-mcp` | Reload MCP servers from config. |

@@ -116,4 +118,5 @@ The messaging gateway supports the following built-in commands inside Telegram,

 - `/skin`, `/tools`, `/toolsets`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, and `/verbose` are **CLI-only** commands.
 - `/status`, `/stop`, `/sethome`, `/resume`, `/background`, and `/update` are **messaging-only** commands.
-- `/reload-mcp` and `/rollback` work in **both** the CLI and the messaging gateway.
+- `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway.
+- `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.

@@ -77,6 +77,7 @@ When resuming a previous session (`hermes -c` or `hermes --resume <id>`), a "Pre
 | `Alt+Enter` or `Ctrl+J` | New line (multi-line input) |
 | `Alt+V` | Paste an image from the clipboard when supported by the terminal |
 | `Ctrl+V` | Paste text and opportunistically attach clipboard images |
+| `Ctrl+B` | Start/stop voice recording when voice mode is enabled (`voice.record_key`, default: `ctrl+b`) |
 | `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) |
 | `Ctrl+D` | Exit |
 | `Tab` | Autocomplete slash commands |

@@ -95,11 +96,15 @@ Common examples:
 | `/skills browse` | Browse the skills hub and official optional skills |
 | `/background <prompt>` | Run a prompt in a separate background session |
 | `/skin` | Show or switch the active CLI skin |
+| `/voice on` | Enable CLI voice mode (press `Ctrl+B` to record) |
+| `/voice tts` | Toggle spoken playback for Hermes replies |
 | `/reasoning high` | Increase reasoning effort |
 | `/title My Session` | Name the current session |

 For the full built-in CLI and messaging lists, see [Slash Commands Reference](../reference/slash-commands.md).

+For setup, providers, silence tuning, and messaging/Discord voice usage, see [Voice Mode](features/voice-mode.md).
+
 :::tip
 Commands are case-insensitive — `/HELP` works the same as `/help`. Installed skills also become slash commands automatically.
 :::

@@ -695,6 +695,8 @@ tts:
   voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
 ```

+This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
+
 ## Display Settings

 ```yaml

@@ -719,10 +721,43 @@ display:

```yaml
stt:
-  provider: "openai" # STT provider
+  provider: "local" # "local" | "groq" | "openai"
  local:
    model: "base" # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1" # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
  # model: "whisper-1" # Legacy fallback key still respected
```

-Requires `VOICE_TOOLS_OPENAI_KEY` in `.env` for OpenAI STT.
+Provider behavior:

- `local` uses `faster-whisper` running on your machine. Install it separately with `pip install faster-whisper`.
- `groq` uses Groq's Whisper-compatible endpoint and reads `GROQ_API_KEY`.
- `openai` uses the OpenAI speech API and reads `VOICE_TOOLS_OPENAI_KEY`.

If the requested provider is unavailable, Hermes falls back automatically in this order: `local` → `groq` → `openai`.

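The fallback order above can be sketched as a simple availability check. This is illustrative only — names like `pick_stt_provider` are not Hermes internals:

```python
import os

def _local_available() -> bool:
    # Local STT needs the optional faster-whisper package.
    try:
        import faster_whisper  # noqa: F401
        return True
    except ImportError:
        return False

def pick_stt_provider(requested: str) -> str:
    """Return the requested STT provider, or fall back in the documented order."""
    available = {
        "local": _local_available(),
        "groq": bool(os.environ.get("GROQ_API_KEY")),
        "openai": bool(os.environ.get("VOICE_TOOLS_OPENAI_KEY")),
    }
    if available.get(requested):
        return requested
    for name in ("local", "groq", "openai"):  # documented fallback order
        if available[name]:
            return name
    raise RuntimeError("no STT provider is available")
```

For example, with only `GROQ_API_KEY` set and `faster-whisper` not installed, any requested provider resolves to `groq`.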
Groq and OpenAI model overrides are environment-driven:

```bash
STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1
```

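A sketch of how these variables resolve, using the documented defaults — illustrative code, not the actual Hermes lookup:

```python
import os

def stt_overrides() -> dict:
    """Resolve the STT override env vars, falling back to documented defaults."""
    return {
        "groq_model": os.environ.get("STT_GROQ_MODEL", "whisper-large-v3-turbo"),
        "groq_base_url": os.environ.get("GROQ_BASE_URL", "https://api.groq.com/openai/v1"),
        "openai_model": os.environ.get("STT_OPENAI_MODEL", "whisper-1"),
        "openai_base_url": os.environ.get("STT_OPENAI_BASE_URL", "https://api.openai.com/v1"),
    }
```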
## Voice Mode (CLI)

```yaml
voice:
  record_key: "ctrl+b" # Push-to-talk key inside the CLI
  max_recording_seconds: 120 # Hard stop for long recordings
  auto_tts: false # Enable spoken replies automatically when /voice on
  silence_threshold: 200 # RMS threshold for speech detection
  silence_duration: 3.0 # Seconds of silence before auto-stop
```

Use `/voice on` in the CLI to enable microphone mode, `record_key` to start/stop recording, and `/voice tts` to toggle spoken replies. See [Voice Mode](/docs/user-guide/features/voice-mode) for end-to-end setup and platform-specific behavior.

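To make `silence_threshold` concrete: it is compared against the RMS amplitude of each audio frame. A minimal sketch for 16-bit PCM audio — illustrative; Hermes' actual detection code may differ:

```python
import math
import struct

def frame_rms(frame: bytes) -> float:
    """RMS amplitude of a frame of 16-bit little-endian PCM samples."""
    count = len(frame) // 2
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)

def is_silent(frame: bytes, silence_threshold: int = 200) -> bool:
    return frame_rms(frame) < silence_threshold
```

Recording auto-stops once frames stay below the threshold for `silence_duration` seconds, so raising the threshold makes the detector treat quiet background noise as silence.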
## Quick Commands

@@ -194,6 +194,8 @@ The agent's final response is automatically delivered. You do not need to call `

 ## Schedule formats

+The agent's final response is automatically delivered — you do **not** need to include `send_message` in the cron prompt for that same destination. If a cron run calls `send_message` to the exact target the scheduler will already deliver to, Hermes skips that duplicate send and tells the model to put the user-facing content in the final response instead. Use `send_message` only for additional or different targets.

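The skip rule above amounts to filtering out `send_message` calls whose target matches the scheduled delivery target. A hypothetical sketch — helper and field names are illustrative, not Hermes internals:

```python
def plan_deliveries(cron_target: str, send_calls: list[dict]) -> list[str]:
    """Targets that actually receive a message for one cron run."""
    targets = []
    for call in send_calls:
        if call["target"] == cron_target:
            continue  # duplicate: the scheduler delivers the final response here anyway
        targets.append(call["target"])
    targets.append(cron_target)  # the final response always goes to the cron target
    return targets
```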
 ### Relative delays (one-shot)

 ```text

@@ -8,12 +8,14 @@ description: "Real-time voice conversations with Hermes Agent — CLI, Telegram,

 Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.

 If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).

 ## Prerequisites

 Before using voice features, make sure you have:

-1. **Hermes Agent installed** — `pip install hermes-agent` (see [Getting Started](../../getting-started.md))
-2. **An LLM provider configured** — set `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `LLM_MODEL` in `~/.hermes/.env`
+1. **Hermes Agent installed** — `pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
+2. **An LLM provider configured** — run `hermes model` or set your preferred provider credentials in `~/.hermes/.env`
+3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice

 :::tip

@@ -210,8 +210,13 @@ Replace the ID with the actual channel ID (right-click → Copy Channel ID with

 Hermes Agent supports Discord voice messages:

-- **Incoming voice messages** are automatically transcribed using Whisper (requires `GROQ_API_KEY` or `VOICE_TOOLS_OPENAI_KEY` to be set in your environment).
+- **Incoming voice messages** are automatically transcribed using the configured STT provider: local `faster-whisper` (no key), Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`).
 - **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
+- **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
+
+For the full setup and operational guide, see:
+- [Voice Mode](/docs/user-guide/features/voice-mode)
+- [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)

 ## Troubleshooting

@@ -8,6 +8,8 @@ description: "Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal,

 Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.

+For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
+
 ## Architecture

 ```text

@@ -77,6 +79,7 @@ hermes gateway status # Check service status
 | `/usage` | Show token usage for this session |
 | `/insights [days]` | Show usage insights and analytics |
 | `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display |
+| `/voice [on\|off\|tts\|join\|leave\|status]` | Control messaging voice replies and Discord voice-channel behavior |
 | `/rollback [number]` | List or restore filesystem checkpoints |
 | `/background <prompt>` | Run a prompt in a separate background session |
 | `/reload-mcp` | Reload MCP servers from config |

@@ -224,7 +224,7 @@ Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).

 Hermes supports voice on Slack:

-- **Incoming:** Voice/audio messages are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
+- **Incoming:** Voice/audio messages are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
 - **Outgoing:** TTS responses are sent as audio file attachments

 ---

@@ -131,7 +131,11 @@ Group chat IDs are negative numbers (e.g., `-1001234567890`). Your personal DM c

 ### Incoming Voice (Speech-to-Text)

-Voice messages you send on Telegram are automatically transcribed using OpenAI's Whisper API and injected as text into the conversation. This requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
+Voice messages you send on Telegram are automatically transcribed by Hermes's configured STT provider and injected as text into the conversation.
+
+- `local` uses `faster-whisper` on the machine running Hermes — no API key required
+- `groq` uses Groq Whisper and requires `GROQ_API_KEY`
+- `openai` uses OpenAI Whisper and requires `VOICE_TOOLS_OPENAI_KEY`

 ### Outgoing Voice (Text-to-Speech)

@@ -173,7 +177,7 @@ Hermes Agent works in Telegram group chats with a few considerations:
 | Bot not responding at all | Verify `TELEGRAM_BOT_TOKEN` is correct. Check `hermes gateway` logs for errors. |
 | Bot responds with "unauthorized" | Your user ID is not in `TELEGRAM_ALLOWED_USERS`. Double-check with @userinfobot. |
 | Bot ignores group messages | Privacy mode is likely on. Disable it (Step 3) or make the bot a group admin. **Remember to remove and re-add the bot after changing privacy.** |
-| Voice messages not transcribed | Check that `VOICE_TOOLS_OPENAI_KEY` is set and valid in `~/.hermes/.env`. |
+| Voice messages not transcribed | Verify STT is available: install `faster-whisper` for local transcription, or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`. |
 | Voice replies are files, not bubbles | Install `ffmpeg` (needed for Edge TTS Opus conversion). |
 | Bot token revoked/invalid | Generate a new token via `/revoke` then `/newbot` or `/token` in BotFather. Update your `.env` file. |

@@ -137,7 +137,7 @@ with reconnection logic.

 Hermes supports voice on WhatsApp:

-- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
+- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
 - **Outgoing:** TTS responses are sent as MP3 audio file attachments
 - Agent responses are prefixed with "⚕ **Hermes Agent**" for easy identification