mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
docs: complete voice mode docs
parent 7b140b31e6
commit e099117a3b
12 changed files with 84 additions and 11 deletions
@@ -119,6 +119,7 @@ uv pip install -e "."
 | `cli` | Terminal menu UI for setup wizard | `uv pip install -e ".[cli]"` |
 | `modal` | Modal cloud execution backend | `uv pip install -e ".[modal]"` |
 | `tts-premium` | ElevenLabs premium voices | `uv pip install -e ".[tts-premium]"` |
+| `voice` | CLI microphone input + audio playback | `uv pip install -e ".[voice]"` |
 | `pty` | PTY terminal support | `uv pip install -e ".[pty]"` |
 | `honcho` | AI-native memory (Honcho integration) | `uv pip install -e ".[honcho]"` |
 | `mcp` | Model Context Protocol support | `uv pip install -e ".[mcp]"` |
@@ -129,6 +129,25 @@ Chat with Hermes from your phone or other surfaces via Telegram, Discord, Slack,
 hermes gateway setup # Interactive platform configuration
 ```
 
+### Add voice mode
+
+Want microphone input in the CLI or spoken replies in messaging?
+
+```bash
+pip install hermes-agent[voice]
+
+# Optional but recommended for free local speech-to-text
+pip install faster-whisper
+```
+
+Then start Hermes and enable it inside the CLI:
+
+```text
+/voice on
+```
+
+Press `Ctrl+B` to record, or use `/voice tts` to have Hermes speak its replies. See [Voice Mode](../user-guide/features/voice-mode.md) for the full setup across CLI, Telegram, Discord, and Discord voice channels.
+
 ### Schedule automated tasks
 
 ```
@@ -31,7 +31,7 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
 | `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code setup-token (same as `ANTHROPIC_TOKEN`) |
 | `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
 | `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
-| `VOICE_TOOLS_OPENAI_KEY` | OpenAI key for TTS and voice transcription (separate from custom endpoint) |
+| `VOICE_TOOLS_OPENAI_KEY` | OpenAI key for OpenAI speech-to-text and text-to-speech providers |
 | `HERMES_HOME` | Override Hermes config directory (default: `~/.hermes`) |
 
 ## Provider Auth (OAuth)
@@ -57,7 +57,12 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
 | `BROWSERBASE_PROJECT_ID` | Browserbase project ID |
 | `BROWSER_INACTIVITY_TIMEOUT` | Browser session inactivity timeout in seconds |
 | `FAL_KEY` | Image generation ([fal.ai](https://fal.ai/)) |
-| `ELEVENLABS_API_KEY` | Premium TTS voices ([elevenlabs.io](https://elevenlabs.io/)) |
+| `GROQ_API_KEY` | Groq Whisper STT API key ([groq.com](https://groq.com/)) |
+| `ELEVENLABS_API_KEY` | ElevenLabs premium TTS voices ([elevenlabs.io](https://elevenlabs.io/)) |
+| `STT_GROQ_MODEL` | Override the Groq STT model (default: `whisper-large-v3-turbo`) |
+| `GROQ_BASE_URL` | Override the Groq OpenAI-compatible STT endpoint |
+| `STT_OPENAI_MODEL` | Override the OpenAI STT model (default: `whisper-1`) |
+| `STT_OPENAI_BASE_URL` | Override the OpenAI-compatible STT endpoint |
 | `HONCHO_API_KEY` | Cross-session user modeling ([honcho.dev](https://honcho.dev/)) |
 | `TINKER_API_KEY` | RL training ([tinker-console.thinkingmachines.ai](https://tinker-console.thinkingmachines.ai/)) |
 | `WANDB_API_KEY` | RL training metrics ([wandb.ai](https://wandb.ai/)) |
@@ -45,6 +45,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/verbose` | Cycle tool progress display: off → new → all → verbose |
 | `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) |
 | `/skin` | Show or change the display skin/theme |
+| `/voice [on\|off\|tts\|status]` | Toggle CLI voice mode and spoken playback. Recording uses `voice.record_key` (default: `Ctrl+B`). |
 
 ### Tools & Skills
 
@@ -105,6 +106,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
 | `/usage` | Show token usage for the current session. |
 | `/insights [days]` | Show usage analytics. |
 | `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display. |
+| `/voice [on\|off\|tts\|join\|channel\|leave\|status]` | Control spoken replies in chat. `join`/`channel`/`leave` manage Discord voice-channel mode. |
 | `/rollback [number]` | List or restore filesystem checkpoints. |
 | `/background <prompt>` | Run a prompt in a separate background session. |
 | `/reload-mcp` | Reload MCP servers from config. |
@@ -116,4 +118,5 @@ The messaging gateway supports the following built-in commands inside Telegram,
 
 - `/skin`, `/tools`, `/toolsets`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, and `/verbose` are **CLI-only** commands.
 - `/status`, `/stop`, `/sethome`, `/resume`, `/background`, and `/update` are **messaging-only** commands.
-- `/reload-mcp` and `/rollback` work in **both** the CLI and the messaging gateway.
+- `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway.
+- `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.
@@ -77,6 +77,7 @@ When resuming a previous session (`hermes -c` or `hermes --resume <id>`), a "Pre
 | `Alt+Enter` or `Ctrl+J` | New line (multi-line input) |
 | `Alt+V` | Paste an image from the clipboard when supported by the terminal |
 | `Ctrl+V` | Paste text and opportunistically attach clipboard images |
+| `Ctrl+B` | Start/stop voice recording when voice mode is enabled (`voice.record_key`, default: `ctrl+b`) |
 | `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) |
 | `Ctrl+D` | Exit |
 | `Tab` | Autocomplete slash commands |
@@ -95,11 +96,15 @@ Common examples:
 | `/skills browse` | Browse the skills hub and official optional skills |
 | `/background <prompt>` | Run a prompt in a separate background session |
 | `/skin` | Show or switch the active CLI skin |
+| `/voice on` | Enable CLI voice mode (press `Ctrl+B` to record) |
+| `/voice tts` | Toggle spoken playback for Hermes replies |
 | `/reasoning high` | Increase reasoning effort |
 | `/title My Session` | Name the current session |
 
 For the full built-in CLI and messaging lists, see [Slash Commands Reference](../reference/slash-commands.md).
 
+For setup, providers, silence tuning, and messaging/Discord voice usage, see [Voice Mode](features/voice-mode.md).
+
 :::tip
 Commands are case-insensitive — `/HELP` works the same as `/help`. Installed skills also become slash commands automatically.
 :::
@@ -695,6 +695,8 @@ tts:
 voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
 ```
 
+This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
+
 ## Display Settings
 
 ```yaml
@@ -719,10 +721,43 @@ display:
 
 ```yaml
 stt:
-  provider: "openai" # STT provider
+  provider: "local" # "local" | "groq" | "openai"
+  local:
+    model: "base" # tiny, base, small, medium, large-v3
+  openai:
+    model: "whisper-1" # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
+  # model: "whisper-1" # Legacy fallback key still respected
 ```
 
-Requires `VOICE_TOOLS_OPENAI_KEY` in `.env` for OpenAI STT.
+Provider behavior:
+
+- `local` uses `faster-whisper` running on your machine. Install it separately with `pip install faster-whisper`.
+- `groq` uses Groq's Whisper-compatible endpoint and reads `GROQ_API_KEY`.
+- `openai` uses the OpenAI speech API and reads `VOICE_TOOLS_OPENAI_KEY`.
+
+If the requested provider is unavailable, Hermes falls back automatically in this order: `local` → `groq` → `openai`.
+
+Groq and OpenAI model overrides are environment-driven:
+
+```bash
+STT_GROQ_MODEL=whisper-large-v3-turbo
+STT_OPENAI_MODEL=whisper-1
+GROQ_BASE_URL=https://api.groq.com/openai/v1
+STT_OPENAI_BASE_URL=https://api.openai.com/v1
+```
+
+## Voice Mode (CLI)
+
+```yaml
+voice:
+  record_key: "ctrl+b" # Push-to-talk key inside the CLI
+  max_recording_seconds: 120 # Hard stop for long recordings
+  auto_tts: false # Enable spoken replies automatically when /voice on
+  silence_threshold: 200 # RMS threshold for speech detection
+  silence_duration: 3.0 # Seconds of silence before auto-stop
+```
+
+Use `/voice on` in the CLI to enable microphone mode, `record_key` to start/stop recording, and `/voice tts` to toggle spoken replies. See [Voice Mode](/docs/user-guide/features/voice-mode) for end-to-end setup and platform-specific behavior.
+
 ## Quick Commands
 
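Note on the `voice:` settings added in the hunk above: `silence_threshold` is described as an RMS threshold and `silence_duration` as the seconds of silence before auto-stop, with `max_recording_seconds` as a hard cap. The Python sketch below shows one way those three values could interact. It is an illustration under stated assumptions, not Hermes's recorder: the function names `rms` and `frames_until_stop` are hypothetical, audio is assumed to be 16-bit little-endian PCM, and the choice to start counting silence only after speech has been heard is an assumption.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame (assumed format)."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def frames_until_stop(
    frames,
    frame_seconds: float,
    silence_threshold: float = 200,       # mirrors voice.silence_threshold
    silence_duration: float = 3.0,        # mirrors voice.silence_duration
    max_recording_seconds: float = 120,   # mirrors voice.max_recording_seconds
):
    """Return the 1-based index of the frame at which recording would stop, or None."""
    silent_for = 0.0
    elapsed = 0.0
    heard_speech = False
    for index, frame in enumerate(frames, start=1):
        elapsed += frame_seconds
        if rms(frame) >= silence_threshold:
            # Loud enough to count as speech: reset the silence timer.
            heard_speech = True
            silent_for = 0.0
        elif heard_speech:
            # Assumption: silence only counts once the user has started talking.
            silent_for += frame_seconds
        if silent_for >= silence_duration or elapsed >= max_recording_seconds:
            return index
    return None
```

With one-second frames, a loud frame followed by three quiet ones stops on the third quiet frame; raising `silence_threshold` makes the recorder treat more background noise as silence.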
@@ -15,7 +15,7 @@ If you want a practical setup walkthrough with recommended configurations and re
 Before using voice features, make sure you have:
 
 1. **Hermes Agent installed** — `pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
-2. **An LLM provider configured** — set `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `LLM_MODEL` in `~/.hermes/.env`
+2. **An LLM provider configured** — run `hermes model` or set your preferred provider credentials in `~/.hermes/.env`
 3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
 
 :::tip
@@ -210,7 +210,7 @@ Replace the ID with the actual channel ID (right-click → Copy Channel ID with
 
 Hermes Agent supports Discord voice messages:
 
-- **Incoming voice messages** are automatically transcribed using Whisper (requires `GROQ_API_KEY` or `VOICE_TOOLS_OPENAI_KEY` to be set in your environment).
+- **Incoming voice messages** are automatically transcribed using the configured STT provider: local `faster-whisper` (no key), Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`).
 - **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
 - **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
 
@@ -224,7 +224,7 @@ Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).
 
 Hermes supports voice on Slack:
 
-- **Incoming:** Voice/audio messages are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
+- **Incoming:** Voice/audio messages are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
 - **Outgoing:** TTS responses are sent as audio file attachments
 
 ---
@@ -131,7 +131,11 @@ Group chat IDs are negative numbers (e.g., `-1001234567890`). Your personal DM c
 
 ### Incoming Voice (Speech-to-Text)
 
-Voice messages you send on Telegram are automatically transcribed using OpenAI's Whisper API and injected as text into the conversation. This requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
+Voice messages you send on Telegram are automatically transcribed by Hermes's configured STT provider and injected as text into the conversation.
+
+- `local` uses `faster-whisper` on the machine running Hermes — no API key required
+- `groq` uses Groq Whisper and requires `GROQ_API_KEY`
+- `openai` uses OpenAI Whisper and requires `VOICE_TOOLS_OPENAI_KEY`
 
 ### Outgoing Voice (Text-to-Speech)
 
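Note on the hunk above: the three providers differ in what they need at runtime (`local` needs `faster-whisper` installed, `groq` needs `GROQ_API_KEY`, `openai` needs `VOICE_TOOLS_OPENAI_KEY`), and the configuration page changed in this commit states a fallback order of `local` → `groq` → `openai`. A minimal Python sketch of that selection logic follows. It is an illustration only, not Hermes's actual code: the names `provider_available` and `resolve_stt_provider` are hypothetical, and detecting the local provider by looking up the `faster_whisper` module is an assumption.

```python
import importlib.util
import os

# Fallback order as stated in the configuration docs touched by this commit.
FALLBACK_ORDER = ("local", "groq", "openai")


def provider_available(name: str, env) -> bool:
    """Hypothetical availability check for one STT provider."""
    if name == "local":
        # Assumption: "available" means the faster_whisper module can be imported.
        return importlib.util.find_spec("faster_whisper") is not None
    if name == "groq":
        return bool(env.get("GROQ_API_KEY"))
    if name == "openai":
        return bool(env.get("VOICE_TOOLS_OPENAI_KEY"))
    return False


def resolve_stt_provider(requested: str, env=None, is_available=provider_available):
    """Return the requested provider if usable, else the first usable fallback, else None."""
    env = os.environ if env is None else env
    if is_available(requested, env):
        return requested
    for name in FALLBACK_ORDER:
        if name != requested and is_available(name, env):
            return name
    return None
```

Calling `resolve_stt_provider("local")` on a machine without `faster-whisper` but with `GROQ_API_KEY` set would return `"groq"` under these assumptions; a `None` result corresponds to the "Voice messages not transcribed" troubleshooting case.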
@@ -173,7 +177,7 @@ Hermes Agent works in Telegram group chats with a few considerations:
 | Bot not responding at all | Verify `TELEGRAM_BOT_TOKEN` is correct. Check `hermes gateway` logs for errors. |
 | Bot responds with "unauthorized" | Your user ID is not in `TELEGRAM_ALLOWED_USERS`. Double-check with @userinfobot. |
 | Bot ignores group messages | Privacy mode is likely on. Disable it (Step 3) or make the bot a group admin. **Remember to remove and re-add the bot after changing privacy.** |
-| Voice messages not transcribed | Check that `VOICE_TOOLS_OPENAI_KEY` is set and valid in `~/.hermes/.env`. |
+| Voice messages not transcribed | Verify STT is available: install `faster-whisper` for local transcription, or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`. |
 | Voice replies are files, not bubbles | Install `ffmpeg` (needed for Edge TTS Opus conversion). |
 | Bot token revoked/invalid | Generate a new token via `/revoke` then `/newbot` or `/token` in BotFather. Update your `.env` file. |
 
@@ -137,7 +137,7 @@ with reconnection logic.
 
 Hermes supports voice on WhatsApp:
 
-- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
+- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
 - **Outgoing:** TTS responses are sent as MP3 audio file attachments
 - Agent responses are prefixed with "⚕ **Hermes Agent**" for easy identification
 
@@ -76,6 +76,7 @@ const sidebars: SidebarsConfig = {
 type: 'category',
 label: 'Web & Media',
 items: [
+'user-guide/features/voice-mode',
 'user-guide/features/browser',
 'user-guide/features/vision',
 'user-guide/features/image-generation',