mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
feat: add documentation website (Docusaurus)
- 25 documentation pages covering Getting Started, User Guide, Developer Guide, and Reference - Docusaurus with custom amber/gold theme matching the landing page branding - GitHub Actions workflow to deploy landing page + docs to GitHub Pages - Landing page at root, docs at /docs/ on hermes-agent.nousresearch.com - Content extracted and restructured from existing repo docs (README, AGENTS.md, CONTRIBUTING.md, docs/) - Auto-deploy on push to main when website/ or landingpage/ changes
This commit is contained in:
parent
1708dcd2b2
commit
ada3713e77
45 changed files with 22822 additions and 0 deletions
8
website/docs/user-guide/features/_category_.json
Normal file
8
website/docs/user-guide/features/_category_.json
Normal file
|
|
@ -0,0 +1,8 @@
|
|||
{
|
||||
"label": "Features",
|
||||
"position": 4,
|
||||
"link": {
|
||||
"type": "generated-index",
|
||||
"description": "Explore the powerful features of Hermes Agent."
|
||||
}
|
||||
}
|
||||
51
website/docs/user-guide/features/code-execution.md
Normal file
51
website/docs/user-guide/features/code-execution.md
Normal file
|
|
@ -0,0 +1,51 @@
|
|||
---
|
||||
sidebar_position: 8
|
||||
title: "Code Execution"
|
||||
description: "Sandboxed Python execution with RPC tool access — collapse multi-step workflows into a single turn"
|
||||
---
|
||||
|
||||
# Code Execution (Programmatic Tool Calling)
|
||||
|
||||
The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating via Unix domain socket RPC.
|
||||
|
||||
## How It Works
|
||||
|
||||
```python
|
||||
# The agent can write scripts like:
|
||||
from hermes_tools import web_search, web_extract
|
||||
|
||||
results = web_search("Python 3.13 features", limit=5)
|
||||
for r in results["data"]["web"]:
|
||||
content = web_extract([r["url"]])
|
||||
# ... filter and process ...
|
||||
print(summary)
|
||||
```
|
||||
|
||||
**Available tools in sandbox:** `web_search`, `web_extract`, `read_file`, `write_file`, `search`, `patch`, `terminal` (foreground only).
|
||||
|
||||
## When the Agent Uses This
|
||||
|
||||
The agent uses `execute_code` when there are:
|
||||
|
||||
- **3+ tool calls** with processing logic between them
|
||||
- Bulk data filtering or conditional branching
|
||||
- Loops over results
|
||||
|
||||
The key benefit: intermediate tool results never enter the context window — only the final `print()` output comes back, dramatically reducing token usage.
|
||||
|
||||
## Security
|
||||
|
||||
:::danger Security Model
|
||||
The child process runs with a **minimal environment**. API keys, tokens, and credentials are stripped entirely. The script accesses tools exclusively via the RPC channel — it cannot read secrets from environment variables.
|
||||
:::
|
||||
|
||||
Only safe system variables (`PATH`, `HOME`, `LANG`, etc.) are passed through.
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
code_execution:
|
||||
timeout: 300 # Max seconds per script (default: 300)
|
||||
max_tool_calls: 50 # Max tool calls per execution (default: 50)
|
||||
```
|
||||
87
website/docs/user-guide/features/cron.md
Normal file
87
website/docs/user-guide/features/cron.md
Normal file
|
|
@ -0,0 +1,87 @@
|
|||
---
|
||||
sidebar_position: 5
|
||||
title: "Scheduled Tasks (Cron)"
|
||||
description: "Schedule automated tasks with natural language — cron jobs, delivery options, and the gateway scheduler"
|
||||
---
|
||||
|
||||
# Scheduled Tasks (Cron)
|
||||
|
||||
Schedule tasks to run automatically with natural language or cron expressions. The agent can self-schedule using the `schedule_cronjob` tool from any platform.
|
||||
|
||||
## Creating Scheduled Tasks
|
||||
|
||||
### In the CLI
|
||||
|
||||
Use the `/cron` slash command:
|
||||
|
||||
```
|
||||
/cron add 30m "Remind me to check the build"
|
||||
/cron add "every 2h" "Check server status"
|
||||
/cron add "0 9 * * *" "Morning briefing"
|
||||
/cron list
|
||||
/cron remove <job_id>
|
||||
```
|
||||
|
||||
### Through Natural Conversation
|
||||
|
||||
Simply ask the agent on any platform:
|
||||
|
||||
```
|
||||
Every morning at 9am, check Hacker News for AI news and send me a summary on Telegram.
|
||||
```
|
||||
|
||||
The agent will use the `schedule_cronjob` tool to set it up.
|
||||
|
||||
## How It Works
|
||||
|
||||
**Cron execution is handled by the gateway daemon.** The gateway ticks the scheduler every 60 seconds, running any due jobs in isolated agent sessions:
|
||||
|
||||
```bash
|
||||
hermes gateway install # Install as system service (recommended)
|
||||
hermes gateway # Or run in foreground
|
||||
|
||||
hermes cron list # View scheduled jobs
|
||||
hermes cron status # Check if gateway is running
|
||||
```
|
||||
|
||||
:::info
|
||||
Even if no messaging platforms are configured, the gateway stays running for cron. A file lock prevents duplicate execution if multiple processes overlap.
|
||||
:::
|
||||
|
||||
## Delivery Options
|
||||
|
||||
When scheduling jobs, you specify where the output goes:
|
||||
|
||||
| Option | Description |
|
||||
|--------|-------------|
|
||||
| `"origin"` | Back to where the job was created |
|
||||
| `"local"` | Save to local files only |
|
||||
| `"telegram"` | Telegram home channel |
|
||||
| `"discord"` | Discord home channel |
|
||||
| `"telegram:123456"` | Specific Telegram chat |
|
||||
|
||||
The agent knows your connected platforms and home channels — it'll choose sensible defaults.
|
||||
|
||||
## Schedule Formats
|
||||
|
||||
- **Relative:** `30m`, `2h`, `1d`
|
||||
- **Human-readable:** `"every 2 hours"`, `"daily at 9am"`
|
||||
- **Cron expressions:** `"0 9 * * *"` (standard 5-field cron syntax)
|
||||
|
||||
## Managing Jobs
|
||||
|
||||
```bash
|
||||
# CLI commands
|
||||
hermes cron list # View all scheduled jobs
|
||||
hermes cron status # Check if the scheduler is running
|
||||
|
||||
# Slash commands (inside chat)
|
||||
/cron list
|
||||
/cron remove <job_id>
|
||||
```
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
Scheduled task prompts are scanned for instruction-override patterns (prompt injection). Jobs with suspicious content are blocked.
|
||||
:::
|
||||
60
website/docs/user-guide/features/delegation.md
Normal file
60
website/docs/user-guide/features/delegation.md
Normal file
|
|
@ -0,0 +1,60 @@
|
|||
---
|
||||
sidebar_position: 7
|
||||
title: "Subagent Delegation"
|
||||
description: "Spawn isolated child agents for parallel workstreams with delegate_task"
|
||||
---
|
||||
|
||||
# Subagent Delegation
|
||||
|
||||
The `delegate_task` tool spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Each child gets a fresh conversation and works independently — only its final summary enters the parent's context.
|
||||
|
||||
## Single Task
|
||||
|
||||
```python
|
||||
delegate_task(
|
||||
goal="Debug why tests fail",
|
||||
context="Error: assertion in test_foo.py line 42",
|
||||
toolsets=["terminal", "file"]
|
||||
)
|
||||
```
|
||||
|
||||
## Parallel Batch
|
||||
|
||||
Up to 3 concurrent subagents:
|
||||
|
||||
```python
|
||||
delegate_task(tasks=[
|
||||
{"goal": "Research topic A", "toolsets": ["web"]},
|
||||
{"goal": "Research topic B", "toolsets": ["web"]},
|
||||
{"goal": "Fix the build", "toolsets": ["terminal", "file"]}
|
||||
])
|
||||
```
|
||||
|
||||
## Key Properties
|
||||
|
||||
- Each subagent gets its **own terminal session** (separate from the parent)
|
||||
- **Depth limit of 2** — no grandchildren
|
||||
- Subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`
|
||||
- **Interrupt propagation** — interrupting the parent interrupts all active children
|
||||
- Only the final summary enters the parent's context, keeping token usage efficient
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
delegation:
|
||||
max_iterations: 50 # Max turns per child (default: 50)
|
||||
default_toolsets: ["terminal", "file", "web"] # Default toolsets
|
||||
```
|
||||
|
||||
## When to Use Delegation
|
||||
|
||||
Delegation is most useful when:
|
||||
|
||||
- You have **independent workstreams** that can run in parallel
|
||||
- A subtask needs a **clean context** (e.g., debugging a long error trace without polluting the main conversation)
|
||||
- You want to **fan out** research across multiple topics and collect summaries
|
||||
|
||||
:::tip
|
||||
The agent handles delegation automatically based on the task complexity. You don't need to explicitly ask it to delegate — it will do so when it makes sense.
|
||||
:::
|
||||
182
website/docs/user-guide/features/hooks.md
Normal file
182
website/docs/user-guide/features/hooks.md
Normal file
|
|
@ -0,0 +1,182 @@
|
|||
---
|
||||
sidebar_position: 6
|
||||
title: "Event Hooks"
|
||||
description: "Run custom code at key lifecycle points — log activity, send alerts, post to webhooks"
|
||||
---
|
||||
|
||||
# Event Hooks
|
||||
|
||||
The hooks system lets you run custom code at key points in the agent lifecycle — session creation, slash commands, each tool-calling step, and more. Hooks fire automatically during gateway operation without blocking the main agent pipeline.
|
||||
|
||||
## Creating a Hook
|
||||
|
||||
Each hook is a directory under `~/.hermes/hooks/` containing two files:
|
||||
|
||||
```
|
||||
~/.hermes/hooks/
|
||||
└── my-hook/
|
||||
├── HOOK.yaml # Declares which events to listen for
|
||||
└── handler.py # Python handler function
|
||||
```
|
||||
|
||||
### HOOK.yaml
|
||||
|
||||
```yaml
|
||||
name: my-hook
|
||||
description: Log all agent activity to a file
|
||||
events:
|
||||
- agent:start
|
||||
- agent:end
|
||||
- agent:step
|
||||
```
|
||||
|
||||
The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.
|
||||
|
||||
### handler.py
|
||||
|
||||
```python
|
||||
import json
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
LOG_FILE = Path.home() / ".hermes" / "hooks" / "my-hook" / "activity.log"
|
||||
|
||||
async def handle(event_type: str, context: dict):
|
||||
"""Called for each subscribed event. Must be named 'handle'."""
|
||||
entry = {
|
||||
"timestamp": datetime.now().isoformat(),
|
||||
"event": event_type,
|
||||
**context,
|
||||
}
|
||||
with open(LOG_FILE, "a") as f:
|
||||
f.write(json.dumps(entry) + "\n")
|
||||
```
|
||||
|
||||
**Handler rules:**
|
||||
- Must be named `handle`
|
||||
- Receives `event_type` (string) and `context` (dict)
|
||||
- Can be `async def` or regular `def` — both work
|
||||
- Errors are caught and logged, never crashing the agent
|
||||
|
||||
## Available Events
|
||||
|
||||
| Event | When it fires | Context keys |
|
||||
|-------|---------------|--------------|
|
||||
| `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
|
||||
| `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
|
||||
| `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
|
||||
| `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
|
||||
| `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
|
||||
| `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
|
||||
| `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |
|
||||
|
||||
### Wildcard Matching
|
||||
|
||||
Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.
|
||||
|
||||
## Examples
|
||||
|
||||
### Telegram Alert on Long Tasks
|
||||
|
||||
Send yourself a message when the agent takes more than 10 steps:
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/hooks/long-task-alert/HOOK.yaml
|
||||
name: long-task-alert
|
||||
description: Alert when agent is taking many steps
|
||||
events:
|
||||
- agent:step
|
||||
```
|
||||
|
||||
```python
|
||||
# ~/.hermes/hooks/long-task-alert/handler.py
|
||||
import os
|
||||
import httpx
|
||||
|
||||
THRESHOLD = 10
|
||||
BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN")
|
||||
CHAT_ID = os.getenv("TELEGRAM_HOME_CHANNEL")
|
||||
|
||||
async def handle(event_type: str, context: dict):
|
||||
iteration = context.get("iteration", 0)
|
||||
if iteration == THRESHOLD and BOT_TOKEN and CHAT_ID:
|
||||
tools = ", ".join(context.get("tool_names", []))
|
||||
text = f"⚠️ Agent has been running for {iteration} steps. Last tools: {tools}"
|
||||
async with httpx.AsyncClient() as client:
|
||||
await client.post(
|
||||
f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
|
||||
json={"chat_id": CHAT_ID, "text": text},
|
||||
)
|
||||
```
|
||||
|
||||
### Command Usage Logger
|
||||
|
||||
Track which slash commands are used:
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/hooks/command-logger/HOOK.yaml
|
||||
name: command-logger
|
||||
description: Log slash command usage
|
||||
events:
|
||||
- command:*
|
||||
```
|
||||
|
||||
```python
|
||||
# ~/.hermes/hooks/command-logger/handler.py
|
||||
import json
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
LOG = Path.home() / ".hermes" / "logs" / "command_usage.jsonl"
|
||||
|
||||
def handle(event_type: str, context: dict):
|
||||
LOG.parent.mkdir(parents=True, exist_ok=True)
|
||||
entry = {
|
||||
"ts": datetime.now().isoformat(),
|
||||
"command": context.get("command"),
|
||||
"args": context.get("args"),
|
||||
"platform": context.get("platform"),
|
||||
"user": context.get("user_id"),
|
||||
}
|
||||
with open(LOG, "a") as f:
|
||||
f.write(json.dumps(entry) + "\n")
|
||||
```
|
||||
|
||||
### Session Start Webhook
|
||||
|
||||
POST to an external service on new sessions:
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/hooks/session-webhook/HOOK.yaml
|
||||
name: session-webhook
|
||||
description: Notify external service on new sessions
|
||||
events:
|
||||
- session:start
|
||||
- session:reset
|
||||
```
|
||||
|
||||
```python
|
||||
# ~/.hermes/hooks/session-webhook/handler.py
|
||||
import httpx
|
||||
|
||||
WEBHOOK_URL = "https://your-service.example.com/hermes-events"
|
||||
|
||||
async def handle(event_type: str, context: dict):
|
||||
async with httpx.AsyncClient() as client:
|
||||
await client.post(WEBHOOK_URL, json={
|
||||
"event": event_type,
|
||||
**context,
|
||||
}, timeout=5)
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
|
||||
2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically
|
||||
3. Handlers are registered for their declared events
|
||||
4. At each lifecycle point, `hooks.emit()` fires all matching handlers
|
||||
5. Errors in any handler are caught and logged — a broken hook never crashes the agent
|
||||
|
||||
:::info
|
||||
Hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not currently load hooks.
|
||||
:::
|
||||
269
website/docs/user-guide/features/mcp.md
Normal file
269
website/docs/user-guide/features/mcp.md
Normal file
|
|
@ -0,0 +1,269 @@
|
|||
---
|
||||
sidebar_position: 4
|
||||
title: "MCP (Model Context Protocol)"
|
||||
description: "Connect Hermes Agent to external tool servers via MCP — databases, APIs, filesystems, and more"
|
||||
---
|
||||
|
||||
# MCP (Model Context Protocol)
|
||||
|
||||
MCP lets Hermes Agent connect to external tool servers — giving the agent access to databases, APIs, filesystems, and more without any code changes.
|
||||
|
||||
## Overview
|
||||
|
||||
The [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) is an open standard for connecting AI agents to external tools and data sources. MCP servers expose tools over a lightweight RPC protocol, and Hermes Agent can connect to any compliant server automatically.
|
||||
|
||||
What this means for you:
|
||||
|
||||
- **Thousands of ready-made tools** — browse the [MCP server directory](https://github.com/modelcontextprotocol/servers) for servers covering GitHub, Slack, databases, file systems, web scraping, and more
|
||||
- **No code changes needed** — add a few lines to `~/.hermes/config.yaml` and the tools appear alongside built-in ones
|
||||
- **Mix and match** — run multiple MCP servers simultaneously, combining stdio-based and HTTP-based servers
|
||||
- **Secure by default** — environment variables are filtered and credentials are stripped from error messages
|
||||
|
||||
## Prerequisites
|
||||
|
||||
```bash
|
||||
pip install hermes-agent[mcp]
|
||||
```
|
||||
|
||||
| Server Type | Runtime Needed | Example |
|
||||
|-------------|---------------|---------|
|
||||
| HTTP/remote | Nothing extra | `url: "https://mcp.example.com"` |
|
||||
| npm-based (npx) | Node.js 18+ | `command: "npx"` |
|
||||
| Python-based | uv (recommended) | `command: "uvx"` |
|
||||
|
||||
## Configuration
|
||||
|
||||
MCP servers are configured in `~/.hermes/config.yaml` under the `mcp_servers` key.
|
||||
|
||||
### Stdio Servers
|
||||
|
||||
Stdio servers run as local subprocesses, communicating over stdin/stdout:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
filesystem:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
|
||||
env: {}
|
||||
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
|
||||
```
|
||||
|
||||
| Key | Required | Description |
|
||||
|-----|----------|-------------|
|
||||
| `command` | Yes | Executable to run (`npx`, `uvx`, `python`) |
|
||||
| `args` | No | Command-line arguments |
|
||||
| `env` | No | Environment variables for the subprocess |
|
||||
|
||||
:::info Security
|
||||
Only explicitly listed `env` variables plus a safe baseline (`PATH`, `HOME`, `USER`, `LANG`, `SHELL`, `TMPDIR`, `XDG_*`) are passed to the subprocess. Your API keys and secrets are **not** leaked.
|
||||
:::
|
||||
|
||||
### HTTP Servers
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
remote_api:
|
||||
url: "https://my-mcp-server.example.com/mcp"
|
||||
headers:
|
||||
Authorization: "Bearer sk-xxxxxxxxxxxx"
|
||||
```
|
||||
|
||||
### Per-Server Timeouts
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
slow_database:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-postgres"]
|
||||
env:
|
||||
DATABASE_URL: "postgres://user:pass@localhost/mydb"
|
||||
timeout: 300 # Tool call timeout (default: 120s)
|
||||
connect_timeout: 90 # Initial connection timeout (default: 60s)
|
||||
```
|
||||
|
||||
### Mixed Configuration Example
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
# Local filesystem via stdio
|
||||
filesystem:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
|
||||
|
||||
# GitHub API via stdio with auth
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
|
||||
|
||||
# Remote database via HTTP
|
||||
company_db:
|
||||
url: "https://mcp.internal.company.com/db"
|
||||
headers:
|
||||
Authorization: "Bearer sk-xxxxxxxxxxxx"
|
||||
timeout: 180
|
||||
|
||||
# Python-based server via uvx
|
||||
memory:
|
||||
command: "uvx"
|
||||
args: ["mcp-server-memory"]
|
||||
```
|
||||
|
||||
## Translating from Claude Desktop Config
|
||||
|
||||
Many MCP server docs show Claude Desktop JSON format. Here's the translation:
|
||||
|
||||
**Claude Desktop JSON:**
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"filesystem": {
|
||||
"command": "npx",
|
||||
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Hermes YAML:**
|
||||
```yaml
|
||||
mcp_servers: # mcpServers → mcp_servers (snake_case)
|
||||
filesystem:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
|
||||
```
|
||||
|
||||
Rules: `mcpServers` → `mcp_servers` (snake_case), JSON → YAML. Keys like `command`, `args`, `env` are identical.
|
||||
|
||||
## How It Works
|
||||
|
||||
### Tool Registration
|
||||
|
||||
Each MCP tool is registered with a prefixed name:
|
||||
|
||||
```
|
||||
mcp_{server_name}_{tool_name}
|
||||
```
|
||||
|
||||
| Server Name | MCP Tool Name | Registered As |
|
||||
|-------------|--------------|---------------|
|
||||
| `filesystem` | `read_file` | `mcp_filesystem_read_file` |
|
||||
| `github` | `create-issue` | `mcp_github_create_issue` |
|
||||
| `my-api` | `query.data` | `mcp_my_api_query_data` |
|
||||
|
||||
Tools appear alongside built-in tools — the agent calls them like any other tool.
|
||||
|
||||
### Reconnection
|
||||
|
||||
If an MCP server disconnects, Hermes automatically reconnects with exponential backoff (1s, 2s, 4s, 8s, 16s — max 5 attempts). Initial connection failures are reported immediately.
|
||||
|
||||
### Shutdown
|
||||
|
||||
On agent exit, all MCP server connections are cleanly shut down.
|
||||
|
||||
## Popular MCP Servers
|
||||
|
||||
| Server | Package | Description |
|
||||
|--------|---------|-------------|
|
||||
| Filesystem | `@modelcontextprotocol/server-filesystem` | Read/write/search local files |
|
||||
| GitHub | `@modelcontextprotocol/server-github` | Issues, PRs, repos, code search |
|
||||
| Git | `@modelcontextprotocol/server-git` | Git operations on local repos |
|
||||
| Fetch | `@modelcontextprotocol/server-fetch` | HTTP fetching and web content |
|
||||
| Memory | `@modelcontextprotocol/server-memory` | Persistent key-value memory |
|
||||
| SQLite | `@modelcontextprotocol/server-sqlite` | Query SQLite databases |
|
||||
| PostgreSQL | `@modelcontextprotocol/server-postgres` | Query PostgreSQL databases |
|
||||
| Brave Search | `@modelcontextprotocol/server-brave-search` | Web search via Brave API |
|
||||
| Puppeteer | `@modelcontextprotocol/server-puppeteer` | Browser automation |
|
||||
|
||||
### Example Configs
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
# No API key needed
|
||||
filesystem:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
|
||||
|
||||
git:
|
||||
command: "uvx"
|
||||
args: ["mcp-server-git", "--repository", "/home/user/my-repo"]
|
||||
|
||||
fetch:
|
||||
command: "uvx"
|
||||
args: ["mcp-server-fetch"]
|
||||
|
||||
sqlite:
|
||||
command: "uvx"
|
||||
args: ["mcp-server-sqlite", "--db-path", "/home/user/data.db"]
|
||||
|
||||
# Requires API key
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
|
||||
|
||||
brave_search:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-brave-search"]
|
||||
env:
|
||||
BRAVE_API_KEY: "BSA_xxxxxxxxxxxx"
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "MCP SDK not available"
|
||||
|
||||
```bash
|
||||
pip install hermes-agent[mcp]
|
||||
```
|
||||
|
||||
### Server fails to start
|
||||
|
||||
The MCP server command (`npx`, `uvx`) is not on PATH. Install the required runtime:
|
||||
|
||||
```bash
|
||||
# For npm-based servers
|
||||
npm install -g npx # or ensure Node.js 18+ is installed
|
||||
|
||||
# For Python-based servers
|
||||
pip install uv # then use "uvx" as the command
|
||||
```
|
||||
|
||||
### Server connects but tools fail with auth errors
|
||||
|
||||
Ensure the key is in the server's `env` block:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_your_actual_token" # Check this
|
||||
```
|
||||
|
||||
### Connection timeout
|
||||
|
||||
Increase `connect_timeout` for slow-starting servers:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
slow_server:
|
||||
command: "npx"
|
||||
args: ["-y", "heavy-server-package"]
|
||||
connect_timeout: 120 # default is 60
|
||||
```
|
||||
|
||||
### Reload MCP Servers
|
||||
|
||||
You can reload MCP servers without restarting Hermes:
|
||||
|
||||
- In the CLI: the agent reconnects automatically
|
||||
- In messaging: send `/reload-mcp`
|
||||
97
website/docs/user-guide/features/memory.md
Normal file
97
website/docs/user-guide/features/memory.md
Normal file
|
|
@ -0,0 +1,97 @@
|
|||
---
|
||||
sidebar_position: 3
|
||||
title: "Persistent Memory"
|
||||
description: "How Hermes Agent remembers across sessions — MEMORY.md, USER.md, and session search"
|
||||
---
|
||||
|
||||
# Persistent Memory
|
||||
|
||||
Hermes Agent has bounded, curated memory that persists across sessions. This lets it remember your preferences, your projects, your environment, and things it has learned.
|
||||
|
||||
## How It Works
|
||||
|
||||
Two files make up the agent's memory:
|
||||
|
||||
| File | Purpose | Token Budget |
|
||||
|------|---------|-------------|
|
||||
| **MEMORY.md** | Agent's personal notes — environment facts, conventions, things learned | ~800 tokens (~2200 chars) |
|
||||
| **USER.md** | User profile — your preferences, communication style, expectations | ~500 tokens (~1375 chars) |
|
||||
|
||||
Both are stored in `~/.hermes/memories/` and are injected into the system prompt as a frozen snapshot at session start. The agent manages its own memory via the `memory` tool — it can add, replace, or remove entries.
|
||||
|
||||
:::info
|
||||
Character limits keep memory focused. When memory is full, the agent consolidates or replaces entries to make room for new information.
|
||||
:::
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
memory:
|
||||
memory_enabled: true
|
||||
user_profile_enabled: true
|
||||
memory_char_limit: 2200 # ~800 tokens
|
||||
user_char_limit: 1375 # ~500 tokens
|
||||
```
|
||||
|
||||
## Memory Tool Actions
|
||||
|
||||
The agent uses the `memory` tool with these actions:
|
||||
|
||||
- **add** — Add a new memory entry
|
||||
- **replace** — Replace an existing entry with updated content
|
||||
- **remove** — Remove an entry that's no longer relevant
|
||||
- **read** — Read current memory contents
|
||||
|
||||
## Session Search
|
||||
|
||||
Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:
|
||||
|
||||
- All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
|
||||
- Search queries return relevant past conversations with Gemini Flash summarization
|
||||
- The agent can find things it discussed weeks ago, even if they're not in its active memory
|
||||
|
||||
```bash
|
||||
hermes sessions list # Browse past sessions
|
||||
```
|
||||
|
||||
## Honcho Integration (Cross-Session User Modeling)
|
||||
|
||||
For deeper, AI-generated user understanding that works across tools, you can optionally enable [Honcho](https://honcho.dev/) by Plastic Labs. Honcho runs alongside existing memory — USER.md stays as-is, and Honcho adds an additional layer of context.
|
||||
|
||||
When enabled:
|
||||
- **Prefetch**: Each turn, Honcho's user representation is injected into the system prompt
|
||||
- **Sync**: After each conversation, messages are synced to Honcho
|
||||
- **Query tool**: The agent can actively query its understanding of you via `query_user_context`
|
||||
|
||||
**Setup:**
|
||||
|
||||
```bash
|
||||
# 1. Install the optional dependency
|
||||
uv pip install honcho-ai
|
||||
|
||||
# 2. Get an API key from https://app.honcho.dev
|
||||
|
||||
# 3. Create ~/.honcho/config.json
|
||||
cat > ~/.honcho/config.json << 'EOF'
|
||||
{
|
||||
"enabled": true,
|
||||
"apiKey": "your-honcho-api-key",
|
||||
"peerName": "your-name",
|
||||
"hosts": {
|
||||
"hermes": {
|
||||
"workspace": "hermes"
|
||||
}
|
||||
}
|
||||
}
|
||||
EOF
|
||||
```
|
||||
|
||||
Or via environment variable:
|
||||
```bash
|
||||
hermes config set HONCHO_API_KEY your-key
|
||||
```
|
||||
|
||||
:::tip
|
||||
Honcho is fully opt-in — zero behavior change when disabled or unconfigured. All Honcho calls are non-fatal; if the service is unreachable, the agent continues normally.
|
||||
:::
|
||||
159
website/docs/user-guide/features/skills.md
Normal file
159
website/docs/user-guide/features/skills.md
Normal file
|
|
@ -0,0 +1,159 @@
|
|||
---
|
||||
sidebar_position: 2
|
||||
title: "Skills System"
|
||||
description: "On-demand knowledge documents — progressive disclosure, agent-managed skills, and the Skills Hub"
|
||||
---
|
||||
|
||||
# Skills System
|
||||
|
||||
Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage and are compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
|
||||
|
||||
All skills live in **`~/.hermes/skills/`** — a single directory that serves as the source of truth. On fresh install, bundled skills are copied from the repo. Hub-installed and agent-created skills also go here. The agent can modify or delete any skill.
|
||||
|
||||
## Using Skills
|
||||
|
||||
Every installed skill is automatically available as a slash command:
|
||||
|
||||
```bash
|
||||
# In the CLI or any messaging platform:
|
||||
/gif-search funny cats
|
||||
/axolotl help me fine-tune Llama 3 on my dataset
|
||||
/github-pr-workflow create a PR for the auth refactor
|
||||
|
||||
# Just the skill name loads it and lets the agent ask what you need:
|
||||
/excalidraw
|
||||
```
|
||||
|
||||
You can also interact with skills through natural conversation:
|
||||
|
||||
```bash
|
||||
hermes --toolsets skills -q "What skills do you have?"
|
||||
hermes --toolsets skills -q "Show me the axolotl skill"
|
||||
```
|
||||
|
||||
## Progressive Disclosure
|
||||
|
||||
Skills use a token-efficient loading pattern:
|
||||
|
||||
```
|
||||
Level 0: skills_categories() → ["mlops", "devops"] (~50 tokens)
|
||||
Level 1: skills_list(category) → [{name, description}, ...] (~3k tokens)
|
||||
Level 2: skill_view(name) → Full content + metadata (varies)
|
||||
Level 3: skill_view(name, path) → Specific reference file (varies)
|
||||
```
|
||||
|
||||
The agent only loads the full skill content when it actually needs it.
|
||||
|
||||
## SKILL.md Format
|
||||
|
||||
```markdown
|
||||
---
|
||||
name: my-skill
|
||||
description: Brief description of what this skill does
|
||||
version: 1.0.0
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [python, automation]
|
||||
category: devops
|
||||
---
|
||||
|
||||
# Skill Title
|
||||
|
||||
## When to Use
|
||||
Trigger conditions for this skill.
|
||||
|
||||
## Procedure
|
||||
1. Step one
|
||||
2. Step two
|
||||
|
||||
## Pitfalls
|
||||
- Known failure modes and fixes
|
||||
|
||||
## Verification
|
||||
How to confirm it worked.
|
||||
```
|
||||
|
||||
## Skill Directory Structure
|
||||
|
||||
```
|
||||
~/.hermes/skills/ # Single source of truth
|
||||
├── mlops/ # Category directory
|
||||
│ ├── axolotl/
|
||||
│ │ ├── SKILL.md # Main instructions (required)
|
||||
│ │ ├── references/ # Additional docs
|
||||
│ │ ├── templates/ # Output formats
|
||||
│ │ └── assets/ # Supplementary files
|
||||
│ └── vllm/
|
||||
│ └── SKILL.md
|
||||
├── devops/
|
||||
│ └── deploy-k8s/ # Agent-created skill
|
||||
│ ├── SKILL.md
|
||||
│ └── references/
|
||||
├── .hub/ # Skills Hub state
|
||||
│ ├── lock.json
|
||||
│ ├── quarantine/
|
||||
│ └── audit.log
|
||||
└── .bundled_manifest # Tracks seeded bundled skills
|
||||
```
|
||||
|
||||
## Agent-Managed Skills (skill_manage tool)
|
||||
|
||||
The agent can create, update, and delete its own skills via the `skill_manage` tool. This is the agent's **procedural memory** — when it figures out a non-trivial workflow, it saves the approach as a skill for future reuse.
|
||||
|
||||
### When the Agent Creates Skills
|
||||
|
||||
- After completing a complex task (5+ tool calls) successfully
|
||||
- When it hit errors or dead ends and found the working path
|
||||
- When the user corrected its approach
|
||||
- When it discovered a non-trivial workflow
|
||||
|
||||
### Actions
|
||||
|
||||
| Action | Use for | Key params |
|
||||
|--------|---------|------------|
|
||||
| `create` | New skill from scratch | `name`, `content` (full SKILL.md), optional `category` |
|
||||
| `patch` | Targeted fixes (preferred) | `name`, `old_string`, `new_string` |
|
||||
| `edit` | Major structural rewrites | `name`, `content` (full SKILL.md replacement) |
|
||||
| `delete` | Remove a skill entirely | `name` |
|
||||
| `write_file` | Add/update supporting files | `name`, `file_path`, `file_content` |
|
||||
| `remove_file` | Remove a supporting file | `name`, `file_path` |
|
||||
|
||||
:::tip
|
||||
The `patch` action is preferred for updates — it's more token-efficient than `edit` because only the changed text appears in the tool call.
|
||||
:::
|
||||
|
||||
## Skills Hub
|
||||
|
||||
Search, install, and manage skills from online registries:
|
||||
|
||||
```bash
|
||||
hermes skills search kubernetes # Search all sources
|
||||
hermes skills install openai/skills/k8s # Install with security scan
|
||||
hermes skills inspect openai/skills/k8s # Preview before installing
|
||||
hermes skills list --source hub # List hub-installed skills
|
||||
hermes skills audit # Re-scan all hub skills
|
||||
hermes skills uninstall k8s # Remove a hub skill
|
||||
hermes skills publish skills/my-skill --to github --repo owner/repo
|
||||
hermes skills snapshot export setup.json # Export skill config
|
||||
hermes skills tap add myorg/skills-repo # Add a custom source
|
||||
```
|
||||
|
||||
All hub-installed skills go through a **security scanner** that checks for data exfiltration, prompt injection, destructive commands, and other threats.
|
||||
|
||||
### Trust Levels
|
||||
|
||||
| Level | Source | Policy |
|
||||
|-------|--------|--------|
|
||||
| `builtin` | Ships with Hermes | Always trusted |
|
||||
| `trusted` | openai/skills, anthropics/skills | Trusted sources |
|
||||
| `community` | Everything else | Any findings = blocked unless `--force` |
|
||||
|
||||
### Slash Commands (Inside Chat)
|
||||
|
||||
All the same commands work with `/skills` prefix:
|
||||
|
||||
```
|
||||
/skills search kubernetes
|
||||
/skills install openai/skills/skill-creator
|
||||
/skills list
|
||||
```
|
||||
163
website/docs/user-guide/features/tools.md
Normal file
163
website/docs/user-guide/features/tools.md
Normal file
|
|
@ -0,0 +1,163 @@
|
|||
---
|
||||
sidebar_position: 1
|
||||
title: "Tools & Toolsets"
|
||||
description: "Overview of Hermes Agent's tools — what's available, how toolsets work, and terminal backends"
|
||||
---
|
||||
|
||||
# Tools & Toolsets
|
||||
|
||||
Tools are functions that extend the agent's capabilities. They're organized into logical **toolsets** that can be enabled or disabled per platform.
|
||||
|
||||
## Available Tools
|
||||
|
||||
| Category | Tools | Description |
|
||||
|----------|-------|-------------|
|
||||
| **Web** | `web_search`, `web_extract`, `web_crawl` | Search the web, extract page content, crawl sites |
|
||||
| **Terminal** | `terminal` | Execute commands (local/docker/singularity/modal/ssh backends) |
|
||||
| **File** | `read_file`, `write_file`, `patch`, `search` | Read, write, edit, and search files |
|
||||
| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, etc. | Full browser automation via Browserbase |
|
||||
| **Vision** | `vision_analyze` | Image analysis via multimodal models |
|
||||
| **Image Gen** | `image_generate` | Generate images (FLUX via FAL) |
|
||||
| **TTS** | `text_to_speech` | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |
|
||||
| **Reasoning** | `mixture_of_agents` | Multi-model reasoning |
|
||||
| **Skills** | `skills_list`, `skill_view`, `skill_manage` | Find, view, create, and manage skills |
|
||||
| **Todo** | `todo` | Read/write task list for multi-step planning |
|
||||
| **Memory** | `memory` | Persistent notes + user profile across sessions |
|
||||
| **Session Search** | `session_search` | Search + summarize past conversations (FTS5) |
|
||||
| **Cronjob** | `schedule_cronjob`, `list_cronjobs`, `remove_cronjob` | Scheduled task management |
|
||||
| **Code Execution** | `execute_code` | Run Python scripts that call tools via RPC sandbox |
|
||||
| **Delegation** | `delegate_task` | Spawn subagents with isolated context |
|
||||
| **Clarify** | `clarify` | Ask the user multiple-choice or open-ended questions (CLI-only) |
|
||||
| **MCP** | Auto-discovered | External tools from MCP servers |
|
||||
|
||||
## Using Toolsets
|
||||
|
||||
```bash
|
||||
# Use specific toolsets
|
||||
hermes --toolsets "web,terminal"
|
||||
|
||||
# See all available tools
|
||||
hermes tools
|
||||
|
||||
# Configure tools per platform (interactive)
|
||||
hermes tools
|
||||
```
|
||||
|
||||
**Available toolsets:** `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, and more.
|
||||
|
||||
## Terminal Backends
|
||||
|
||||
The terminal tool can execute commands in different environments:
|
||||
|
||||
| Backend | Description | Use Case |
|
||||
|---------|-------------|----------|
|
||||
| `local` | Run on your machine (default) | Development, trusted tasks |
|
||||
| `docker` | Isolated containers | Security, reproducibility |
|
||||
| `ssh` | Remote server | Sandboxing, keep agent away from its own code |
|
||||
| `singularity` | HPC containers | Cluster computing, rootless |
|
||||
| `modal` | Cloud execution | Serverless, scale |
|
||||
|
||||
### Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
terminal:
|
||||
backend: local # or: docker, ssh, singularity, modal
|
||||
cwd: "." # Working directory
|
||||
timeout: 180 # Command timeout in seconds
|
||||
```
|
||||
|
||||
### Docker Backend
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
backend: docker
|
||||
docker_image: python:3.11-slim
|
||||
```
|
||||
|
||||
### SSH Backend
|
||||
|
||||
Recommended for security — agent can't modify its own code:
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
backend: ssh
|
||||
```
|
||||
```bash
|
||||
# Set credentials in ~/.hermes/.env
|
||||
TERMINAL_SSH_HOST=my-server.example.com
|
||||
TERMINAL_SSH_USER=myuser
|
||||
TERMINAL_SSH_KEY=~/.ssh/id_rsa
|
||||
```
|
||||
|
||||
### Singularity/Apptainer
|
||||
|
||||
```bash
|
||||
# Pre-build SIF for parallel workers
|
||||
apptainer build ~/python.sif docker://python:3.11-slim
|
||||
|
||||
# Configure
|
||||
hermes config set terminal.backend singularity
|
||||
hermes config set terminal.singularity_image ~/python.sif
|
||||
```
|
||||
|
||||
### Modal (Serverless Cloud)
|
||||
|
||||
```bash
|
||||
uv pip install "swe-rex[modal]"
|
||||
modal setup
|
||||
hermes config set terminal.backend modal
|
||||
```
|
||||
|
||||
### Container Resources
|
||||
|
||||
Configure CPU, memory, disk, and persistence for all container backends:
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
backend: docker # or singularity, modal
|
||||
container_cpu: 1 # CPU cores (default: 1)
|
||||
container_memory: 5120 # Memory in MB (default: 5GB)
|
||||
container_disk: 51200 # Disk in MB (default: 50GB)
|
||||
container_persistent: true # Persist filesystem across sessions (default: true)
|
||||
```
|
||||
|
||||
When `container_persistent: true`, installed packages, files, and config survive across sessions.
|
||||
|
||||
### Container Security
|
||||
|
||||
All container backends run with security hardening:
|
||||
|
||||
- Read-only root filesystem (Docker)
|
||||
- All Linux capabilities dropped
|
||||
- No privilege escalation
|
||||
- PID limits (256 processes)
|
||||
- Full namespace isolation
|
||||
- Persistent workspace via volumes, not writable root layer
|
||||
|
||||
## Background Process Management
|
||||
|
||||
Start background processes and manage them:
|
||||
|
||||
```python
|
||||
terminal(command="pytest -v tests/", background=true)
|
||||
# Returns: {"session_id": "proc_abc123", "pid": 12345}
|
||||
|
||||
# Then manage with the process tool:
|
||||
process(action="list") # Show all running processes
|
||||
process(action="poll", session_id="proc_abc123") # Check status
|
||||
process(action="wait", session_id="proc_abc123") # Block until done
|
||||
process(action="log", session_id="proc_abc123") # Full output
|
||||
process(action="kill", session_id="proc_abc123") # Terminate
|
||||
process(action="write", session_id="proc_abc123", data="y") # Send input
|
||||
```
|
||||
|
||||
PTY mode (`pty=true`) enables interactive CLI tools like Codex and Claude Code.
|
||||
|
||||
## Sudo Support
|
||||
|
||||
If a command needs sudo, you'll be prompted for your password (cached for the session). Or set `SUDO_PASSWORD` in `~/.hermes/.env`.
|
||||
|
||||
:::warning
|
||||
On messaging platforms, if sudo fails, the output includes a tip to add `SUDO_PASSWORD` to `~/.hermes/.env`.
|
||||
:::
|
||||
89
website/docs/user-guide/features/tts.md
Normal file
89
website/docs/user-guide/features/tts.md
Normal file
|
|
@ -0,0 +1,89 @@
|
|||
---
|
||||
sidebar_position: 9
|
||||
title: "Voice & TTS"
|
||||
description: "Text-to-speech and voice message transcription across all platforms"
|
||||
---
|
||||
|
||||
# Voice & TTS
|
||||
|
||||
Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.
|
||||
|
||||
## Text-to-Speech
|
||||
|
||||
Convert text to speech with three providers:
|
||||
|
||||
| Provider | Quality | Cost | API Key |
|
||||
|----------|---------|------|---------|
|
||||
| **Edge TTS** (default) | Good | Free | None needed |
|
||||
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
|
||||
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
|
||||
|
||||
### Platform Delivery
|
||||
|
||||
| Platform | Delivery | Format |
|
||||
|----------|----------|--------|
|
||||
| Telegram | Voice bubble (plays inline) | Opus `.ogg` |
|
||||
| Discord | Audio file attachment | MP3 |
|
||||
| WhatsApp | Audio file attachment | MP3 |
|
||||
| CLI | Saved to `~/voice-memos/` | MP3 |
|
||||
|
||||
### Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
tts:
|
||||
provider: "edge" # "edge" | "elevenlabs" | "openai"
|
||||
edge:
|
||||
voice: "en-US-AriaNeural" # 322 voices, 74 languages
|
||||
elevenlabs:
|
||||
voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
|
||||
model_id: "eleven_multilingual_v2"
|
||||
openai:
|
||||
model: "gpt-4o-mini-tts"
|
||||
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
|
||||
```
|
||||
|
||||
### Telegram Voice Bubbles & ffmpeg
|
||||
|
||||
Telegram voice bubbles require Opus/OGG audio format:
|
||||
|
||||
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
|
||||
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
|
||||
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
sudo apt install ffmpeg
|
||||
|
||||
# macOS
|
||||
brew install ffmpeg
|
||||
|
||||
# Fedora
|
||||
sudo dnf install ffmpeg
|
||||
```
|
||||
|
||||
Without ffmpeg, Edge TTS audio is sent as a regular audio file (playable, but shows as a rectangular player instead of a voice bubble).
|
||||
|
||||
:::tip
|
||||
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
|
||||
:::
|
||||
|
||||
## Voice Message Transcription
|
||||
|
||||
Voice messages sent on Telegram, Discord, WhatsApp, or Slack are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.
|
||||
|
||||
| Provider | Model | Quality | Cost |
|
||||
|----------|-------|---------|------|
|
||||
| **OpenAI Whisper** | `whisper-1` (default) | Good | Low |
|
||||
| **OpenAI GPT-4o** | `gpt-4o-mini-transcribe` | Better | Medium |
|
||||
| **OpenAI GPT-4o** | `gpt-4o-transcribe` | Best | Higher |
|
||||
|
||||
Requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
|
||||
|
||||
### Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
stt:
|
||||
enabled: true
|
||||
model: "whisper-1"
|
||||
```
|
||||
Loading…
Add table
Add a link
Reference in a new issue