feat: add documentation website (Docusaurus)

- 25 documentation pages covering Getting Started, User Guide, Developer Guide, and Reference
- Docusaurus with custom amber/gold theme matching the landing page branding
- GitHub Actions workflow to deploy landing page + docs to GitHub Pages
- Landing page at root, docs at /docs/ on hermes-agent.nousresearch.com
- Content extracted and restructured from existing repo docs (README, AGENTS.md, CONTRIBUTING.md, docs/)
- Auto-deploy on push to main when website/ or landingpage/ changes
This commit is contained in:
teknium1 2026-03-05 05:24:55 -08:00
parent 1708dcd2b2
commit ada3713e77
45 changed files with 22822 additions and 0 deletions

View file

@ -0,0 +1,8 @@
{
"label": "Features",
"position": 4,
"link": {
"type": "generated-index",
"description": "Explore the powerful features of Hermes Agent."
}
}

View file

@ -0,0 +1,51 @@
---
sidebar_position: 8
title: "Code Execution"
description: "Sandboxed Python execution with RPC tool access — collapse multi-step workflows into a single turn"
---
# Code Execution (Programmatic Tool Calling)
The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating via Unix domain socket RPC.
## How It Works
```python
# The agent can write scripts like:
from hermes_tools import web_search, web_extract
results = web_search("Python 3.13 features", limit=5)
for r in results["data"]["web"]:
content = web_extract([r["url"]])
# ... filter and process ...
print(summary)
```
**Available tools in sandbox:** `web_search`, `web_extract`, `read_file`, `write_file`, `search`, `patch`, `terminal` (foreground only).
## When the Agent Uses This
The agent uses `execute_code` when there are:
- **3+ tool calls** with processing logic between them
- Bulk data filtering or conditional branching
- Loops over results
The key benefit: intermediate tool results never enter the context window — only the final `print()` output comes back, dramatically reducing token usage.
## Security
:::danger Security Model
The child process runs with a **minimal environment**. API keys, tokens, and credentials are stripped entirely. The script accesses tools exclusively via the RPC channel — it cannot read secrets from environment variables.
:::
Only safe system variables (`PATH`, `HOME`, `LANG`, etc.) are passed through.
## Configuration
```yaml
# In ~/.hermes/config.yaml
code_execution:
timeout: 300 # Max seconds per script (default: 300)
max_tool_calls: 50 # Max tool calls per execution (default: 50)
```

View file

@ -0,0 +1,87 @@
---
sidebar_position: 5
title: "Scheduled Tasks (Cron)"
description: "Schedule automated tasks with natural language — cron jobs, delivery options, and the gateway scheduler"
---
# Scheduled Tasks (Cron)
Schedule tasks to run automatically with natural language or cron expressions. The agent can self-schedule using the `schedule_cronjob` tool from any platform.
## Creating Scheduled Tasks
### In the CLI
Use the `/cron` slash command:
```
/cron add 30m "Remind me to check the build"
/cron add "every 2h" "Check server status"
/cron add "0 9 * * *" "Morning briefing"
/cron list
/cron remove <job_id>
```
### Through Natural Conversation
Simply ask the agent on any platform:
```
Every morning at 9am, check Hacker News for AI news and send me a summary on Telegram.
```
The agent will use the `schedule_cronjob` tool to set it up.
## How It Works
**Cron execution is handled by the gateway daemon.** The gateway ticks the scheduler every 60 seconds, running any due jobs in isolated agent sessions:
```bash
hermes gateway install # Install as system service (recommended)
hermes gateway # Or run in foreground
hermes cron list # View scheduled jobs
hermes cron status # Check if gateway is running
```
:::info
Even if no messaging platforms are configured, the gateway stays running for cron. A file lock prevents duplicate execution if multiple processes overlap.
:::
## Delivery Options
When scheduling jobs, you specify where the output goes:
| Option | Description |
|--------|-------------|
| `"origin"` | Back to where the job was created |
| `"local"` | Save to local files only |
| `"telegram"` | Telegram home channel |
| `"discord"` | Discord home channel |
| `"telegram:123456"` | Specific Telegram chat |
The agent knows your connected platforms and home channels — it'll choose sensible defaults.
## Schedule Formats
- **Relative:** `30m`, `2h`, `1d`
- **Human-readable:** `"every 2 hours"`, `"daily at 9am"`
- **Cron expressions:** `"0 9 * * *"` (standard 5-field cron syntax)
## Managing Jobs
```bash
# CLI commands
hermes cron list # View all scheduled jobs
hermes cron status # Check if the scheduler is running
# Slash commands (inside chat)
/cron list
/cron remove <job_id>
```
## Security
:::warning
Scheduled task prompts are scanned for instruction-override patterns (prompt injection). Jobs with suspicious content are blocked.
:::

View file

@ -0,0 +1,60 @@
---
sidebar_position: 7
title: "Subagent Delegation"
description: "Spawn isolated child agents for parallel workstreams with delegate_task"
---
# Subagent Delegation
The `delegate_task` tool spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Each child gets a fresh conversation and works independently — only its final summary enters the parent's context.
## Single Task
```python
delegate_task(
goal="Debug why tests fail",
context="Error: assertion in test_foo.py line 42",
toolsets=["terminal", "file"]
)
```
## Parallel Batch
Up to 3 concurrent subagents:
```python
delegate_task(tasks=[
{"goal": "Research topic A", "toolsets": ["web"]},
{"goal": "Research topic B", "toolsets": ["web"]},
{"goal": "Fix the build", "toolsets": ["terminal", "file"]}
])
```
## Key Properties
- Each subagent gets its **own terminal session** (separate from the parent)
- **Depth limit of 2** — no grandchildren
- Subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`
- **Interrupt propagation** — interrupting the parent interrupts all active children
- Only the final summary enters the parent's context, keeping token usage efficient
## Configuration
```yaml
# In ~/.hermes/config.yaml
delegation:
max_iterations: 50 # Max turns per child (default: 50)
default_toolsets: ["terminal", "file", "web"] # Default toolsets
```
## When to Use Delegation
Delegation is most useful when:
- You have **independent workstreams** that can run in parallel
- A subtask needs a **clean context** (e.g., debugging a long error trace without polluting the main conversation)
- You want to **fan out** research across multiple topics and collect summaries
:::tip
The agent handles delegation automatically based on the task complexity. You don't need to explicitly ask it to delegate — it will do so when it makes sense.
:::

View file

@ -0,0 +1,182 @@
---
sidebar_position: 6
title: "Event Hooks"
description: "Run custom code at key lifecycle points — log activity, send alerts, post to webhooks"
---
# Event Hooks
The hooks system lets you run custom code at key points in the agent lifecycle — session creation, slash commands, each tool-calling step, and more. Hooks fire automatically during gateway operation without blocking the main agent pipeline.
## Creating a Hook
Each hook is a directory under `~/.hermes/hooks/` containing two files:
```
~/.hermes/hooks/
└── my-hook/
├── HOOK.yaml # Declares which events to listen for
└── handler.py # Python handler function
```
### HOOK.yaml
```yaml
name: my-hook
description: Log all agent activity to a file
events:
- agent:start
- agent:end
- agent:step
```
The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.
### handler.py
```python
import json
from datetime import datetime
from pathlib import Path
LOG_FILE = Path.home() / ".hermes" / "hooks" / "my-hook" / "activity.log"
async def handle(event_type: str, context: dict):
"""Called for each subscribed event. Must be named 'handle'."""
entry = {
"timestamp": datetime.now().isoformat(),
"event": event_type,
**context,
}
with open(LOG_FILE, "a") as f:
f.write(json.dumps(entry) + "\n")
```
**Handler rules:**
- Must be named `handle`
- Receives `event_type` (string) and `context` (dict)
- Can be `async def` or regular `def` — both work
- Errors are caught and logged, never crashing the agent
## Available Events
| Event | When it fires | Context keys |
|-------|---------------|--------------|
| `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
| `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
| `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
| `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
| `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
| `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
| `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |
### Wildcard Matching
Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.
## Examples
### Telegram Alert on Long Tasks
Send yourself a message when the agent takes more than 10 steps:
```yaml
# ~/.hermes/hooks/long-task-alert/HOOK.yaml
name: long-task-alert
description: Alert when agent is taking many steps
events:
- agent:step
```
```python
# ~/.hermes/hooks/long-task-alert/handler.py
import os
import httpx
THRESHOLD = 10
BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN")
CHAT_ID = os.getenv("TELEGRAM_HOME_CHANNEL")
async def handle(event_type: str, context: dict):
iteration = context.get("iteration", 0)
if iteration == THRESHOLD and BOT_TOKEN and CHAT_ID:
tools = ", ".join(context.get("tool_names", []))
text = f"⚠️ Agent has been running for {iteration} steps. Last tools: {tools}"
async with httpx.AsyncClient() as client:
await client.post(
f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
json={"chat_id": CHAT_ID, "text": text},
)
```
### Command Usage Logger
Track which slash commands are used:
```yaml
# ~/.hermes/hooks/command-logger/HOOK.yaml
name: command-logger
description: Log slash command usage
events:
- command:*
```
```python
# ~/.hermes/hooks/command-logger/handler.py
import json
from datetime import datetime
from pathlib import Path
LOG = Path.home() / ".hermes" / "logs" / "command_usage.jsonl"
def handle(event_type: str, context: dict):
LOG.parent.mkdir(parents=True, exist_ok=True)
entry = {
"ts": datetime.now().isoformat(),
"command": context.get("command"),
"args": context.get("args"),
"platform": context.get("platform"),
"user": context.get("user_id"),
}
with open(LOG, "a") as f:
f.write(json.dumps(entry) + "\n")
```
### Session Start Webhook
POST to an external service on new sessions:
```yaml
# ~/.hermes/hooks/session-webhook/HOOK.yaml
name: session-webhook
description: Notify external service on new sessions
events:
- session:start
- session:reset
```
```python
# ~/.hermes/hooks/session-webhook/handler.py
import httpx
WEBHOOK_URL = "https://your-service.example.com/hermes-events"
async def handle(event_type: str, context: dict):
async with httpx.AsyncClient() as client:
await client.post(WEBHOOK_URL, json={
"event": event_type,
**context,
}, timeout=5)
```
## How It Works
1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically
3. Handlers are registered for their declared events
4. At each lifecycle point, `hooks.emit()` fires all matching handlers
5. Errors in any handler are caught and logged — a broken hook never crashes the agent
:::info
Hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not currently load hooks.
:::

View file

@ -0,0 +1,269 @@
---
sidebar_position: 4
title: "MCP (Model Context Protocol)"
description: "Connect Hermes Agent to external tool servers via MCP — databases, APIs, filesystems, and more"
---
# MCP (Model Context Protocol)
MCP lets Hermes Agent connect to external tool servers — giving the agent access to databases, APIs, filesystems, and more without any code changes.
## Overview
The [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) is an open standard for connecting AI agents to external tools and data sources. MCP servers expose tools over a lightweight RPC protocol, and Hermes Agent can connect to any compliant server automatically.
What this means for you:
- **Thousands of ready-made tools** — browse the [MCP server directory](https://github.com/modelcontextprotocol/servers) for servers covering GitHub, Slack, databases, file systems, web scraping, and more
- **No code changes needed** — add a few lines to `~/.hermes/config.yaml` and the tools appear alongside built-in ones
- **Mix and match** — run multiple MCP servers simultaneously, combining stdio-based and HTTP-based servers
- **Secure by default** — environment variables are filtered and credentials are stripped from error messages
## Prerequisites
```bash
pip install hermes-agent[mcp]
```
| Server Type | Runtime Needed | Example |
|-------------|---------------|---------|
| HTTP/remote | Nothing extra | `url: "https://mcp.example.com"` |
| npm-based (npx) | Node.js 18+ | `command: "npx"` |
| Python-based | uv (recommended) | `command: "uvx"` |
## Configuration
MCP servers are configured in `~/.hermes/config.yaml` under the `mcp_servers` key.
### Stdio Servers
Stdio servers run as local subprocesses, communicating over stdin/stdout:
```yaml
mcp_servers:
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
env: {}
github:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-github"]
env:
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
```
| Key | Required | Description |
|-----|----------|-------------|
| `command` | Yes | Executable to run (`npx`, `uvx`, `python`) |
| `args` | No | Command-line arguments |
| `env` | No | Environment variables for the subprocess |
:::info Security
Only explicitly listed `env` variables plus a safe baseline (`PATH`, `HOME`, `USER`, `LANG`, `SHELL`, `TMPDIR`, `XDG_*`) are passed to the subprocess. Your API keys and secrets are **not** leaked.
:::
### HTTP Servers
```yaml
mcp_servers:
remote_api:
url: "https://my-mcp-server.example.com/mcp"
headers:
Authorization: "Bearer sk-xxxxxxxxxxxx"
```
### Per-Server Timeouts
```yaml
mcp_servers:
slow_database:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-postgres"]
env:
DATABASE_URL: "postgres://user:pass@localhost/mydb"
timeout: 300 # Tool call timeout (default: 120s)
connect_timeout: 90 # Initial connection timeout (default: 60s)
```
### Mixed Configuration Example
```yaml
mcp_servers:
# Local filesystem via stdio
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
# GitHub API via stdio with auth
github:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-github"]
env:
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
# Remote database via HTTP
company_db:
url: "https://mcp.internal.company.com/db"
headers:
Authorization: "Bearer sk-xxxxxxxxxxxx"
timeout: 180
# Python-based server via uvx
memory:
command: "uvx"
args: ["mcp-server-memory"]
```
## Translating from Claude Desktop Config
Many MCP server docs show Claude Desktop JSON format. Here's the translation:
**Claude Desktop JSON:**
```json
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
}
}
}
```
**Hermes YAML:**
```yaml
mcp_servers: # mcpServers → mcp_servers (snake_case)
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
```
Rules: `mcpServers``mcp_servers` (snake_case), JSON → YAML. Keys like `command`, `args`, `env` are identical.
## How It Works
### Tool Registration
Each MCP tool is registered with a prefixed name:
```
mcp_{server_name}_{tool_name}
```
| Server Name | MCP Tool Name | Registered As |
|-------------|--------------|---------------|
| `filesystem` | `read_file` | `mcp_filesystem_read_file` |
| `github` | `create-issue` | `mcp_github_create_issue` |
| `my-api` | `query.data` | `mcp_my_api_query_data` |
Tools appear alongside built-in tools — the agent calls them like any other tool.
### Reconnection
If an MCP server disconnects, Hermes automatically reconnects with exponential backoff (1s, 2s, 4s, 8s, 16s — max 5 attempts). Initial connection failures are reported immediately.
### Shutdown
On agent exit, all MCP server connections are cleanly shut down.
## Popular MCP Servers
| Server | Package | Description |
|--------|---------|-------------|
| Filesystem | `@modelcontextprotocol/server-filesystem` | Read/write/search local files |
| GitHub | `@modelcontextprotocol/server-github` | Issues, PRs, repos, code search |
| Git | `@modelcontextprotocol/server-git` | Git operations on local repos |
| Fetch | `@modelcontextprotocol/server-fetch` | HTTP fetching and web content |
| Memory | `@modelcontextprotocol/server-memory` | Persistent key-value memory |
| SQLite | `@modelcontextprotocol/server-sqlite` | Query SQLite databases |
| PostgreSQL | `@modelcontextprotocol/server-postgres` | Query PostgreSQL databases |
| Brave Search | `@modelcontextprotocol/server-brave-search` | Web search via Brave API |
| Puppeteer | `@modelcontextprotocol/server-puppeteer` | Browser automation |
### Example Configs
```yaml
mcp_servers:
# No API key needed
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
git:
command: "uvx"
args: ["mcp-server-git", "--repository", "/home/user/my-repo"]
fetch:
command: "uvx"
args: ["mcp-server-fetch"]
sqlite:
command: "uvx"
args: ["mcp-server-sqlite", "--db-path", "/home/user/data.db"]
# Requires API key
github:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-github"]
env:
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
brave_search:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-brave-search"]
env:
BRAVE_API_KEY: "BSA_xxxxxxxxxxxx"
```
## Troubleshooting
### "MCP SDK not available"
```bash
pip install hermes-agent[mcp]
```
### Server fails to start
The MCP server command (`npx`, `uvx`) is not on PATH. Install the required runtime:
```bash
# For npm-based servers
npm install -g npx # or ensure Node.js 18+ is installed
# For Python-based servers
pip install uv # then use "uvx" as the command
```
### Server connects but tools fail with auth errors
Ensure the key is in the server's `env` block:
```yaml
mcp_servers:
github:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-github"]
env:
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_your_actual_token" # Check this
```
### Connection timeout
Increase `connect_timeout` for slow-starting servers:
```yaml
mcp_servers:
slow_server:
command: "npx"
args: ["-y", "heavy-server-package"]
connect_timeout: 120 # default is 60
```
### Reload MCP Servers
You can reload MCP servers without restarting Hermes:
- In the CLI: the agent reconnects automatically
- In messaging: send `/reload-mcp`

View file

@ -0,0 +1,97 @@
---
sidebar_position: 3
title: "Persistent Memory"
description: "How Hermes Agent remembers across sessions — MEMORY.md, USER.md, and session search"
---
# Persistent Memory
Hermes Agent has bounded, curated memory that persists across sessions. This lets it remember your preferences, your projects, your environment, and things it has learned.
## How It Works
Two files make up the agent's memory:
| File | Purpose | Token Budget |
|------|---------|-------------|
| **MEMORY.md** | Agent's personal notes — environment facts, conventions, things learned | ~800 tokens (~2200 chars) |
| **USER.md** | User profile — your preferences, communication style, expectations | ~500 tokens (~1375 chars) |
Both are stored in `~/.hermes/memories/` and are injected into the system prompt as a frozen snapshot at session start. The agent manages its own memory via the `memory` tool — it can add, replace, or remove entries.
:::info
Character limits keep memory focused. When memory is full, the agent consolidates or replaces entries to make room for new information.
:::
## Configuration
```yaml
# In ~/.hermes/config.yaml
memory:
memory_enabled: true
user_profile_enabled: true
memory_char_limit: 2200 # ~800 tokens
user_char_limit: 1375 # ~500 tokens
```
## Memory Tool Actions
The agent uses the `memory` tool with these actions:
- **add** — Add a new memory entry
- **replace** — Replace an existing entry with updated content
- **remove** — Remove an entry that's no longer relevant
- **read** — Read current memory contents
## Session Search
Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:
- All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
- Search queries return relevant past conversations with Gemini Flash summarization
- The agent can find things it discussed weeks ago, even if they're not in its active memory
```bash
hermes sessions list # Browse past sessions
```
## Honcho Integration (Cross-Session User Modeling)
For deeper, AI-generated user understanding that works across tools, you can optionally enable [Honcho](https://honcho.dev/) by Plastic Labs. Honcho runs alongside existing memory — USER.md stays as-is, and Honcho adds an additional layer of context.
When enabled:
- **Prefetch**: Each turn, Honcho's user representation is injected into the system prompt
- **Sync**: After each conversation, messages are synced to Honcho
- **Query tool**: The agent can actively query its understanding of you via `query_user_context`
**Setup:**
```bash
# 1. Install the optional dependency
uv pip install honcho-ai
# 2. Get an API key from https://app.honcho.dev
# 3. Create ~/.honcho/config.json
cat > ~/.honcho/config.json << 'EOF'
{
"enabled": true,
"apiKey": "your-honcho-api-key",
"peerName": "your-name",
"hosts": {
"hermes": {
"workspace": "hermes"
}
}
}
EOF
```
Or via environment variable:
```bash
hermes config set HONCHO_API_KEY your-key
```
:::tip
Honcho is fully opt-in — zero behavior change when disabled or unconfigured. All Honcho calls are non-fatal; if the service is unreachable, the agent continues normally.
:::

View file

@ -0,0 +1,159 @@
---
sidebar_position: 2
title: "Skills System"
description: "On-demand knowledge documents — progressive disclosure, agent-managed skills, and the Skills Hub"
---
# Skills System
Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage and are compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
All skills live in **`~/.hermes/skills/`** — a single directory that serves as the source of truth. On fresh install, bundled skills are copied from the repo. Hub-installed and agent-created skills also go here. The agent can modify or delete any skill.
## Using Skills
Every installed skill is automatically available as a slash command:
```bash
# In the CLI or any messaging platform:
/gif-search funny cats
/axolotl help me fine-tune Llama 3 on my dataset
/github-pr-workflow create a PR for the auth refactor
# Just the skill name loads it and lets the agent ask what you need:
/excalidraw
```
You can also interact with skills through natural conversation:
```bash
hermes --toolsets skills -q "What skills do you have?"
hermes --toolsets skills -q "Show me the axolotl skill"
```
## Progressive Disclosure
Skills use a token-efficient loading pattern:
```
Level 0: skills_categories() → ["mlops", "devops"] (~50 tokens)
Level 1: skills_list(category) → [{name, description}, ...] (~3k tokens)
Level 2: skill_view(name) → Full content + metadata (varies)
Level 3: skill_view(name, path) → Specific reference file (varies)
```
The agent only loads the full skill content when it actually needs it.
## SKILL.md Format
```markdown
---
name: my-skill
description: Brief description of what this skill does
version: 1.0.0
metadata:
hermes:
tags: [python, automation]
category: devops
---
# Skill Title
## When to Use
Trigger conditions for this skill.
## Procedure
1. Step one
2. Step two
## Pitfalls
- Known failure modes and fixes
## Verification
How to confirm it worked.
```
## Skill Directory Structure
```
~/.hermes/skills/ # Single source of truth
├── mlops/ # Category directory
│ ├── axolotl/
│ │ ├── SKILL.md # Main instructions (required)
│ │ ├── references/ # Additional docs
│ │ ├── templates/ # Output formats
│ │ └── assets/ # Supplementary files
│ └── vllm/
│ └── SKILL.md
├── devops/
│ └── deploy-k8s/ # Agent-created skill
│ ├── SKILL.md
│ └── references/
├── .hub/ # Skills Hub state
│ ├── lock.json
│ ├── quarantine/
│ └── audit.log
└── .bundled_manifest # Tracks seeded bundled skills
```
## Agent-Managed Skills (skill_manage tool)
The agent can create, update, and delete its own skills via the `skill_manage` tool. This is the agent's **procedural memory** — when it figures out a non-trivial workflow, it saves the approach as a skill for future reuse.
### When the Agent Creates Skills
- After completing a complex task (5+ tool calls) successfully
- When it hit errors or dead ends and found the working path
- When the user corrected its approach
- When it discovered a non-trivial workflow
### Actions
| Action | Use for | Key params |
|--------|---------|------------|
| `create` | New skill from scratch | `name`, `content` (full SKILL.md), optional `category` |
| `patch` | Targeted fixes (preferred) | `name`, `old_string`, `new_string` |
| `edit` | Major structural rewrites | `name`, `content` (full SKILL.md replacement) |
| `delete` | Remove a skill entirely | `name` |
| `write_file` | Add/update supporting files | `name`, `file_path`, `file_content` |
| `remove_file` | Remove a supporting file | `name`, `file_path` |
:::tip
The `patch` action is preferred for updates — it's more token-efficient than `edit` because only the changed text appears in the tool call.
:::
## Skills Hub
Search, install, and manage skills from online registries:
```bash
hermes skills search kubernetes # Search all sources
hermes skills install openai/skills/k8s # Install with security scan
hermes skills inspect openai/skills/k8s # Preview before installing
hermes skills list --source hub # List hub-installed skills
hermes skills audit # Re-scan all hub skills
hermes skills uninstall k8s # Remove a hub skill
hermes skills publish skills/my-skill --to github --repo owner/repo
hermes skills snapshot export setup.json # Export skill config
hermes skills tap add myorg/skills-repo # Add a custom source
```
All hub-installed skills go through a **security scanner** that checks for data exfiltration, prompt injection, destructive commands, and other threats.
### Trust Levels
| Level | Source | Policy |
|-------|--------|--------|
| `builtin` | Ships with Hermes | Always trusted |
| `trusted` | openai/skills, anthropics/skills | Trusted sources |
| `community` | Everything else | Any findings = blocked unless `--force` |
### Slash Commands (Inside Chat)
All the same commands work with `/skills` prefix:
```
/skills search kubernetes
/skills install openai/skills/skill-creator
/skills list
```

View file

@ -0,0 +1,163 @@
---
sidebar_position: 1
title: "Tools & Toolsets"
description: "Overview of Hermes Agent's tools — what's available, how toolsets work, and terminal backends"
---
# Tools & Toolsets
Tools are functions that extend the agent's capabilities. They're organized into logical **toolsets** that can be enabled or disabled per platform.
## Available Tools
| Category | Tools | Description |
|----------|-------|-------------|
| **Web** | `web_search`, `web_extract`, `web_crawl` | Search the web, extract page content, crawl sites |
| **Terminal** | `terminal` | Execute commands (local/docker/singularity/modal/ssh backends) |
| **File** | `read_file`, `write_file`, `patch`, `search` | Read, write, edit, and search files |
| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, etc. | Full browser automation via Browserbase |
| **Vision** | `vision_analyze` | Image analysis via multimodal models |
| **Image Gen** | `image_generate` | Generate images (FLUX via FAL) |
| **TTS** | `text_to_speech` | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |
| **Reasoning** | `mixture_of_agents` | Multi-model reasoning |
| **Skills** | `skills_list`, `skill_view`, `skill_manage` | Find, view, create, and manage skills |
| **Todo** | `todo` | Read/write task list for multi-step planning |
| **Memory** | `memory` | Persistent notes + user profile across sessions |
| **Session Search** | `session_search` | Search + summarize past conversations (FTS5) |
| **Cronjob** | `schedule_cronjob`, `list_cronjobs`, `remove_cronjob` | Scheduled task management |
| **Code Execution** | `execute_code` | Run Python scripts that call tools via RPC sandbox |
| **Delegation** | `delegate_task` | Spawn subagents with isolated context |
| **Clarify** | `clarify` | Ask the user multiple-choice or open-ended questions (CLI-only) |
| **MCP** | Auto-discovered | External tools from MCP servers |
## Using Toolsets
```bash
# Use specific toolsets
hermes --toolsets "web,terminal"
# See all available tools
hermes tools
# Configure tools per platform (interactive)
hermes tools
```
**Available toolsets:** `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, and more.
## Terminal Backends
The terminal tool can execute commands in different environments:
| Backend | Description | Use Case |
|---------|-------------|----------|
| `local` | Run on your machine (default) | Development, trusted tasks |
| `docker` | Isolated containers | Security, reproducibility |
| `ssh` | Remote server | Sandboxing, keep agent away from its own code |
| `singularity` | HPC containers | Cluster computing, rootless |
| `modal` | Cloud execution | Serverless, scale |
### Configuration
```yaml
# In ~/.hermes/config.yaml
terminal:
backend: local # or: docker, ssh, singularity, modal
cwd: "." # Working directory
timeout: 180 # Command timeout in seconds
```
### Docker Backend
```yaml
terminal:
backend: docker
docker_image: python:3.11-slim
```
### SSH Backend
Recommended for security — agent can't modify its own code:
```yaml
terminal:
backend: ssh
```
```bash
# Set credentials in ~/.hermes/.env
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa
```
### Singularity/Apptainer
```bash
# Pre-build SIF for parallel workers
apptainer build ~/python.sif docker://python:3.11-slim
# Configure
hermes config set terminal.backend singularity
hermes config set terminal.singularity_image ~/python.sif
```
### Modal (Serverless Cloud)
```bash
uv pip install "swe-rex[modal]"
modal setup
hermes config set terminal.backend modal
```
### Container Resources
Configure CPU, memory, disk, and persistence for all container backends:
```yaml
terminal:
backend: docker # or singularity, modal
container_cpu: 1 # CPU cores (default: 1)
container_memory: 5120 # Memory in MB (default: 5GB)
container_disk: 51200 # Disk in MB (default: 50GB)
container_persistent: true # Persist filesystem across sessions (default: true)
```
When `container_persistent: true`, installed packages, files, and config survive across sessions.
### Container Security
All container backends run with security hardening:
- Read-only root filesystem (Docker)
- All Linux capabilities dropped
- No privilege escalation
- PID limits (256 processes)
- Full namespace isolation
- Persistent workspace via volumes, not writable root layer
## Background Process Management
Start background processes and manage them:
```python
terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345}
# Then manage with the process tool:
process(action="list") # Show all running processes
process(action="poll", session_id="proc_abc123") # Check status
process(action="wait", session_id="proc_abc123") # Block until done
process(action="log", session_id="proc_abc123") # Full output
process(action="kill", session_id="proc_abc123") # Terminate
process(action="write", session_id="proc_abc123", data="y") # Send input
```
PTY mode (`pty=true`) enables interactive CLI tools like Codex and Claude Code.
## Sudo Support
If a command needs sudo, you'll be prompted for your password (cached for the session). Or set `SUDO_PASSWORD` in `~/.hermes/.env`.
:::warning
On messaging platforms, if sudo fails, the output includes a tip to add `SUDO_PASSWORD` to `~/.hermes/.env`.
:::

View file

@ -0,0 +1,89 @@
---
sidebar_position: 9
title: "Voice & TTS"
description: "Text-to-speech and voice message transcription across all platforms"
---
# Voice & TTS
Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.
## Text-to-Speech
Convert text to speech with three providers:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
### Platform Delivery
| Platform | Delivery | Format |
|----------|----------|--------|
| Telegram | Voice bubble (plays inline) | Opus `.ogg` |
| Discord | Audio file attachment | MP3 |
| WhatsApp | Audio file attachment | MP3 |
| CLI | Saved to `~/voice-memos/` | MP3 |
### Configuration
```yaml
# In ~/.hermes/config.yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
model_id: "eleven_multilingual_v2"
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
```
### Telegram Voice Bubbles & ffmpeg
Telegram voice bubbles require Opus/OGG audio format:
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
```bash
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Fedora
sudo dnf install ffmpeg
```
Without ffmpeg, Edge TTS audio is sent as a regular audio file (playable, but shows as a rectangular player instead of a voice bubble).
:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
:::
## Voice Message Transcription
Voice messages sent on Telegram, Discord, WhatsApp, or Slack are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.
| Provider | Model | Quality | Cost |
|----------|-------|---------|------|
| **OpenAI Whisper** | `whisper-1` (default) | Good | Low |
| **OpenAI GPT-4o** | `gpt-4o-mini-transcribe` | Better | Medium |
| **OpenAI GPT-4o** | `gpt-4o-transcribe` | Best | Higher |
Requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
### Configuration
```yaml
# In ~/.hermes/config.yaml
stt:
enabled: true
model: "whisper-1"
```