diff --git a/website/docs/developer-guide/agent-loop.md b/website/docs/developer-guide/agent-loop.md index 5d34c9123..39a96df64 100644 --- a/website/docs/developer-guide/agent-loop.md +++ b/website/docs/developer-guide/agent-loop.md @@ -6,107 +6,231 @@ description: "Detailed walkthrough of AIAgent execution, API modes, tools, callb # Agent Loop Internals -The core orchestration engine is `run_agent.py`'s `AIAgent`. +The core orchestration engine is `run_agent.py`'s `AIAgent` class — roughly 9,200 lines that handle everything from prompt assembly to tool dispatch to provider failover. -## Core responsibilities +## Core Responsibilities `AIAgent` is responsible for: -- assembling the effective prompt and tool schemas -- selecting the correct provider/API mode -- making interruptible model calls -- executing tool calls (sequentially or concurrently) -- maintaining session history -- handling compression, retries, and fallback models +- Assembling the effective system prompt and tool schemas via `prompt_builder.py` +- Selecting the correct provider/API mode (chat_completions, codex_responses, anthropic_messages) +- Making interruptible model calls with cancellation support +- Executing tool calls (sequentially or concurrently via thread pool) +- Maintaining conversation history in OpenAI message format +- Handling compression, retries, and fallback model switching +- Tracking iteration budgets across parent and child agents +- Flushing persistent memory before context is lost -## API modes +## Two Entry Points -Hermes currently supports three API execution modes: +```python +# Simple interface — returns final response string +response = agent.chat("Fix the bug in main.py") -| API mode | Used for | -|----------|----------| -| `chat_completions` | OpenAI-compatible chat endpoints, including OpenRouter and most custom endpoints | -| `codex_responses` | OpenAI Codex / Responses API path | -| `anthropic_messages` | Native Anthropic Messages API | +# Full interface — returns 
dict with messages, metadata, usage stats +result = agent.run_conversation( + user_message="Fix the bug in main.py", + system_message=None, # auto-built if omitted + conversation_history=None, # auto-loaded from session if omitted + task_id="task_abc123" +) +``` -The mode is resolved from explicit args, provider selection, and base URL heuristics. +`chat()` is a thin wrapper around `run_conversation()` that extracts the `final_response` field from the result dict. -## Turn lifecycle +## API Modes + +Hermes supports three API execution modes, resolved from provider selection, explicit args, and base URL heuristics: + +| API mode | Used for | Client type | +|----------|----------|-------------| +| `chat_completions` | OpenAI-compatible endpoints (OpenRouter, custom, most providers) | `openai.OpenAI` | +| `codex_responses` | OpenAI Codex / Responses API | `openai.OpenAI` with Responses format | +| `anthropic_messages` | Native Anthropic Messages API | `anthropic.Anthropic` via adapter | + +The mode determines how messages are formatted, how tool calls are structured, how responses are parsed, and how caching/streaming works. All three converge on the same internal message format (OpenAI-style `role`/`content`/`tool_calls` dicts) before and after API calls. + +**Mode resolution order:** +1. Explicit `api_mode` constructor arg (highest priority) +2. Provider-specific detection (e.g., `anthropic` provider → `anthropic_messages`) +3. Base URL heuristics (e.g., `api.anthropic.com` → `anthropic_messages`) +4. 
Default: `chat_completions` + +## Turn Lifecycle + +Each iteration of the agent loop follows this sequence: ```text run_conversation() - -> generate effective task_id - -> append current user message - -> load or build cached system prompt - -> maybe preflight-compress - -> build api_messages - -> inject ephemeral prompt layers - -> apply prompt caching if appropriate - -> make interruptible API call - -> if tool calls: execute them, append tool results, loop - -> if final text: persist, cleanup, return response + 1. Generate task_id if not provided + 2. Append user message to conversation history + 3. Build or reuse cached system prompt (prompt_builder.py) + 4. Check if preflight compression is needed (>50% context) + 5. Build API messages from conversation history + - chat_completions: OpenAI format as-is + - codex_responses: convert to Responses API input items + - anthropic_messages: convert via anthropic_adapter.py + 6. Inject ephemeral prompt layers (budget warnings, context pressure) + 7. Apply prompt caching markers if on Anthropic + 8. Make interruptible API call (_api_call_with_interrupt) + 9. Parse response: + - If tool_calls: execute them, append results, loop back to step 5 + - If text response: persist session, flush memory if needed, return ``` -## Interruptible API calls +### Message Format -Hermes wraps API requests so they can be interrupted from the CLI or gateway. +All messages use OpenAI-compatible format internally: -This matters because: +```python +{"role": "system", "content": "..."} +{"role": "user", "content": "..."} +{"role": "assistant", "content": "...", "tool_calls": [...]} +{"role": "tool", "tool_call_id": "...", "content": "..."} +``` -- the agent may be in a long LLM call -- the user may send a new message mid-flight -- background systems may need cancellation semantics +Reasoning content (from models that support extended thinking) is stored in `assistant_msg["reasoning"]` and optionally displayed via the `reasoning_callback`. 
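Taken together, one tool-call round trip leaves a characteristic slice in the history. The sketch below uses the real `read_file` tool name, but the call ID, arguments, and contents are invented for illustration:

```python
# Illustrative slice of conversation history for one tool-call round trip,
# in the internal OpenAI-style format. The call ID and arguments are made up.
history = [
    {"role": "user", "content": "What model does config.yaml select?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "read_file",
                    "arguments": '{"path": "config.yaml"}',
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "model: gpt-4o\n"},
    {"role": "assistant", "content": "config.yaml selects model gpt-4o."},
]

# Invariant: every tool result must reference a tool_call id from an
# earlier assistant turn.
known_ids = {
    tc["id"]
    for msg in history
    if msg["role"] == "assistant"
    for tc in (msg.get("tool_calls") or [])
}
assert all(m["tool_call_id"] in known_ids for m in history if m["role"] == "tool")
```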
-## Tool execution modes +### Message Alternation Rules -Hermes uses two execution strategies: +The agent loop enforces strict message role alternation: -- sequential execution for single or interactive tools -- concurrent execution for multiple non-interactive tools +- After the system message: `User → Assistant → User → Assistant → ...` +- During tool calling: `Assistant (with tool_calls) → Tool → Tool → ... → Assistant` +- **Never** two assistant messages in a row +- **Never** two user messages in a row +- **Only** `tool` role can have consecutive entries (parallel tool results) -Concurrent tool execution preserves message/result ordering when reinserting tool responses into conversation history. +Providers validate these sequences and will reject malformed histories. -## Callback surfaces +## Interruptible API Calls -`AIAgent` supports platform/integration callbacks such as: +API requests are wrapped in `_api_call_with_interrupt()` which runs the actual HTTP call in a background thread while monitoring an interrupt event: -- `tool_progress_callback` -- `thinking_callback` -- `reasoning_callback` -- `clarify_callback` -- `step_callback` -- `stream_delta_callback` -- `tool_gen_callback` -- `status_callback` +```text +┌──────────────────────┐ ┌──────────────┐ +│ Main thread │ │ API thread │ +│ wait on: │────▶│ HTTP POST │ +│ - response ready │ │ to provider │ +│ - interrupt event │ └──────────────┘ +│ - timeout │ +└──────────────────────┘ +``` -These are how the CLI, gateway, and ACP integrations stream intermediate progress and interactive approval/clarification flows. +When interrupted (user sends new message, `/stop` command, or signal): +- The API thread is abandoned (response discarded) +- The agent can process the new input or shut down cleanly +- No partial response is injected into conversation history -## Budget and fallback behavior +## Tool Execution -Hermes tracks a shared iteration budget across parent and subagents. 
It also injects budget pressure hints near the end of the available iteration window. +### Sequential vs Concurrent -Fallback model support allows the agent to switch providers/models when the primary route fails in supported failure paths. +When the model returns tool calls: -## Compression and persistence +- **Single tool call** → executed directly in the main thread +- **Multiple tool calls** → executed concurrently via `ThreadPoolExecutor` + - Exception: tools marked as interactive (e.g., `clarify`) force sequential execution + - Results are reinserted in the original tool call order regardless of completion order -Before and during long runs, Hermes may: +### Execution Flow -- flush memory before context loss -- compress middle conversation turns -- split the session lineage into a new session ID after compression -- preserve recent context and structural tool-call/result consistency +```text +for each tool_call in response.tool_calls: + 1. Resolve handler from tools/registry.py + 2. Fire pre_tool_call plugin hook + 3. Check if dangerous command (tools/approval.py) + - If dangerous: invoke approval_callback, wait for user + 4. Execute handler with args + task_id + 5. Fire post_tool_call plugin hook + 6. Append {"role": "tool", "content": result} to history +``` -## Key files to read next +### Agent-Level Tools -- `run_agent.py` -- `agent/prompt_builder.py` -- `agent/context_compressor.py` -- `agent/prompt_caching.py` -- `model_tools.py` +Some tools are intercepted by `run_agent.py` *before* reaching `handle_function_call()`: -## Related docs +| Tool | Why intercepted | +|------|-----------------| +| `todo` | Reads/writes agent-local task state | +| `memory` | Writes to persistent memory files with character limits | + +These tools modify agent state directly and return synthetic tool results without going through the registry. 
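The order-preserving concurrent path can be sketched with the standard library's `concurrent.futures` — not the actual Hermes code, just the mechanism: collect futures in submission order, then read results in that same order regardless of which handler finishes first.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical handlers standing in for registered tool handlers.
def slow_tool(args):
    time.sleep(0.05)
    return "slow result"

def fast_tool(args):
    return "fast result"

tool_calls = [("slow", slow_tool, {}), ("fast", fast_tool, {})]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit in tool-call order, then read results in that same order;
    # fut.result() blocks until each handler finishes.
    futures = [(name, pool.submit(handler, args)) for name, handler, args in tool_calls]
    results = [(name, fut.result()) for name, fut in futures]

print(results)  # [('slow', 'slow result'), ('fast', 'fast result')]
```

Even though `fast_tool` returns first, its result is reinserted after `slow_tool`'s, matching the original tool-call order in the assistant message.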
+ +## Callback Surfaces + +`AIAgent` supports platform-specific callbacks that enable real-time progress in the CLI, gateway, and ACP integrations: + +| Callback | When fired | Used by | +|----------|-----------|---------| +| `tool_progress_callback` | Before/after each tool execution | CLI spinner, gateway progress messages | +| `thinking_callback` | When model starts/stops thinking | CLI "thinking..." indicator | +| `reasoning_callback` | When model returns reasoning content | CLI reasoning display, gateway reasoning blocks | +| `clarify_callback` | When `clarify` tool is called | CLI input prompt, gateway interactive message | +| `step_callback` | After each complete agent turn | Gateway step tracking, ACP progress | +| `stream_delta_callback` | Each streaming token (when enabled) | CLI streaming display | +| `tool_gen_callback` | When tool call is parsed from stream | CLI tool preview in spinner | +| `status_callback` | State changes (thinking, executing, etc.) | ACP status updates | + +## Budget and Fallback Behavior + +### Iteration Budget + +The agent tracks iterations via `IterationBudget`: + +- Default: 90 iterations (configurable via `agent.max_turns`) +- Shared across parent and child agents — a subagent consumes from the parent's budget +- At 70%+ usage, `_get_budget_warning()` appends a `[BUDGET WARNING: ...]` to the last tool result +- At 100%, the agent stops and returns a summary of work done + +### Fallback Model + +When the primary model fails (429 rate limit, 5xx server error, 401/403 auth error): + +1. Check `fallback_providers` list in config +2. Try each fallback in order +3. On success, continue the conversation with the new provider +4. On 401/403, attempt credential refresh before failing over + +The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain configurable via the `auxiliary.*` config section. 
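A minimal sketch of that failover order (provider names and the `make_call` transport are invented for the example; the real code also attempts a credential refresh on 401/403 before moving on):

```python
# Simplified model of the failover order described above — not the real
# Hermes implementation or its function names.
RETRYABLE = {429, 500, 502, 503}

def call_with_fallback(primary, fallbacks, make_call):
    """Try the primary provider, then each fallback in order."""
    for provider in [primary, *fallbacks]:
        status, response = make_call(provider)
        if status == 200:
            return provider, response
        if status not in RETRYABLE and status not in (401, 403):
            # Unrecoverable (e.g. 400 bad request) — don't fail over.
            raise RuntimeError(f"{provider}: unrecoverable status {status}")
        # 429/5xx: try the next provider. On 401/403 the real agent first
        # attempts a credential refresh, then fails over.
    raise RuntimeError("all providers failed")

# Simulated transport: primary rate-limited, first fallback healthy.
responses = {"openrouter": (429, None), "anthropic": (200, "ok")}
provider, resp = call_with_fallback("openrouter", ["anthropic"], lambda p: responses[p])
print(provider, resp)  # anthropic ok
```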
+ +## Compression and Persistence + +### When Compression Triggers + +- **Preflight** (before API call): If conversation exceeds 50% of model's context window +- **Gateway auto-compression**: If conversation exceeds 85% (more aggressive, runs between turns) + +### What Happens During Compression + +1. Memory is flushed to disk first (preventing data loss) +2. Middle conversation turns are summarized into a compact summary +3. The last N messages are preserved intact (`compression.protect_last_n`, default: 20) +4. Tool call/result message pairs are kept together (never split) +5. A new session lineage ID is generated (compression creates a "child" session) + +### Session Persistence + +After each turn: +- Messages are saved to the session store (SQLite via `hermes_state.py`) +- Memory changes are flushed to `MEMORY.md` / `USER.md` +- The session can be resumed later via `/resume` or `hermes chat --resume` + +## Key Source Files + +| File | Purpose | +|------|---------| +| `run_agent.py` | AIAgent class — the complete agent loop (~9,200 lines) | +| `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality | +| `agent/context_compressor.py` | Conversation compression algorithm | +| `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics | +| `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) | +| `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch | + +## Related Docs - [Provider Runtime Resolution](./provider-runtime.md) - [Prompt Assembly](./prompt-assembly.md) - [Context Compression & Prompt Caching](./context-compression-and-caching.md) - [Tools Runtime](./tools-runtime.md) +- [Architecture Overview](./architecture.md) diff --git a/website/docs/developer-guide/architecture.md b/website/docs/developer-guide/architecture.md index 2b6e13d3e..ab143dc2a 100644 --- a/website/docs/developer-guide/architecture.md +++ 
b/website/docs/developer-guide/architecture.md @@ -1,152 +1,274 @@ --- sidebar_position: 1 title: "Architecture" -description: "Hermes Agent internals — major subsystems, execution paths, and where to read next" +description: "Hermes Agent internals — major subsystems, execution paths, data flow, and where to read next" --- # Architecture -This page is the top-level map of Hermes Agent internals. The project has grown beyond a single monolithic loop, so the best way to understand it is by subsystem. +This page is the top-level map of Hermes Agent internals. Use it to orient yourself in the codebase, then dive into subsystem-specific docs for implementation details. -## High-level structure +## System Overview + +```text +┌─────────────────────────────────────────────────────────────────────┐ +│ Entry Points │ +│ │ +│ CLI (cli.py) Gateway (gateway/run.py) ACP (acp_adapter/) │ +│ Batch Runner API Server Python Library │ +└──────────┬──────────────┬───────────────────────┬────────────────────┘ + │ │ │ + ▼ ▼ ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ AIAgent (run_agent.py) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Prompt │ │ Provider │ │ Tool │ │ +│ │ Builder │ │ Resolution │ │ Dispatch │ │ +│ │ (prompt_ │ │ (runtime_ │ │ (model_ │ │ +│ │ builder.py) │ │ provider.py)│ │ tools.py) │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ ┌──────┴───────┐ ┌──────┴───────┐ ┌──────┴───────┐ │ +│ │ Compression │ │ 3 API Modes │ │ Tool Registry│ │ +│ │ & Caching │ │ chat_compl. │ │ (registry.py)│ │ +│ │ │ │ codex_resp. 
│ │ 47 tools │ │ +│ │ │ │ anthropic │ │ 37 toolsets │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +└─────────────────────────────────────────────────────────────────────┘ + │ │ + ▼ ▼ +┌───────────────────┐ ┌──────────────────────┐ +│ Session Storage │ │ Tool Backends │ +│ (SQLite + FTS5) │ │ Terminal (6 backends) │ +│ hermes_state.py │ │ Browser (5 backends) │ +│ gateway/session.py│ │ Web (4 backends) │ +└───────────────────┘ │ MCP (dynamic) │ + │ File, Vision, etc. │ + └──────────────────────┘ +``` + +## Directory Structure ```text hermes-agent/ -├── run_agent.py # AIAgent core loop -├── cli.py # interactive terminal UI -├── model_tools.py # tool discovery/orchestration -├── toolsets.py # tool groupings and presets -├── hermes_state.py # SQLite session/state database -├── batch_runner.py # batch trajectory generation +├── run_agent.py # AIAgent — core conversation loop (~9,200 lines) +├── cli.py # HermesCLI — interactive terminal UI (~8,500 lines) +├── model_tools.py # Tool discovery, schema collection, dispatch +├── toolsets.py # Tool groupings and platform presets +├── hermes_state.py # SQLite session/state database with FTS5 +├── hermes_constants.py # HERMES_HOME, profile-aware paths +├── batch_runner.py # Batch trajectory generation │ -├── agent/ # prompt building, compression, caching, metadata, trajectories -├── hermes_cli/ # command entrypoints, auth, setup, models, config, doctor -├── tools/ # tool implementations and terminal environments -├── gateway/ # messaging gateway, session routing, delivery, pairing, hooks -├── cron/ # scheduled job storage and scheduler -├── plugins/memory/ # Memory provider plugins (honcho, openviking, mem0, etc.) 
-├── acp_adapter/ # ACP editor integration server -├── acp_registry/ # ACP registry manifest + icon -├── environments/ # Hermes RL / benchmark environment framework -├── skills/ # bundled skills -├── optional-skills/ # official optional skills -└── tests/ # test suite +├── agent/ # Agent internals +│ ├── prompt_builder.py # System prompt assembly +│ ├── context_compressor.py # Conversation compression algorithm +│ ├── prompt_caching.py # Anthropic prompt caching +│ ├── auxiliary_client.py # Auxiliary LLM for side tasks (vision, summarization) +│ ├── model_metadata.py # Model context lengths, token estimation +│ ├── models_dev.py # models.dev registry integration +│ ├── anthropic_adapter.py # Anthropic Messages API format conversion +│ ├── display.py # KawaiiSpinner, tool preview formatting +│ ├── skill_commands.py # Skill slash commands +│ ├── memory_store.py # Persistent memory read/write +│ └── trajectory.py # Trajectory saving helpers +│ +├── hermes_cli/ # CLI subcommands and setup +│ ├── main.py # Entry point — all `hermes` subcommands (~4,200 lines) +│ ├── config.py # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration +│ ├── commands.py # COMMAND_REGISTRY — central slash command definitions +│ ├── auth.py # PROVIDER_REGISTRY, credential resolution +│ ├── runtime_provider.py # Provider → api_mode + credentials +│ ├── models.py # Model catalog, provider model lists +│ ├── model_switch.py # /model command logic (CLI + gateway shared) +│ ├── setup.py # Interactive setup wizard (~3,500 lines) +│ ├── skin_engine.py # CLI theming engine +│ ├── skills_config.py # hermes skills — enable/disable per platform +│ ├── skills_hub.py # /skills slash command +│ ├── tools_config.py # hermes tools — enable/disable per platform +│ ├── plugins.py # PluginManager — discovery, loading, hooks +│ ├── callbacks.py # Terminal callbacks (clarify, sudo, approval) +│ └── gateway.py # hermes gateway start/stop +│ +├── tools/ # Tool implementations (one file per tool) +│ ├── registry.py # Central 
tool registry +│ ├── approval.py # Dangerous command detection +│ ├── terminal_tool.py # Terminal orchestration +│ ├── process_registry.py # Background process management +│ ├── file_tools.py # read_file, write_file, patch, search_files +│ ├── web_tools.py # web_search, web_extract +│ ├── browser_tool.py # 11 browser automation tools +│ ├── code_execution_tool.py # execute_code sandbox +│ ├── delegate_tool.py # Subagent delegation +│ ├── mcp_tool.py # MCP client (~1,050 lines) +│ ├── credential_files.py # File-based credential passthrough +│ ├── env_passthrough.py # Env var passthrough for sandboxes +│ ├── ansi_strip.py # ANSI escape stripping +│ └── environments/ # Terminal backends (local, docker, ssh, modal, daytona, singularity) +│ +├── gateway/ # Messaging platform gateway +│ ├── run.py # GatewayRunner — message dispatch (~5,800 lines) +│ ├── session.py # SessionStore — conversation persistence +│ ├── delivery.py # Outbound message delivery +│ ├── pairing.py # DM pairing authorization +│ ├── hooks.py # Hook discovery and lifecycle events +│ ├── mirror.py # Cross-session message mirroring +│ ├── status.py # Token locks, profile-scoped process tracking +│ ├── builtin_hooks/ # Always-registered hooks +│ └── platforms/ # 14 adapters: telegram, discord, slack, whatsapp, +│ # signal, matrix, mattermost, email, sms, +│ # dingtalk, feishu, wecom, homeassistant, webhook +│ +├── acp_adapter/ # ACP server (VS Code / Zed / JetBrains) +├── cron/ # Scheduler (jobs.py, scheduler.py) +├── plugins/memory/ # Memory provider plugins +├── environments/ # RL training environments (Atropos) +├── skills/ # Bundled skills (always available) +├── optional-skills/ # Official optional skills (install explicitly) +├── website/ # Docusaurus documentation site +└── tests/ # Pytest suite (~3,000+ tests) ``` -## Recommended reading order +## Data Flow -If you are new to the codebase, read in this order: +### CLI Session -1. this page -2. [Agent Loop Internals](./agent-loop.md) -3. 
[Prompt Assembly](./prompt-assembly.md) -4. [Provider Runtime Resolution](./provider-runtime.md) -5. [Adding Providers](./adding-providers.md) -6. [Tools Runtime](./tools-runtime.md) -7. [Session Storage](./session-storage.md) -8. [Gateway Internals](./gateway-internals.md) -9. [Context Compression & Prompt Caching](./context-compression-and-caching.md) -10. [ACP Internals](./acp-internals.md) -11. [Environments, Benchmarks & Data Generation](./environments.md) +```text +User input → HermesCLI.process_input() + → AIAgent.run_conversation() + → prompt_builder.build_system_prompt() + → runtime_provider.resolve_runtime_provider() + → API call (chat_completions / codex_responses / anthropic_messages) + → tool_calls? → model_tools.handle_function_call() → loop + → final response → display → save to SessionDB +``` -## Major subsystems +### Gateway Message -### Agent loop +```text +Platform event → Adapter.on_message() → MessageEvent + → GatewayRunner._handle_message() + → authorize user + → resolve session key + → create AIAgent with session history + → AIAgent.run_conversation() + → deliver response back through adapter +``` -The core synchronous orchestration engine is `AIAgent` in `run_agent.py`. +### Cron Job -It is responsible for: +```text +Scheduler tick → load due jobs from jobs.json + → create fresh AIAgent (no history) + → inject attached skills as context + → run job prompt + → deliver response to target platform + → update job state and next_run +``` -- provider/API-mode selection -- prompt construction -- tool execution -- retries and fallback -- callbacks -- compression and persistence +## Recommended Reading Order -See [Agent Loop Internals](./agent-loop.md). +If you are new to the codebase: -### Prompt system +1. **This page** — orient yourself +2. **[Agent Loop Internals](./agent-loop.md)** — how AIAgent works +3. **[Prompt Assembly](./prompt-assembly.md)** — system prompt construction +4. 
**[Provider Runtime Resolution](./provider-runtime.md)** — how providers are selected +5. **[Adding Providers](./adding-providers.md)** — practical guide to adding a new provider +6. **[Tools Runtime](./tools-runtime.md)** — tool registry, dispatch, environments +7. **[Session Storage](./session-storage.md)** — SQLite schema, FTS5, session lineage +8. **[Gateway Internals](./gateway-internals.md)** — messaging platform gateway +9. **[Context Compression & Prompt Caching](./context-compression-and-caching.md)** — compression and caching +10. **[ACP Internals](./acp-internals.md)** — IDE integration +11. **[Environments, Benchmarks & Data Generation](./environments.md)** — RL training -Prompt-building logic is split between: +## Major Subsystems -- `run_agent.py` -- `agent/prompt_builder.py` -- `agent/prompt_caching.py` -- `agent/context_compressor.py` +### Agent Loop -See: +The synchronous orchestration engine (`AIAgent` in `run_agent.py`). Handles provider selection, prompt construction, tool execution, retries, fallback, callbacks, compression, and persistence. Supports three API modes for different provider backends. -- [Prompt Assembly](./prompt-assembly.md) -- [Context Compression & Prompt Caching](./context-compression-and-caching.md) +→ [Agent Loop Internals](./agent-loop.md) -### Provider/runtime resolution +### Prompt System -Hermes has a shared runtime provider resolver used by CLI, gateway, cron, ACP, and auxiliary calls. +Prompt construction and maintenance across the conversation lifecycle: -See [Provider Runtime Resolution](./provider-runtime.md). 
+- **`prompt_builder.py`** — Assembles the system prompt from: personality (SOUL.md), memory (MEMORY.md, USER.md), skills, context files (AGENTS.md, .hermes.md), tool-use guidance, and model-specific instructions +- **`prompt_caching.py`** — Applies Anthropic cache breakpoints for prefix caching +- **`context_compressor.py`** — Summarizes middle conversation turns when context exceeds thresholds -### Tooling runtime +→ [Prompt Assembly](./prompt-assembly.md), [Context Compression & Prompt Caching](./context-compression-and-caching.md) -The tool registry, toolsets, terminal backends, process manager, and dispatch rules form a subsystem of their own. +### Provider Resolution -See [Tools Runtime](./tools-runtime.md). +A shared runtime resolver used by CLI, gateway, cron, ACP, and auxiliary calls. Maps `(provider, model)` tuples to `(api_mode, api_key, base_url)`. Handles 18+ providers, OAuth flows, credential pools, and alias resolution. -### Session persistence +→ [Provider Runtime Resolution](./provider-runtime.md) -Historical session state is stored primarily in SQLite, with lineage preserved across compression splits. +### Tool System -See [Session Storage](./session-storage.md). +Central tool registry (`tools/registry.py`) with 47 registered tools across 20 toolsets. Each tool file self-registers at import time. The registry handles schema collection, dispatch, availability checking, and error wrapping. Terminal tools support 6 backends (local, Docker, SSH, Daytona, Modal, Singularity). -### Messaging gateway +→ [Tools Runtime](./tools-runtime.md) -The gateway is a long-running orchestration layer for platform adapters, session routing, pairing, delivery, and cron ticking. +### Session Persistence -See [Gateway Internals](./gateway-internals.md). +SQLite-based session storage with FTS5 full-text search. Sessions have lineage tracking (parent/child across compressions), per-platform isolation, and atomic writes with contention handling. 
-### ACP integration +→ [Session Storage](./session-storage.md) -ACP exposes Hermes as an editor-native agent over stdio/JSON-RPC. +### Messaging Gateway -See: +Long-running process with 14 platform adapters, unified session routing, user authorization (allowlists + DM pairing), slash command dispatch, hook system, cron ticking, and background maintenance. -- [ACP Editor Integration](../user-guide/features/acp.md) -- [ACP Internals](./acp-internals.md) +→ [Gateway Internals](./gateway-internals.md) + +### Plugin System + +Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Memory providers are a specialized plugin type under `plugins/memory/`. + +→ [Plugin Guide](/docs/guides/build-a-hermes-plugin), [Memory Provider Plugin](./memory-provider-plugin.md) ### Cron -Cron jobs are implemented as first-class agent tasks, not just shell tasks. +First-class agent tasks (not shell tasks). Jobs store in JSON, support multiple schedule formats, can attach skills and scripts, and deliver to any platform. -See [Cron Internals](./cron-internals.md). +→ [Cron Internals](./cron-internals.md) -### RL / environments / trajectories +### ACP Integration -Hermes ships a full environment framework for evaluation, RL integration, and SFT data generation. +Exposes Hermes as an editor-native agent over stdio/JSON-RPC for VS Code, Zed, and JetBrains. -See: +→ [ACP Internals](./acp-internals.md) -- [Environments, Benchmarks & Data Generation](./environments.md) -- [Trajectories & Training Format](./trajectory-format.md) +### RL / Environments / Trajectories -## Design themes +Full environment framework for evaluation and RL training. Integrates with Atropos, supports multiple tool-call parsers, and generates ShareGPT-format trajectories. 
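For orientation, a ShareGPT-format record is a `conversations` list of `from`/`value` turns. The minimal shape below is the common convention, not the exact schema Hermes emits — the values are invented for illustration:

```python
# Minimal ShareGPT-style record: a "conversations" list of from/value
# turns. The precise fields Hermes writes are defined by its trajectory
# format, not by this sketch.
trajectory = {
    "conversations": [
        {"from": "system", "value": "You are Hermes."},
        {"from": "human", "value": "List the files in the repo."},
        {"from": "gpt", "value": "The repo contains run_agent.py, cli.py, ..."},
    ]
}

roles = [turn["from"] for turn in trajectory["conversations"]]
assert roles == ["system", "human", "gpt"]
```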
-Several cross-cutting design themes appear throughout the codebase: +→ [Environments, Benchmarks & Data Generation](./environments.md), [Trajectories & Training Format](./trajectory-format.md) -- prompt stability matters -- tool execution must be observable and interruptible -- session persistence must survive long-running use -- platform frontends should share one agent core -- optional subsystems should remain loosely coupled where possible +## Design Principles -## Implementation notes +| Principle | What it means in practice | +|-----------|--------------------------| +| **Prompt stability** | System prompt doesn't change mid-conversation. No cache-breaking mutations except explicit user actions (`/model`). | +| **Observable execution** | Every tool call is visible to the user via callbacks. Progress updates in CLI (spinner) and gateway (chat messages). | +| **Interruptible** | API calls and tool execution can be cancelled mid-flight by user input or signals. | +| **Platform-agnostic core** | One AIAgent class serves CLI, gateway, ACP, batch, and API server. Platform differences live in the entry point, not the agent. | +| **Loose coupling** | Optional subsystems (MCP, plugins, memory providers, RL environments) use registry patterns and check_fn gating, not hard dependencies. | +| **Profile isolation** | Each profile (`hermes -p `) gets its own HERMES_HOME, config, memory, sessions, and gateway PID. Multiple profiles run concurrently. | -The older mental model of Hermes as “one OpenAI-compatible chat loop plus some tools” is no longer sufficient. 
Current Hermes includes: +## File Dependency Chain -- multiple API modes -- auxiliary model routing -- ACP editor integration -- gateway-specific session and delivery semantics -- RL environment infrastructure -- prompt-caching and compression logic with lineage-aware persistence +```text +tools/registry.py (no deps — imported by all tool files) + ↑ +tools/*.py (each calls registry.register() at import time) + ↑ +model_tools.py (imports tools/registry + triggers tool discovery) + ↑ +run_agent.py, cli.py, batch_runner.py, environments/ +``` -Use this page as the map, then dive into subsystem-specific docs for the real implementation details. +This chain means tool registration happens at import time, before any agent instance is created. Adding a new tool requires an import in `model_tools.py`'s `_discover_tools()` list. diff --git a/website/docs/developer-guide/context-compression-and-caching.md b/website/docs/developer-guide/context-compression-and-caching.md index 970b89448..583844645 100644 --- a/website/docs/developer-guide/context-compression-and-caching.md +++ b/website/docs/developer-guide/context-compression-and-caching.md @@ -4,7 +4,7 @@ Hermes Agent uses a dual compression system and Anthropic prompt caching to manage context window usage efficiently across long conversations. Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`, -`gateway/run.py` (session hygiene), `run_agent.py` (lines 1146-1204) +`gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`) ## Dual Compression System @@ -26,7 +26,7 @@ Hermes has two separate compression layers that operate independently: ### 1. Gateway Session Hygiene (85% threshold) -Located in `gateway/run.py` (around line 2220). This is a **safety net** that +Located in `gateway/run.py` (search for `_maybe_compress_session`). This is a **safety net** that runs before the agent processes a message. 
It prevents API failures when sessions grow too large between turns (e.g., overnight accumulation in Telegram/Discord). diff --git a/website/docs/developer-guide/cron-internals.md b/website/docs/developer-guide/cron-internals.md index b47bc7bc1..060a8400f 100644 --- a/website/docs/developer-guide/cron-internals.md +++ b/website/docs/developer-guide/cron-internals.md @@ -6,85 +6,195 @@ description: "How Hermes stores, schedules, edits, pauses, skill-loads, and deli # Cron Internals -Hermes cron support is implemented primarily in: +The cron subsystem provides scheduled task execution — from simple one-shot delays to recurring cron-expression jobs with skill injection and cross-platform delivery. -- `cron/jobs.py` -- `cron/scheduler.py` -- `tools/cronjob_tools.py` -- `gateway/run.py` -- `hermes_cli/cron.py` +## Key Files -## Scheduling model +| File | Purpose | +|------|---------| +| `cron/jobs.py` | Job model, storage, atomic read/write to `jobs.json` | +| `cron/scheduler.py` | Scheduler loop — due-job detection, execution, repeat tracking | +| `tools/cronjob_tools.py` | Model-facing `cronjob` tool registration and handler | +| `gateway/run.py` | Gateway integration — cron ticking in the long-running loop | +| `hermes_cli/cron.py` | CLI `hermes cron` subcommands | -Hermes supports: +## Scheduling Model -- one-shot delays -- intervals -- cron expressions -- explicit timestamps +Four schedule formats are supported: -The model-facing surface is a single `cronjob` tool with action-style operations: +| Format | Example | Behavior | +|--------|---------|----------| +| **Relative delay** | `30m`, `2h`, `1d` | One-shot, fires after the specified duration | +| **Interval** | `every 2h`, `every 30m` | Recurring, fires at regular intervals | +| **Cron expression** | `0 9 * * *` | Standard 5-field cron syntax (minute, hour, day, month, weekday) | +| **ISO timestamp** | `2025-01-15T09:00:00` | One-shot, fires at the exact time | -- `create` -- `list` -- `update` -- `pause` -- 
`resume` -- `run` -- `remove` +The model-facing surface is a single `cronjob` tool with action-style operations: `create`, `list`, `update`, `pause`, `resume`, `run`, `remove`. -## Job storage +## Job Storage -Cron jobs are stored in Hermes-managed local state (`~/.hermes/cron/jobs.json`) with atomic write semantics. +Jobs are stored in `~/.hermes/cron/jobs.json` with atomic write semantics (write to temp file, then rename). Each job record contains: -Each job can carry: +```json +{ + "id": "job_abc123", + "name": "Daily briefing", + "prompt": "Summarize today's AI news and funding rounds", + "schedule": "0 9 * * *", + "skills": ["ai-funding-daily-report"], + "deliver": "telegram:-1001234567890", + "repeat": null, + "state": "scheduled", + "next_run": "2025-01-16T09:00:00Z", + "run_count": 42, + "created_at": "2025-01-01T00:00:00Z", + "model": null, + "provider": null, + "script": null +} +``` -- prompt -- schedule metadata -- repeat counters -- delivery target -- lifecycle state (`scheduled`, `paused`, `completed`, etc.) -- zero, one, or multiple attached skills +### Job Lifecycle States -Backward compatibility is preserved for older jobs that only stored a legacy single `skill` field or none of the newer lifecycle fields. +| State | Meaning | +|-------|---------| +| `scheduled` | Active, will fire at next scheduled time | +| `paused` | Suspended — won't fire until resumed | +| `completed` | Repeat count exhausted or one-shot that has fired | +| `running` | Currently executing (transient state) | -## Runtime behavior +### Backward Compatibility -The scheduler: +Older jobs may have a single `skill` field instead of the `skills` array. The scheduler normalizes this at load time — single `skill` is promoted to `skills: [skill]`. 
-- loads jobs -- computes due work -- executes jobs in fresh agent sessions -- optionally injects one or more skills before the prompt -- handles repeat counters -- updates next-run metadata and state +## Scheduler Runtime -In gateway mode, cron ticking is integrated into the long-running gateway loop. +### Tick Cycle -## Skill-backed jobs +The scheduler runs on a periodic tick (default: every 60 seconds): -A cron job may attach multiple skills. At runtime, Hermes loads those skills in order and then appends the job prompt as the task instruction. +```text +tick() + 1. Acquire scheduler lock (prevents overlapping ticks) + 2. Load all jobs from jobs.json + 3. Filter to due jobs (next_run <= now AND state == "scheduled") + 4. For each due job: + a. Set state to "running" + b. Create fresh AIAgent session (no conversation history) + c. Load attached skills in order (injected as user messages) + d. Run the job prompt through the agent + e. Deliver the response to the configured target + f. Update run_count, compute next_run + g. If repeat count exhausted → state = "completed" + h. Otherwise → state = "scheduled" + 5. Write updated jobs back to jobs.json + 6. Release scheduler lock +``` -This gives scheduled jobs reusable guidance without requiring the user to paste full skill bodies into the cron prompt. +### Gateway Integration -## Recursion guard +In gateway mode, the scheduler tick is integrated into the gateway's main event loop. The gateway calls `scheduler.tick()` on its periodic maintenance cycle, which runs alongside message handling. -Cron-run sessions disable the `cronjob` toolset. This prevents a scheduled job from recursively creating or mutating more cron jobs and accidentally exploding token usage or scheduler load. +In CLI mode, cron jobs only fire when `hermes cron` commands are run or during active CLI sessions. 
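Step 3 of the tick cycle, due-job filtering, reduces to a state-and-timestamp check. A minimal sketch assuming ISO-8601 `next_run` values (`due_jobs` is a hypothetical helper, not the scheduler's actual API):

```python
from datetime import datetime, timezone

def due_jobs(jobs: list[dict], now: datetime) -> list[dict]:
    """Keep jobs whose next_run has passed and whose state is 'scheduled'."""
    result = []
    for job in jobs:
        if job.get("state") != "scheduled":
            continue
        # Normalize trailing 'Z' for fromisoformat on older Python versions
        next_run = datetime.fromisoformat(job["next_run"].replace("Z", "+00:00"))
        if next_run <= now:
            result.append(job)
    return result

now = datetime(2025, 1, 16, 9, 30, tzinfo=timezone.utc)
jobs = [
    {"id": "a", "state": "scheduled", "next_run": "2025-01-16T09:00:00Z"},
    {"id": "b", "state": "paused",    "next_run": "2025-01-16T09:00:00Z"},
    {"id": "c", "state": "scheduled", "next_run": "2025-01-17T09:00:00Z"},
]
print([j["id"] for j in due_jobs(jobs, now)])  # ['a']
```

Job "b" is skipped because it is paused, and job "c" because its next run is still in the future.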
-## Delivery model +### Fresh Session Isolation -Cron jobs can deliver to: +Each cron job runs in a completely fresh agent session: -- origin chat -- local files -- platform home channels -- explicit platform/chat IDs +- No conversation history from previous runs +- No memory of previous cron executions (unless persisted to memory/files) +- The prompt must be self-contained — cron jobs cannot ask clarifying questions +- The `cronjob` toolset is disabled (recursion guard) + +## Skill-Backed Jobs + +A cron job can attach one or more skills via the `skills` field. At execution time: + +1. Skills are loaded in the specified order +2. Each skill's SKILL.md content is injected as context +3. The job's prompt is appended as the task instruction +4. The agent processes the combined skill context + prompt + +This enables reusable, tested workflows without pasting full instructions into cron prompts. For example: + +``` +Create a daily funding report → attach "ai-funding-daily-report" skill +``` + +### Script-Backed Jobs + +Jobs can also attach a Python script via the `script` field. The script runs *before* each agent turn, and its stdout is injected into the prompt as context. 
This enables data collection and change detection patterns: + +```python +# ~/.hermes/scripts/check_competitors.py +import requests, json +# Fetch competitor release notes, diff against last run +# Print summary to stdout — agent analyzes and reports +``` + +## Delivery Model + +Cron job results can be delivered to any supported platform: + +| Target | Syntax | Example | +|--------|--------|---------| +| Origin chat | `origin` | Deliver to the chat where the job was created | +| Local file | `local` | Save to `~/.hermes/cron/output/` | +| Telegram | `telegram` or `telegram:` | `telegram:-1001234567890` | +| Discord | `discord` or `discord:#channel` | `discord:#engineering` | +| Slack | `slack` | Deliver to Slack home channel | +| WhatsApp | `whatsapp` | Deliver to WhatsApp home | +| Signal | `signal` | Deliver to Signal | +| Matrix | `matrix` | Deliver to Matrix home room | +| Mattermost | `mattermost` | Deliver to Mattermost home | +| Email | `email` | Deliver via email | +| SMS | `sms` | Deliver via SMS | +| Home Assistant | `homeassistant` | Deliver to HA conversation | +| DingTalk | `dingtalk` | Deliver to DingTalk | +| Feishu | `feishu` | Deliver to Feishu | +| WeCom | `wecom` | Deliver to WeCom | + +For Telegram topics, use the format `telegram::` (e.g., `telegram:-1001234567890:17585`). + +### Response Wrapping + +By default (`cron.wrap_response: true`), cron deliveries are wrapped with: +- A header identifying the cron job name and task +- A footer noting the agent cannot see the delivered message in conversation + +The `[SILENT]` prefix in a cron response suppresses delivery entirely — useful for jobs that only need to write to files or perform side effects. + +### Session Isolation + +Cron deliveries are NOT mirrored into gateway session conversation history. They exist only in the cron job's own session. This prevents message alternation violations in the target chat's conversation. 
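The explicit target syntax in the delivery table splits on colons into platform, chat ID, and optional thread ID. A rough sketch of that parsing (`parse_delivery_target` is illustrative, not the actual resolver in `gateway/delivery.py`):

```python
def parse_delivery_target(target: str) -> dict:
    """Split 'platform[:chat_id[:thread_id]]', e.g. 'telegram:-1001234567890:17585'."""
    parts = target.split(":")
    return {
        "platform": parts[0],
        "chat_id": parts[1] if len(parts) > 1 else None,
        "thread_id": parts[2] if len(parts) > 2 else None,
    }

print(parse_delivery_target("telegram:-1001234567890:17585"))
print(parse_delivery_target("discord:#engineering"))
print(parse_delivery_target("slack"))  # bare platform → home channel delivery
```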
+ +## Recursion Guard + +Cron-run sessions have the `cronjob` toolset disabled. This prevents: +- A scheduled job from creating new cron jobs +- Recursive scheduling that could explode token usage +- Accidental mutation of the job schedule from within a job ## Locking -Hermes uses lock-based protections so overlapping scheduler ticks do not execute the same due-job batch twice. +The scheduler uses file-based locking to prevent overlapping ticks from executing the same due-job batch twice. This is important in gateway mode where multiple maintenance cycles could overlap if a previous tick takes longer than the tick interval. -## Related docs +## CLI Interface -- [Cron feature guide](../user-guide/features/cron.md) +The `hermes cron` CLI provides direct job management: + +```bash +hermes cron list # Show all jobs +hermes cron add # Interactive job creation +hermes cron edit # Edit job configuration +hermes cron pause # Pause a running job +hermes cron resume # Resume a paused job +hermes cron run # Trigger immediate execution +hermes cron remove # Delete a job +``` + +## Related Docs + +- [Cron Feature Guide](/docs/user-guide/features/cron) - [Gateway Internals](./gateway-internals.md) +- [Agent Loop Internals](./agent-loop.md) diff --git a/website/docs/developer-guide/gateway-internals.md b/website/docs/developer-guide/gateway-internals.md index 5a8e9a594..f875c401f 100644 --- a/website/docs/developer-guide/gateway-internals.md +++ b/website/docs/developer-guide/gateway-internals.md @@ -6,106 +6,248 @@ description: "How the messaging gateway boots, authorizes users, routes sessions # Gateway Internals -The messaging gateway is the long-running process that connects Hermes to external platforms. +The messaging gateway is the long-running process that connects Hermes to 14+ external messaging platforms through a unified architecture. 
-Key files: +## Key Files -- `gateway/run.py` -- `gateway/config.py` -- `gateway/session.py` -- `gateway/delivery.py` -- `gateway/pairing.py` -- `gateway/channel_directory.py` -- `gateway/hooks.py` -- `gateway/mirror.py` -- `gateway/platforms/*` +| File | Purpose | +|------|---------| +| `gateway/run.py` | `GatewayRunner` — main loop, slash commands, message dispatch (~7,200 lines) | +| `gateway/session.py` | `SessionStore` — conversation persistence and session key construction | +| `gateway/delivery.py` | Outbound message delivery to target platforms/channels | +| `gateway/pairing.py` | DM pairing flow for user authorization | +| `gateway/channel_directory.py` | Maps chat IDs to human-readable names for cron delivery | +| `gateway/hooks.py` | Hook discovery, loading, and lifecycle event dispatch | +| `gateway/mirror.py` | Cross-session message mirroring for `send_message` | +| `gateway/status.py` | Token lock management for profile-scoped gateway instances | +| `gateway/builtin_hooks/` | Always-registered hooks (e.g., BOOT.md system prompt hook) | +| `gateway/platforms/` | Platform adapters (one per messaging platform) | -## Core responsibilities +## Architecture Overview -The gateway process is responsible for: +```text +┌─────────────────────────────────────────────────┐ +│ GatewayRunner │ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ Telegram │ │ Discord │ │ Slack │ ... 
│ +│ │ Adapter │ │ Adapter │ │ Adapter │ │ +│ └─────┬─────┘ └─────┬────┘ └─────┬────┘ │ +│ │ │ │ │ +│ └──────────────┼──────────────┘ │ +│ ▼ │ +│ _handle_message() │ +│ │ │ +│ ┌────────────┼────────────┐ │ +│ ▼ ▼ ▼ │ +│ Slash command AIAgent Queue/BG │ +│ dispatch creation sessions │ +│ │ │ +│ ▼ │ +│ SessionStore │ +│ (SQLite persistence) │ +└─────────────────────────────────────────────────┘ +``` -- loading configuration from `.env`, `config.yaml`, and `gateway.json` -- starting platform adapters -- authorizing users -- routing incoming events to sessions -- maintaining per-chat session continuity -- dispatching messages to `AIAgent` -- running cron ticks and background maintenance tasks -- mirroring/proactively delivering output to configured channels +## Message Flow -## Config sources +When a message arrives from any platform: -The gateway has a multi-source config model: +1. **Platform adapter** receives raw event, normalizes it into a `MessageEvent` +2. **Base adapter** checks active session guard: + - If agent is running for this session → queue message, set interrupt event + - If `/approve`, `/deny`, `/stop` → bypass guard (dispatched inline) +3. **GatewayRunner._handle_message()** receives the event: + - Resolve session key via `_session_key_for_source()` (format: `agent:main:{platform}:{chat_type}:{chat_id}`) + - Check authorization (see Authorization below) + - Check if it's a slash command → dispatch to command handler + - Check if agent is already running → intercept commands like `/stop`, `/status` + - Otherwise → create `AIAgent` instance and run conversation +4. **Response** is sent back through the platform adapter -- environment variables -- `~/.hermes/gateway.json` -- selected bridged values from `~/.hermes/config.yaml` +### Session Key Format -## Session routing +Session keys encode the full routing context: -`gateway/session.py` and `GatewayRunner` cooperate to map incoming messages to active session IDs. 
+``` +agent:main:{platform}:{chat_type}:{chat_id} +``` -Session keying can depend on: +For example: `agent:main:telegram:private:123456789` -- platform -- user/chat identity -- thread/topic identity -- special platform-specific routing behavior +Thread-aware platforms (Telegram forum topics, Discord threads, Slack threads) may include thread IDs in the chat_id portion. **Never construct session keys manually** — always use `build_session_key()` from `gateway/session.py`. -## Authorization layers +### Two-Level Message Guard -The gateway can authorize through: +When an agent is actively running, incoming messages pass through two sequential guards: -- platform allowlists -- gateway-wide allowlists -- DM pairing flows -- explicit allow-all settings +1. **Level 1 — Base adapter** (`gateway/platforms/base.py`): Checks `_active_sessions`. If the session is active, queues the message in `_pending_messages` and sets an interrupt event. This catches messages *before* they reach the gateway runner. -Pairing support is implemented in `gateway/pairing.py`. +2. **Level 2 — Gateway runner** (`gateway/run.py`): Checks `_running_agents`. Intercepts specific commands (`/stop`, `/new`, `/queue`, `/status`, `/approve`, `/deny`) and routes them appropriately. Everything else triggers `running_agent.interrupt()`. -## Delivery path +Commands that must reach the runner while the agent is blocked (like `/approve`) are dispatched **inline** via `await self._message_handler(event)` — they bypass the background task system to avoid race conditions. -Outgoing deliveries are handled by `gateway/delivery.py`, which knows how to: +## Authorization -- deliver to a home channel -- resolve explicit targets -- mirror some remote deliveries back into local history/session tracking +The gateway uses a multi-layer authorization check, evaluated in order: + +1. **Gateway-wide allow-all** (`GATEWAY_ALLOW_ALL_USERS`) — if set, all users are authorized +2. 
**Platform allowlist** (e.g., `TELEGRAM_ALLOWED_USERS`) — comma-separated user IDs +3. **DM pairing** — authenticated users can pair new users via a pairing code +4. **Admin escalation** — some commands require admin status beyond basic authorization + +### DM Pairing Flow + +```text +Admin: /pair +Gateway: "Pairing code: ABC123. Share with the user." +New user: ABC123 +Gateway: "Paired! You're now authorized." +``` + +Pairing state is persisted in `gateway/pairing.py` and survives restarts. + +## Slash Command Dispatch + +All slash commands in the gateway flow through the same resolution pipeline: + +1. `resolve_command()` from `hermes_cli/commands.py` maps input to canonical name (handles aliases, prefix matching) +2. The canonical name is checked against `GATEWAY_KNOWN_COMMANDS` +3. Handler in `_handle_message()` dispatches based on canonical name +4. Some commands are gated on config (`gateway_config_gate` on `CommandDef`) + +### Running-Agent Guard + +Commands that must NOT execute while the agent is processing are rejected early: + +```python +if _quick_key in self._running_agents: + if canonical == "model": + return "⏳ Agent is running — wait for it to finish or /stop first." +``` + +Bypass commands (`/stop`, `/new`, `/approve`, `/deny`, `/queue`, `/status`) have special handling. + +## Config Sources + +The gateway reads configuration from multiple sources: + +| Source | What it provides | +|--------|-----------------| +| `~/.hermes/.env` | API keys, bot tokens, platform credentials | +| `~/.hermes/config.yaml` | Model settings, tool configuration, display options | +| Environment variables | Override any of the above | + +Unlike the CLI (which uses `load_cli_config()` with hardcoded defaults), the gateway reads `config.yaml` directly via YAML loader. This means config keys that exist in the CLI's defaults dict but not in the user's config file may behave differently between CLI and gateway. 
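The CLI/gateway divergence comes down to whether a defaults layer sits underneath the user's `config.yaml`. An illustrative sketch with a made-up key and default value:

```python
CLI_DEFAULTS = {"compression_threshold": 0.75}  # hypothetical key and value

def cli_lookup(user_config: dict, key: str):
    """CLI path: user config is merged over hardcoded defaults."""
    return {**CLI_DEFAULTS, **user_config}.get(key)

def gateway_lookup(user_config: dict, key: str):
    """Gateway path: raw YAML dict with no defaults layer underneath."""
    return user_config.get(key)

user_config = {}  # key absent from config.yaml
print(cli_lookup(user_config, "compression_threshold"))      # 0.75
print(gateway_lookup(user_config, "compression_threshold"))  # None
```

A key missing from the user's file resolves to the hardcoded default in the CLI but to nothing in the gateway, which is exactly the behavioral gap described above.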
+ +## Platform Adapters + +Each messaging platform has an adapter in `gateway/platforms/`: + +```text +gateway/platforms/ +├── base.py # BaseAdapter — shared logic for all platforms +├── telegram.py # Telegram Bot API (long polling or webhook) +├── discord.py # Discord bot via discord.py +├── slack.py # Slack Socket Mode +├── whatsapp.py # WhatsApp Business Cloud API +├── signal.py # Signal via signal-cli REST API +├── matrix.py # Matrix via matrix-nio (optional E2EE) +├── mattermost.py # Mattermost WebSocket API +├── email_adapter.py # Email via IMAP/SMTP +├── sms.py # SMS via Twilio +├── dingtalk.py # DingTalk WebSocket +├── feishu.py # Feishu/Lark WebSocket or webhook +├── wecom.py # WeCom (WeChat Work) callback +└── homeassistant.py # Home Assistant conversation integration +``` + +Adapters implement a common interface: +- `connect()` / `disconnect()` — lifecycle management +- `send_message()` — outbound message delivery +- `on_message()` — inbound message normalization → `MessageEvent` + +### Token Locks + +Adapters that connect with unique credentials call `acquire_scoped_lock()` in `connect()` and `release_scoped_lock()` in `disconnect()`. This prevents two profiles from using the same bot token simultaneously. + +## Delivery Path + +Outgoing deliveries (`gateway/delivery.py`) handle: + +- **Direct reply** — send response back to the originating chat +- **Home channel delivery** — route cron job outputs and background results to a configured home channel +- **Explicit target delivery** — `send_message` tool specifying `telegram:-1001234567890` +- **Cross-platform delivery** — deliver to a different platform than the originating message + +Cron job deliveries are NOT mirrored into gateway session history — they live in their own cron session only. This is a deliberate design choice to avoid message alternation violations. ## Hooks -Gateway events emit hook callbacks through `gateway/hooks.py`. 
Hooks are local trusted Python code and can observe or extend gateway lifecycle events. +Gateway hooks are Python modules that respond to lifecycle events: -## Background maintenance +### Gateway Hook Events -The gateway also runs maintenance tasks such as: +| Event | When fired | +|-------|-----------| +| `gateway:startup` | Gateway process starts | +| `session:start` | New conversation session begins | +| `session:end` | Session completes or times out | +| `session:reset` | User resets session with `/new` | +| `agent:start` | Agent begins processing a message | +| `agent:step` | Agent completes one tool-calling iteration | +| `agent:end` | Agent finishes and returns response | +| `command:*` | Any slash command is executed | -- cron ticking -- cache refreshes -- session expiry checks -- proactive memory flush before reset/expiry +Hooks are discovered from `gateway/builtin_hooks/` (always active) and `~/.hermes/hooks/` (user-installed). Each hook is a directory with a `HOOK.yaml` manifest and `handler.py`. -## Honcho interaction +## Memory Provider Integration -When a memory provider plugin (e.g. Honcho) is enabled, the gateway creates an AIAgent per incoming message with the same session ID. The memory provider's `initialize()` receives the session ID and creates the appropriate backend session. Tools are routed through the `MemoryManager`, which handles all provider lifecycle hooks (prefetch, sync, session end). +When a memory provider plugin (e.g., Honcho) is enabled: -### Memory provider session routing +1. Gateway creates an `AIAgent` per message with the session ID +2. The `MemoryManager` initializes the provider with the session context +3. Provider tools (e.g., `honcho_profile`, `viking_search`) are routed through: -Memory provider tools (e.g. 
`honcho_profile`, `viking_search`) are routed through the MemoryManager in `_invoke_tool()`: - -``` +```text AIAgent._invoke_tool() → self._memory_manager.handle_tool_call(name, args) → provider.handle_tool_call(name, args) ``` -Each memory provider manages its own session lifecycle internally. The `initialize()` method receives the session ID, and `on_session_end()` handles cleanup and final flush. +4. On session end/reset, `on_session_end()` fires for cleanup and final data flush -### Memory flush lifecycle +### Memory Flush Lifecycle -When a session is reset, resumed, or expires, the gateway flushes built-in memories before discarding context. The flush creates a temporary `AIAgent` that runs a memory-only conversation turn. The memory provider's `on_session_end()` hook fires during this process, giving external providers a chance to persist any buffered data. +When a session is reset, resumed, or expires: +1. Built-in memories are flushed to disk +2. Memory provider's `on_session_end()` hook fires +3. A temporary `AIAgent` runs a memory-only conversation turn +4. Context is then discarded or archived -## Related docs +## Background Maintenance + +The gateway runs periodic maintenance alongside message handling: + +- **Cron ticking** — checks job schedules and fires due jobs +- **Session expiry** — cleans up abandoned sessions after timeout +- **Memory flush** — proactively flushes memory before session expiry +- **Cache refresh** — refreshes model lists and provider status + +## Process Management + +The gateway runs as a long-lived process, managed via: + +- `hermes gateway start` / `hermes gateway stop` — manual control +- `systemctl` (Linux) or `launchctl` (macOS) — service management +- PID file at `~/.hermes/gateway.pid` — profile-scoped process tracking + +**Profile-scoped vs global**: `start_gateway()` uses profile-scoped PID files. `hermes gateway stop` stops only the current profile's gateway. 
`hermes gateway stop --all` uses global `ps aux` scanning to kill all gateway processes (used during updates). + +## Related Docs - [Session Storage](./session-storage.md) - [Cron Internals](./cron-internals.md) - [ACP Internals](./acp-internals.md) +- [Agent Loop Internals](./agent-loop.md) +- [Messaging Gateway (User Guide)](/docs/user-guide/messaging) diff --git a/website/docs/developer-guide/trajectory-format.md b/website/docs/developer-guide/trajectory-format.md index f36244ed2..c23838357 100644 --- a/website/docs/developer-guide/trajectory-format.md +++ b/website/docs/developer-guide/trajectory-format.md @@ -3,7 +3,7 @@ Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format for use as training data, debugging artifacts, and reinforcement learning datasets. -Source files: `agent/trajectory.py`, `run_agent.py` (lines 1788-1975), `batch_runner.py` +Source files: `agent/trajectory.py`, `run_agent.py` (search for `_save_trajectory`), `batch_runner.py` ## File Naming Convention diff --git a/website/docs/index.md b/website/docs/index.md index 470c8d2ed..f4b5378f4 100644 --- a/website/docs/index.md +++ b/website/docs/index.md @@ -28,7 +28,7 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl | 🗺️ **[Learning Path](/docs/getting-started/learning-path)** | Find the right docs for your experience level | | ⚙️ **[Configuration](/docs/user-guide/configuration)** | Config file, providers, models, and options | | 💬 **[Messaging Gateway](/docs/user-guide/messaging)** | Set up Telegram, Discord, Slack, or WhatsApp | -| 🔧 **[Tools & Toolsets](/docs/user-guide/features/tools)** | 40+ built-in tools and how to configure them | +| 🔧 **[Tools & Toolsets](/docs/user-guide/features/tools)** | 47 built-in tools and how to configure them | | 🧠 **[Memory System](/docs/user-guide/features/memory)** | Persistent memory that grows across sessions | | 📚 **[Skills System](/docs/user-guide/features/skills)** | Procedural memory the 
agent creates and reuses | | 🔌 **[MCP Integration](/docs/user-guide/features/mcp)** | Connect to MCP servers, filter their tools, and extend Hermes safely | @@ -46,7 +46,7 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl - **A closed learning loop** — Agent-curated memory with periodic nudges, autonomous skill creation, skill self-improvement during use, FTS5 cross-session recall with LLM summarization, and [Honcho](https://github.com/plastic-labs/honcho) dialectic user modeling - **Runs anywhere, not just your laptop** — 6 terminal backends: local, Docker, SSH, Daytona, Singularity, Modal. Daytona and Modal offer serverless persistence — your environment hibernates when idle, costing nearly nothing -- **Lives where you do** — CLI, Telegram, Discord, Slack, WhatsApp, all from one gateway +- **Lives where you do** — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, DingTalk, Feishu, WeCom, Home Assistant — 14+ platforms from one gateway - **Built by model trainers** — Created by [Nous Research](https://nousresearch.com), the lab behind Hermes, Nomos, and Psyche. Works with [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai), OpenAI, or any endpoint - **Scheduled automations** — Built-in cron with delivery to any platform - **Delegates & parallelizes** — Spawn isolated subagents for parallel workstreams. Programmatic Tool Calling via `execute_code` collapses multi-step pipelines into single inference calls diff --git a/website/docs/integrations/index.md b/website/docs/integrations/index.md index cbd771072..ce103f1cc 100644 --- a/website/docs/integrations/index.md +++ b/website/docs/integrations/index.md @@ -22,7 +22,7 @@ Hermes supports multiple AI inference providers out of the box. 
Use `hermes mode
## Web Search Backends

-The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers, configured via `config.yaml` or `hermes tools`:
+The `web_search` and `web_extract` tools support four backend providers, configured via `config.yaml` or `hermes tools`:

| Backend | Env Var | Search | Extract | Crawl |
|---------|---------|--------|---------|-------|
@@ -56,13 +56,14 @@ See [Browser Automation](/docs/user-guide/features/browser) for setup and usage.

Text-to-speech and speech-to-text across all messaging platforms:

| Provider | Quality | Cost | API Key |
-|----------|---------|------|---------|
-| **Edge TTS** (default) | Good | Free | None needed |
-| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
-| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
-| **NeuTTS** | Good | Free | None needed |
+|----------|---------|------|---------|
+| **Edge TTS** (default) | Good | Free | None needed |
+| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
+| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
+| **MiniMax** | Good | Paid | `MINIMAX_API_KEY` |
+| **NeuTTS** | Good | Free | None needed |

-Speech-to-text uses Whisper for voice message transcription on Telegram, Discord, and WhatsApp. See [Voice & TTS](/docs/user-guide/features/tts) and [Voice Mode](/docs/user-guide/features/voice-mode) for details.
+Speech-to-text supports three providers: local Whisper (free, runs on-device), Groq (fast cloud), and OpenAI Whisper API. Voice message transcription works across Telegram, Discord, WhatsApp, and other messaging platforms. See [Voice & TTS](/docs/user-guide/features/tts) and [Voice Mode](/docs/user-guide/features/voice-mode) for details.
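Speech-to-text selection can be thought of as key-based dispatch with local Whisper as the zero-config fallback. The precedence order in this sketch is an assumption, not the documented behavior:

```python
def pick_stt_provider(env: dict) -> str:
    # Assumed order: prefer configured cloud keys, fall back to local Whisper
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("OPENAI_API_KEY"):
        return "openai-whisper"
    return "local-whisper"

print(pick_stt_provider({"GROQ_API_KEY": "gsk_example"}))  # groq
print(pick_stt_provider({}))                               # local-whisper
```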
## IDE & Editor Integration @@ -74,9 +75,27 @@ Speech-to-text uses Whisper for voice message transcription on Telegram, Discord ## Memory & Personalization -- **[Honcho Memory](/docs/user-guide/features/honcho)** — AI-native persistent memory for cross-session user modeling and personalization. Honcho adds deep user modeling via dialectic reasoning on top of Hermes's built-in memory system. +- **[Built-in Memory](/docs/user-guide/features/memory)** — Persistent, curated memory via `MEMORY.md` and `USER.md` files. The agent maintains bounded stores of personal notes and user profile data that survive across sessions. +- **[Memory Providers](/docs/user-guide/features/memory-providers)** — Plug in external memory backends for deeper personalization. Seven providers are supported: Honcho (dialectic reasoning), OpenViking (tiered retrieval), Mem0 (cloud extraction), Hindsight (knowledge graphs), Holographic (local SQLite), RetainDB (hybrid search), and ByteRover (CLI-based). + +## Messaging Platforms + +Hermes runs as a gateway bot on 14+ messaging platforms, all configured through the same `gateway` subsystem: + +- **[Telegram](/docs/user-guide/messaging/telegram)**, **[Discord](/docs/user-guide/messaging/discord)**, **[Slack](/docs/user-guide/messaging/slack)**, **[WhatsApp](/docs/user-guide/messaging/whatsapp)**, **[Signal](/docs/user-guide/messaging/signal)**, **[Matrix](/docs/user-guide/messaging/matrix)**, **[Mattermost](/docs/user-guide/messaging/mattermost)**, **[Email](/docs/user-guide/messaging/email)**, **[SMS](/docs/user-guide/messaging/sms)**, **[DingTalk](/docs/user-guide/messaging/dingtalk)**, **[Feishu/Lark](/docs/user-guide/messaging/feishu)**, **[WeCom](/docs/user-guide/messaging/wecom)**, **[Home Assistant](/docs/user-guide/messaging/homeassistant)**, **[Webhooks](/docs/user-guide/messaging/webhooks)** + +See the [Messaging Gateway overview](/docs/user-guide/messaging) for the platform comparison table and setup guide. 
+ +## Home Automation + +- **[Home Assistant](/docs/user-guide/messaging/homeassistant)** — Control smart home devices via four dedicated tools (`ha_list_entities`, `ha_get_state`, `ha_list_services`, `ha_call_service`). The Home Assistant toolset activates automatically when `HASS_TOKEN` is configured. + +## Plugins + +- **[Plugin System](/docs/user-guide/features/plugins)** — Extend Hermes with custom tools, lifecycle hooks, and CLI commands without modifying core code. Plugins are discovered from `~/.hermes/plugins/`, project-local `.hermes/plugins/`, and pip-installed entry points. +- **[Build a Plugin](/docs/guides/build-a-hermes-plugin)** — Step-by-step guide for creating Hermes plugins with tools, hooks, and CLI commands. ## Training & Evaluation -- **[RL Training](/docs/user-guide/features/rl-training)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning. +- **[RL Training](/docs/user-guide/features/rl-training)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning. Supports Atropos environments with customizable reward functions. - **[Batch Processing](/docs/user-guide/features/batch-processing)** — Run the agent across hundreds of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation. diff --git a/website/docs/reference/faq.md b/website/docs/reference/faq.md index fafb19655..e8e6fe435 100644 --- a/website/docs/reference/faq.md +++ b/website/docs/reference/faq.md @@ -90,7 +90,7 @@ Both persist across sessions. See [Memory](../user-guide/features/memory.md) and Yes. 
Import the `AIAgent` class and use Hermes programmatically: ```python -from hermes.agent import AIAgent +from run_agent import AIAgent agent = AIAgent(model="openrouter/nous/hermes-3-llama-3.1-70b") response = agent.chat("Explain quantum computing briefly") @@ -227,7 +227,7 @@ hermes chat --model openrouter/meta-llama/llama-3.1-70b-instruct hermes chat # Use a model with a larger context window -hermes chat --model openrouter/google/gemini-2.0-flash-001 +hermes chat --model openrouter/google/gemini-3-flash-preview ``` If this happens on the first long conversation, Hermes may have the wrong context length for your model. Check what it detected: diff --git a/website/docs/reference/optional-skills-catalog.md b/website/docs/reference/optional-skills-catalog.md index 9b7c1c683..18ec4b381 100644 --- a/website/docs/reference/optional-skills-catalog.md +++ b/website/docs/reference/optional-skills-catalog.md @@ -1,74 +1,153 @@ --- -sidebar_position: 6 -title: "Official Optional Skills Catalog" -description: "Catalog of official optional skills available from the repository" +sidebar_position: 9 +title: "Optional Skills Catalog" +description: "Official optional skills shipped with hermes-agent — install via hermes skills install official//" --- -# Official Optional Skills Catalog +# Optional Skills Catalog -Official optional skills live in the repository under `optional-skills/`. Install them with `hermes skills install official//` or browse them with `hermes skills browse --source official`. +Official optional skills ship with the hermes-agent repository under `optional-skills/` but are **not active by default**. Install them explicitly: -## autonomous-ai-agents +```bash +hermes skills install official// +``` -| Skill | Description | Path | -|-------|-------------|------| -| `blackbox` | Delegate coding tasks to Blackbox AI CLI agent. Multi-model agent with built-in judge that runs tasks through multiple LLMs and picks the best result. 
Requires the blackbox CLI and a Blackbox AI API key. | `autonomous-ai-agents/blackbox` |
+For example:

-## blockchain

+```bash
+hermes skills install official/blockchain/solana
+hermes skills install official/mlops/flash-attention
+```

-| Skill | Description | Path |
-|-------|-------------|------|
-| `base` | Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection. | `blockchain/base` |
-| `solana` | Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network stats. Uses Solana RPC + CoinGecko. No API key required. | `blockchain/solana` |
+Once installed, the skill appears in the agent's skill list and can be loaded automatically when relevant tasks are detected.

-## creative

+To uninstall:

-| Skill | Description | Path |
-|-------|-------------|------|
-| `blender-mcp` | Control Blender directly from Hermes via socket connection to the blender-mcp addon. Create 3D objects, materials, animations, and run arbitrary Blender Python. | `creative/blender-mcp` |
-| `meme-generation` | Generate real meme images by picking a template and overlaying text with Pillow. Produces actual .png meme files. | `creative/meme-generation` |
+```bash
+hermes skills uninstall <skill-name>
+```

-## email

+---

-| Skill | Description | Path |
-|-------|-------------|------|
-| `agentmail` | Give the agent its own dedicated email inbox via AgentMail. Send, receive, and manage email autonomously using agent-owned email addresses (e.g. hermes-agent@agentmail.to). | `email/agentmail` |
+## Autonomous AI Agents

-## health

+| Skill | Description |
+|-------|-------------|
+| **blackbox** | Delegate coding tasks to Blackbox AI CLI agent. Multi-model agent with built-in judge that runs tasks through multiple LLMs and picks the best result.
| +| **honcho** | Configure and use Honcho memory with Hermes — cross-session user modeling, multi-profile peer isolation, observation config, and dialectic reasoning. | -| Skill | Description | Path | -|-------|-------------|------| -| `neuroskill-bci` | Connect to a running NeuroSkill instance and incorporate the user's real-time cognitive and emotional state (focus, relaxation, mood, cognitive load, drowsiness, heart rate, HRV, sleep staging, and 40+ derived EXG scores) into responses. Requires a BCI wearable (Muse 2/S or Open… | `health/neuroskill-bci` | +## Blockchain -## mcp +| Skill | Description | +|-------|-------------| +| **base** | Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection, whale detection, and live network stats. No API key required. | +| **solana** | Query Solana blockchain data with USD pricing — wallet balances, token portfolios, transaction details, NFTs, whale detection, and live network stats. No API key required. | -| Skill | Description | Path | -|-------|-------------|------| -| `fastmcp` | Build, test, inspect, install, and deploy MCP servers with FastMCP in Python. | `mcp/fastmcp` | +## Communication -## migration +| Skill | Description | +|-------|-------------| +| **one-three-one-rule** | Structured communication framework for proposals and decision-making. | -| Skill | Description | Path | -|-------|-------------|------| -| `openclaw-migration` | Migrate a user's OpenClaw customization footprint into Hermes Agent. Imports Hermes-compatible memories, SOUL.md, command allowlists, user skills, and selected workspace assets from ~/.openclaw, then reports exactly what could not be migrated and why. | `migration/openclaw-migration` | +## Creative -## productivity +| Skill | Description | +|-------|-------------| +| **blender-mcp** | Control Blender directly from Hermes via socket connection to the blender-mcp addon. 
Create 3D objects, materials, animations, and run arbitrary Blender Python (bpy) code. | +| **meme-generation** | Generate real meme images by picking a template and overlaying text with Pillow. Produces actual `.png` meme files. | -| Skill | Description | Path | -|-------|-------------|------| -| `telephony` | Give Hermes phone capabilities — provision a Twilio number, send/receive SMS/MMS, make direct calls, and place AI-driven outbound calls through Bland.ai or Vapi. | `productivity/telephony` | +## DevOps -## research +| Skill | Description | +|-------|-------------| +| **cli** | Run 150+ AI apps via inference.sh CLI (infsh) — image generation, video creation, LLMs, search, 3D, and social automation. | +| **docker-management** | Manage Docker containers, images, volumes, networks, and Compose stacks — lifecycle ops, debugging, cleanup, and Dockerfile optimization. | -| Skill | Description | Path | -|-------|-------------|------| -| `bioinformatics` | Gateway to 400+ bioinformatics skills from bioSkills and ClawBio. Covers genomics, transcriptomics, single-cell, variant calling, pharmacogenomics, metagenomics, structural biology. | `research/bioinformatics` | -| `qmd` | Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration. | `research/qmd` | +## Email -## security +| Skill | Description | +|-------|-------------| +| **agentmail** | Give the agent its own dedicated email inbox via AgentMail. Send, receive, and manage email autonomously using agent-owned email addresses. | -| Skill | Description | Path | -|-------|-------------|------| -| `1password` | Set up and use 1Password CLI (op). Use when installing the CLI, enabling desktop app integration, signing in, and reading/injecting secrets for commands. 
| `security/1password` | -| `oss-forensics` | Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories. Covers deleted commit recovery, force-push detection, IOC extraction. | `security/oss-forensics` | -| `sherlock` | OSINT username search across 400+ social networks. Hunt down social media accounts by username. | `security/sherlock` | +## Health + +| Skill | Description | +|-------|-------------| +| **neuroskill-bci** | Brain-Computer Interface (BCI) integration for neuroscience research workflows. | + +## MCP + +| Skill | Description | +|-------|-------------| +| **fastmcp** | Build, test, inspect, install, and deploy MCP servers with FastMCP in Python. Covers wrapping APIs or databases as MCP tools, exposing resources or prompts, and deployment. | + +## Migration + +| Skill | Description | +|-------|-------------| +| **openclaw-migration** | Migrate a user's OpenClaw customization footprint into Hermes Agent. Imports memories, SOUL.md, command allowlists, user skills, and selected workspace assets. | + +## MLOps + +The largest optional category — covers the full ML pipeline from data curation to production inference. + +| Skill | Description | +|-------|-------------| +| **accelerate** | Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. | +| **chroma** | Open-source embedding database. Store embeddings and metadata, perform vector and full-text search. Simple 4-function API for RAG and semantic search. | +| **faiss** | Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). | +| **flash-attention** | Optimize transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Supports PyTorch SDPA, flash-attn library, H100 FP8, and sliding window. 
| +| **hermes-atropos-environments** | Build, test, and debug Hermes Agent RL environments for Atropos training. Covers the HermesAgentBaseEnv interface, reward functions, agent loop integration, and evaluation. | +| **huggingface-tokenizers** | Fast Rust-based tokenizers for research and production. Tokenizes 1GB in under 20 seconds. Supports BPE, WordPiece, and Unigram algorithms. | +| **instructor** | Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, and stream partial results. | +| **lambda-labs** | Reserved and on-demand GPU cloud instances for ML training and inference. SSH access, persistent filesystems, and multi-node clusters. | +| **llava** | Large Language and Vision Assistant — visual instruction tuning and image-based conversations combining CLIP vision with LLaMA language models. | +| **nemo-curator** | GPU-accelerated data curation for LLM training. Fuzzy deduplication (16x faster), quality filtering (30+ heuristics), semantic dedup, PII redaction. Scales with RAPIDS. | +| **pinecone** | Managed vector database for production AI. Auto-scaling, hybrid search (dense + sparse), metadata filtering, and low latency (under 100ms p95). | +| **pytorch-lightning** | High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks, and minimal boilerplate. | +| **qdrant** | High-performance vector similarity search engine. Rust-powered with fast nearest neighbor search, hybrid search with filtering, and scalable vector storage. | +| **saelens** | Train and analyze Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. | +| **simpo** | Simple Preference Optimization — reference-free alternative to DPO with better performance (+6.4 pts on AlpacaEval 2.0). No reference model needed. | +| **slime** | LLM post-training with RL using Megatron+SGLang framework. 
Custom data generation workflows and tight Megatron-LM integration for RL scaling. | +| **tensorrt-llm** | Optimize LLM inference with NVIDIA TensorRT for maximum throughput. 10-100x faster than PyTorch on A100/H100 with quantization (FP8/INT4) and in-flight batching. | +| **torchtitan** | PyTorch-native distributed LLM pretraining with 4D parallelism (FSDP2, TP, PP, CP). Scale from 8 to 512+ GPUs with Float8 and torch.compile. | + +## Productivity + +| Skill | Description | +|-------|-------------| +| **canvas** | Canvas LMS integration — fetch enrolled courses and assignments using API token authentication. | +| **memento-flashcards** | Spaced repetition flashcard system for learning and knowledge retention. | +| **siyuan** | SiYuan Note API for searching, reading, creating, and managing blocks and documents in a self-hosted knowledge base. | +| **telephony** | Give Hermes phone capabilities — provision a Twilio number, send/receive SMS/MMS, make calls, and place AI-driven outbound calls through Bland.ai or Vapi. | + +## Research + +| Skill | Description | +|-------|-------------| +| **bioinformatics** | Gateway to 400+ bioinformatics skills from bioSkills and ClawBio. Covers genomics, transcriptomics, single-cell, variant calling, pharmacogenomics, metagenomics, and structural biology. | +| **domain-intel** | Passive domain reconnaissance using Python stdlib. Subdomain discovery, SSL certificate inspection, WHOIS lookups, DNS records, and bulk multi-domain analysis. No API keys required. | +| **duckduckgo-search** | Free web search via DuckDuckGo — text, news, images, videos. No API key needed. | +| **gitnexus-explorer** | Index a codebase with GitNexus and serve an interactive knowledge graph via web UI and Cloudflare tunnel. | +| **parallel-cli** | Vendor skill for Parallel CLI — agent-native web search, extraction, deep research, enrichment, and monitoring. 
|
+| **qmd** | Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. |
+| **scrapling** | Web scraping with Scrapling — HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python. |
+
+## Security
+
+| Skill | Description |
+|-------|-------------|
+| **1password** | Set up and use 1Password CLI (op). Install the CLI, enable desktop app integration, sign in, and read/inject secrets for commands. |
+| **oss-forensics** | Open-source software forensics — analyze packages, dependencies, and supply chain risks. |
+| **sherlock** | OSINT username search across 400+ social networks. Hunt down social media accounts by username. |
+
+---
+
+## Contributing Optional Skills
+
+To add a new optional skill to the repository:
+
+1. Create a directory under `optional-skills/<category>/<skill-name>/`
+2. Add a `SKILL.md` with standard frontmatter (name, description, version, author)
+3. Include any supporting files in `references/`, `templates/`, or `scripts/` subdirectories
+4. Submit a pull request — the skill will appear in this catalog once merged
diff --git a/website/docs/reference/slash-commands.md b/website/docs/reference/slash-commands.md
index 1aa88fd49..f750e7e7d 100644
--- a/website/docs/reference/slash-commands.md
+++ b/website/docs/reference/slash-commands.md
@@ -89,9 +89,22 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
| `/<skill-name>` | Load any installed skill as an on-demand command. Example: `/gif-search`, `/github-pr-workflow`, `/excalidraw`. |
| `/skills ...` | Search, browse, inspect, install, audit, publish, and configure skills from registries and the official optional-skills catalog. |

-### Quick commands
+### Quick Commands

-User-defined quick commands from `quick_commands` in `~/.hermes/config.yaml` are also available as slash commands.
These are resolved at dispatch time, not shown in the built-in autocomplete/help tables. +User-defined quick commands map a short alias to a longer prompt. Configure them in `~/.hermes/config.yaml`: + +```yaml +quick_commands: + review: "Review my latest git diff and suggest improvements" + deploy: "Run the deployment script at scripts/deploy.sh and verify the output" + morning: "Check my calendar, unread emails, and summarize today's priorities" +``` + +Then type `/review`, `/deploy`, or `/morning` in the CLI. Quick commands are resolved at dispatch time and are not shown in the built-in autocomplete/help tables. + +### Alias Resolution + +Commands support prefix matching: typing `/h` resolves to `/help`, `/mod` resolves to `/model`. When a prefix is ambiguous (matches multiple commands), the first match in registry order wins. Full command names and registered aliases always take priority over prefix matches. ## Messaging slash commands diff --git a/website/docs/reference/tools-reference.md b/website/docs/reference/tools-reference.md index c31fd57cf..5353ca5ff 100644 --- a/website/docs/reference/tools-reference.md +++ b/website/docs/reference/tools-reference.md @@ -6,7 +6,13 @@ description: "Authoritative reference for Hermes built-in tools, grouped by tool # Built-in Tools Reference -This page documents the built-in Hermes tool registry as it exists in code. Availability can still vary by platform, credentials, and enabled toolsets. +This page documents all 47 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets. + +**Quick counts:** 11 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, and 14 standalone tools across other toolsets. + +:::tip MCP Tools +In addition to built-in tools, Hermes can load tools dynamically from MCP servers. MCP tools appear with a server-name prefix (e.g., `github_create_issue` for the `github` MCP server). 
See [MCP Integration](/docs/user-guide/features/mcp) for configuration. +::: ## `browser` toolset diff --git a/website/docs/reference/toolsets-reference.md b/website/docs/reference/toolsets-reference.md index d75b9162b..19ff00a3f 100644 --- a/website/docs/reference/toolsets-reference.md +++ b/website/docs/reference/toolsets-reference.md @@ -6,53 +6,150 @@ description: "Reference for Hermes core, composite, platform, and dynamic toolse # Toolsets Reference -Toolsets are named bundles of tools that you can enable with `hermes chat --toolsets ...`, configure per platform, or resolve inside the agent runtime. +Toolsets are named bundles of tools that control what the agent can do. They're the primary mechanism for configuring tool availability per platform, per session, or per task. -| Toolset | Kind | Resolves to | -|---------|------|-------------| -| `browser` | core | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | -| `clarify` | core | `clarify` | -| `code_execution` | core | `execute_code` | -| `cronjob` | core | `cronjob` | -| `debugging` | composite | `patch`, `process`, `read_file`, `search_files`, `terminal`, `web_extract`, `web_search`, `write_file` | -| `delegation` | core | `delegate_task` | -| `file` | core | `patch`, `read_file`, `search_files`, `write_file` | -| `hermes-acp` | platform | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `delegate_task`, `execute_code`, `memory`, `patch`, `process`, `read_file`, `search_files`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` | -| `hermes-cli` | platform | `browser_back`, `browser_click`, 
`browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `clarify`, `cronjob`, `delegate_task`, `execute_code`, `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services`, `image_generate`, `memory`, `mixture_of_agents`, `patch`, `process`, `read_file`, `search_files`, `send_message`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `text_to_speech`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` | -| `hermes-api-server` | platform | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `cronjob`, `delegate_task`, `execute_code`, `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services`, `image_generate`, `memory`, `mixture_of_agents`, `patch`, `process`, `read_file`, `search_files`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` | -| `hermes-dingtalk` | platform | _(same as hermes-cli)_ | -| `hermes-feishu` | platform | _(same as hermes-cli)_ | -| `hermes-wecom` | platform | _(same as hermes-cli)_ | -| `hermes-discord` | platform | _(same as hermes-cli)_ | -| `hermes-email` | platform | _(same as hermes-cli)_ | -| `hermes-gateway` | composite | Union of all messaging platform toolsets | -| `hermes-homeassistant` | platform | _(same as hermes-cli)_ | -| `hermes-matrix` | platform | _(same as hermes-cli)_ | -| `hermes-mattermost` | platform | _(same as hermes-cli)_ | -| `hermes-signal` | platform | _(same as hermes-cli)_ | -| `hermes-slack` | platform | _(same as hermes-cli)_ | -| `hermes-sms` | platform | _(same as hermes-cli)_ | -| `hermes-telegram` | platform | _(same as hermes-cli)_ | -| `hermes-whatsapp` | platform | _(same as hermes-cli)_ 
| -| `hermes-webhook` | platform | _(same as hermes-cli)_ | -| `homeassistant` | core | `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services` | -| `image_gen` | core | `image_generate` | -| `memory` | core | `memory` | -| `messaging` | core | `send_message` | -| `moa` | core | `mixture_of_agents` | -| `rl` | core | `rl_check_status`, `rl_edit_config`, `rl_get_current_config`, `rl_get_results`, `rl_list_environments`, `rl_list_runs`, `rl_select_environment`, `rl_start_training`, `rl_stop_training`, `rl_test_inference` | -| `safe` | composite | `image_generate`, `mixture_of_agents`, `vision_analyze`, `web_extract`, `web_search` | -| `search` | core | `web_search` | -| `session_search` | core | `session_search` | -| `skills` | core | `skill_manage`, `skill_view`, `skills_list` | -| `terminal` | core | `process`, `terminal` | -| `todo` | core | `todo` | -| `tts` | core | `text_to_speech` | -| `vision` | core | `vision_analyze` | -| `web` | core | `web_extract`, `web_search` | +## How Toolsets Work -## Dynamic toolsets +Every tool belongs to exactly one toolset. When you enable a toolset, all tools in that bundle become available to the agent. Toolsets come in three kinds: -- `mcp-` — generated at runtime for each configured MCP server. -- Custom toolsets can be created in configuration and resolved at startup. -- Wildcards: `all` and `*` expand to every registered toolset. 
\ No newline at end of file +- **Core** — A single logical group of related tools (e.g., `file` bundles `read_file`, `write_file`, `patch`, `search_files`) +- **Composite** — Combines multiple core toolsets for a common scenario (e.g., `debugging` bundles file, terminal, and web tools) +- **Platform** — A complete tool configuration for a specific deployment context (e.g., `hermes-cli` is the default for interactive CLI sessions) + +## Configuring Toolsets + +### Per-session (CLI) + +```bash +hermes chat --toolsets web,file,terminal +hermes chat --toolsets debugging # composite — expands to file + terminal + web +hermes chat --toolsets all # everything +``` + +### Per-platform (config.yaml) + +```yaml +toolsets: + - hermes-cli # default for CLI + # - hermes-telegram # override for Telegram gateway +``` + +### Interactive management + +```bash +hermes tools # curses UI to enable/disable per platform +``` + +Or in-session: + +``` +/tools list +/tools disable browser +/tools enable rl +``` + +## Core Toolsets + +| Toolset | Tools | Purpose | +|---------|-------|---------| +| `browser` | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | Full browser automation. Includes `web_search` as a fallback for quick lookups. | +| `clarify` | `clarify` | Ask the user a question when the agent needs clarification. | +| `code_execution` | `execute_code` | Run Python scripts that call Hermes tools programmatically. | +| `cronjob` | `cronjob` | Schedule and manage recurring tasks. | +| `delegation` | `delegate_task` | Spawn isolated subagent instances for parallel work. | +| `file` | `patch`, `read_file`, `search_files`, `write_file` | File reading, writing, searching, and editing. | +| `homeassistant` | `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services` | Smart home control via Home Assistant. 
Only available when `HASS_TOKEN` is set. | +| `image_gen` | `image_generate` | Text-to-image generation via FAL.ai. | +| `memory` | `memory` | Persistent cross-session memory management. | +| `messaging` | `send_message` | Send messages to other platforms (Telegram, Discord, etc.) from within a session. | +| `moa` | `mixture_of_agents` | Multi-model consensus via Mixture of Agents. | +| `rl` | `rl_check_status`, `rl_edit_config`, `rl_get_current_config`, `rl_get_results`, `rl_list_environments`, `rl_list_runs`, `rl_select_environment`, `rl_start_training`, `rl_stop_training`, `rl_test_inference` | RL training environment management (Atropos). | +| `search` | `web_search` | Web search only (without extract). | +| `session_search` | `session_search` | Search past conversation sessions. | +| `skills` | `skill_manage`, `skill_view`, `skills_list` | Skill CRUD and browsing. | +| `terminal` | `process`, `terminal` | Shell command execution and background process management. | +| `todo` | `todo` | Task list management within a session. | +| `tts` | `text_to_speech` | Text-to-speech audio generation. | +| `vision` | `vision_analyze` | Image analysis via vision-capable models. | +| `web` | `web_extract`, `web_search` | Web search and page content extraction. | + +## Composite Toolsets + +These expand to multiple core toolsets, providing a convenient shorthand for common scenarios: + +| Toolset | Expands to | Use case | +|---------|-----------|----------| +| `debugging` | `patch`, `process`, `read_file`, `search_files`, `terminal`, `web_extract`, `web_search`, `write_file` | Debug sessions — file access, terminal, and web research without browser or delegation overhead. | +| `safe` | `image_generate`, `mixture_of_agents`, `vision_analyze`, `web_extract`, `web_search` | Read-only research and media generation. No file writes, no terminal access, no code execution. Good for untrusted or constrained environments. 
|
+
+## Platform Toolsets
+
+Platform toolsets define the complete tool configuration for a deployment target. Most messaging platforms use the same set as `hermes-cli`:
+
+| Toolset | Differences from `hermes-cli` |
+|---------|-------------------------------|
+| `hermes-cli` | Full toolset — all 38 tools including `clarify`. The default for interactive CLI sessions. |
+| `hermes-acp` | Drops `clarify`, `cronjob`, `image_generate`, `mixture_of_agents`, `send_message`, `text_to_speech`, and the homeassistant tools. Focused on coding tasks in IDE context. |
+| `hermes-api-server` | Drops `clarify` and `send_message`; keeps everything else — suitable for programmatic access where user interaction isn't possible. |
+| `hermes-telegram` | Same as `hermes-cli`. |
+| `hermes-discord` | Same as `hermes-cli`. |
+| `hermes-slack` | Same as `hermes-cli`. |
+| `hermes-whatsapp` | Same as `hermes-cli`. |
+| `hermes-signal` | Same as `hermes-cli`. |
+| `hermes-matrix` | Same as `hermes-cli`. |
+| `hermes-mattermost` | Same as `hermes-cli`. |
+| `hermes-email` | Same as `hermes-cli`. |
+| `hermes-sms` | Same as `hermes-cli`. |
+| `hermes-dingtalk` | Same as `hermes-cli`. |
+| `hermes-feishu` | Same as `hermes-cli`. |
+| `hermes-wecom` | Same as `hermes-cli`. |
+| `hermes-homeassistant` | Same as `hermes-cli`. |
+| `hermes-webhook` | Same as `hermes-cli`. |
+| `hermes-gateway` | Union of all messaging platform toolsets. Used internally when the gateway needs the broadest possible tool set. |
+
+## Dynamic Toolsets
+
+### MCP server toolsets
+
+Each configured MCP server generates an `mcp-<server-name>` toolset at runtime. For example, if you configure a `github` MCP server, an `mcp-github` toolset is created containing all tools that server exposes.
+
+```yaml
+# config.yaml
+mcp:
+  servers:
+    github:
+      command: npx
+      args: ["-y", "@modelcontextprotocol/server-github"]
+```
+
+This creates an `mcp-github` toolset you can reference in `--toolsets` or platform configs.
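The naming rule above is simple enough to sketch. The following is an illustrative Python fragment, not Hermes's actual implementation; the function name, the plain-dict view of the parsed `config.yaml`, and the second (`sqlite`) server are assumptions for demonstration:

```python
# Illustrative sketch: derive the dynamic "mcp-<server-name>" toolset names
# from the parsed MCP section of a config file.
config = {
    "mcp": {
        "servers": {
            "github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
            "sqlite": {"command": "uvx", "args": ["mcp-server-sqlite"]},  # hypothetical second server
        }
    }
}

def dynamic_toolset_names(config: dict) -> list[str]:
    """Return one 'mcp-<name>' toolset name per configured MCP server."""
    servers = config.get("mcp", {}).get("servers", {})
    return sorted(f"mcp-{name}" for name in servers)

print(dynamic_toolset_names(config))  # ['mcp-github', 'mcp-sqlite']
```

Each resulting name can then be passed to `--toolsets` exactly like a built-in toolset name.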
+ +### Plugin toolsets + +Plugins can register their own toolsets via `ctx.register_tool()` during plugin initialization. These appear alongside built-in toolsets and can be enabled/disabled the same way. + +### Custom toolsets + +Define custom toolsets in `config.yaml` to create project-specific bundles: + +```yaml +toolsets: + - hermes-cli +custom_toolsets: + data-science: + - file + - terminal + - code_execution + - web + - vision +``` + +### Wildcards + +- `all` or `*` — expands to every registered toolset (built-in + dynamic + plugin) + +## Relationship to `hermes tools` + +The `hermes tools` command provides a curses-based UI for toggling individual tools on or off per platform. This operates at the tool level (finer than toolsets) and persists to `config.yaml`. Disabled tools are filtered out even if their toolset is enabled. + +See also: [Tools Reference](./tools-reference.md) for the complete list of individual tools and their parameters. diff --git a/website/docs/user-guide/features/context-references.md b/website/docs/user-guide/features/context-references.md index 18624150e..b43c3e3b1 100644 --- a/website/docs/user-guide/features/context-references.md +++ b/website/docs/user-guide/features/context-references.md @@ -95,6 +95,38 @@ All paths are resolved relative to the working directory. References that resolv Binary files are detected via MIME type and null-byte scanning. Known text extensions (`.py`, `.md`, `.json`, `.yaml`, `.toml`, `.js`, `.ts`, etc.) bypass MIME-based detection. Binary files are rejected with a warning. +## Platform Availability + +Context references are primarily a **CLI feature**. They work in the interactive CLI where `@` triggers tab completion and references are expanded before the message is sent to the agent. + +In **messaging platforms** (Telegram, Discord, etc.), the `@` syntax is not expanded by the gateway — messages are passed through as-is. 
The agent itself can still reference files via the `read_file`, `search_files`, and `web_extract` tools. + +## Interaction with Context Compression + +When conversation context is compressed, the expanded reference content is included in the compression summary. This means: + +- Large file contents injected via `@file:` contribute to context usage +- If the conversation is later compressed, the file content is summarized (not preserved verbatim) +- For very large files, consider using line ranges (`@file:main.py:100-200`) to inject only relevant sections + +## Common Patterns + +```text +# Code review workflow +Review @diff and check for security issues + +# Debug with context +This test is failing. Here's the test @file:tests/test_auth.py +and the implementation @file:src/auth.py:50-80 + +# Project exploration +What does this project do? @folder:src @file:README.md + +# Research +Compare the approaches in @url:https://arxiv.org/abs/2301.00001 +and @url:https://arxiv.org/abs/2301.00002 +``` + ## Error Handling Invalid references produce inline warnings rather than failures: diff --git a/website/docs/user-guide/features/cron.md b/website/docs/user-guide/features/cron.md index f8b1d2c5a..ff63848d8 100644 --- a/website/docs/user-guide/features/cron.md +++ b/website/docs/user-guide/features/cron.md @@ -187,9 +187,21 @@ When scheduling jobs, you specify where the output goes: | `"origin"` | Back to where the job was created | Default on messaging platforms | | `"local"` | Save to local files only (`~/.hermes/cron/output/`) | Default on CLI | | `"telegram"` | Telegram home channel | Uses `TELEGRAM_HOME_CHANNEL` | -| `"discord"` | Discord home channel | Uses `DISCORD_HOME_CHANNEL` | | `"telegram:123456"` | Specific Telegram chat by ID | Direct delivery | -| `"discord:987654"` | Specific Discord channel by ID | Direct delivery | +| `"telegram:-100123:17585"` | Specific Telegram topic | `chat_id:thread_id` format | +| `"discord"` | Discord home channel | Uses 
`DISCORD_HOME_CHANNEL` | +| `"discord:#engineering"` | Specific Discord channel | By channel name | +| `"slack"` | Slack home channel | | +| `"whatsapp"` | WhatsApp home | | +| `"signal"` | Signal | | +| `"matrix"` | Matrix home room | | +| `"mattermost"` | Mattermost home channel | | +| `"email"` | Email | | +| `"sms"` | SMS via Twilio | | +| `"homeassistant"` | Home Assistant | | +| `"dingtalk"` | DingTalk | | +| `"feishu"` | Feishu/Lark | | +| `"wecom"` | WeCom | | The agent's final response is automatically delivered. You do not need to call `send_message` in the cron prompt. diff --git a/website/docs/user-guide/features/honcho.md b/website/docs/user-guide/features/honcho.md index 55f78e43b..4d8c777c6 100644 --- a/website/docs/user-guide/features/honcho.md +++ b/website/docs/user-guide/features/honcho.md @@ -1,22 +1,39 @@ --- sidebar_position: 99 title: "Honcho Memory" -description: "Honcho is now available as a memory provider plugin" +description: "AI-native persistent memory via Honcho — dialectic reasoning, multi-agent user modeling, and deep personalization" --- # Honcho Memory -:::info Honcho is now a Memory Provider Plugin -Honcho has been integrated into the [Memory Providers](./memory-providers.md) system. All Honcho features are available through the unified memory provider interface. +[Honcho](https://github.com/plastic-labs/honcho) is an AI-native memory backend that adds dialectic reasoning and deep user modeling on top of Hermes's built-in memory system. Instead of simple key-value storage, Honcho maintains a running model of who the user is — their preferences, communication style, goals, and patterns — by reasoning about conversations after they happen. + +:::info Honcho is a Memory Provider Plugin +Honcho is integrated into the [Memory Providers](./memory-providers.md) system. All features below are available through the unified memory provider interface. 
::: +## What Honcho Adds + +| Capability | Built-in Memory | Honcho | +|-----------|----------------|--------| +| Cross-session persistence | ✔ File-based MEMORY.md/USER.md | ✔ Server-side with API | +| User profile | ✔ Manual agent curation | ✔ Automatic dialectic reasoning | +| Multi-agent isolation | — | ✔ Per-peer profile separation | +| Observation modes | — | ✔ Unified or directional observation | +| Conclusions (derived insights) | — | ✔ Server-side reasoning about patterns | +| Search across history | ✔ FTS5 session search | ✔ Semantic search over conclusions | + +**Dialectic reasoning**: After each conversation, Honcho analyzes the exchange and derives "conclusions" — insights about the user's preferences, habits, and goals. These conclusions accumulate over time, giving the agent a deepening understanding that goes beyond what the user explicitly stated. + +**Multi-agent profiles**: When multiple Hermes instances talk to the same user (e.g., a coding assistant and a personal assistant), Honcho maintains separate "peer" profiles. Each peer sees only its own observations and conclusions, preventing cross-contamination of context. + ## Setup ```bash -hermes memory setup # select "honcho" +hermes memory setup # select "honcho" from the provider list ``` -Or set manually: +Or configure manually: ```yaml # ~/.hermes/config.yaml @@ -28,16 +45,49 @@ memory: echo "HONCHO_API_KEY=your-key" >> ~/.hermes/.env ``` +Get an API key at [honcho.dev](https://honcho.dev). + +## Configuration Options + +```yaml +# ~/.hermes/config.yaml +honcho: + observation: directional # "unified" (default for new installs) or "directional" + peer_name: "" # auto-detected from platform, or set manually +``` + +**Observation modes:** +- `unified` — All observations go into a single pool. Simpler, good for single-agent setups. +- `directional` — Observations are tagged with direction (user→agent, agent→user). Enables richer analysis of conversation dynamics. 
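For a multi-agent setup, a directional configuration might look like the following sketch. The keys are the ones documented above; the peer name is illustrative — substitute whatever distinguishes this instance from your other agents:

```yaml
# ~/.hermes/config.yaml
memory:
  provider: honcho

honcho:
  observation: directional        # tag observations with direction (user→agent, agent→user)
  peer_name: "coding-assistant"   # illustrative; keeps this instance's profile separate
```

With `directional` set, each Hermes instance contributes observations under its own peer name, so conclusions derived for the coding assistant never leak into another agent's context.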
+ +## Tools + +When Honcho is active as the memory provider, four additional tools become available: + +| Tool | Purpose | +|------|---------| +| `honcho_conclude` | Trigger server-side dialectic reasoning on recent conversations | +| `honcho_context` | Retrieve relevant context from Honcho's memory for the current conversation | +| `honcho_profile` | View or update the user's Honcho profile | +| `honcho_search` | Semantic search across all stored conclusions and observations | + +## CLI Commands + +```bash +hermes honcho status # Show connection status and config +hermes honcho peer # Update peer names for multi-agent setups +``` + ## Migrating from `hermes honcho` -If you previously used `hermes honcho setup`: +If you previously used the standalone `hermes honcho setup`: 1. Your existing configuration (`honcho.json` or `~/.honcho/config.json`) is preserved 2. Your server-side data (memories, conclusions, user profiles) is intact -3. Just set `memory.provider: honcho` to reactivate +3. Set `memory.provider: honcho` in config.yaml to reactivate No re-login or re-setup needed. Run `hermes memory setup` and select "honcho" — the wizard detects your existing config. ## Full Documentation -See [Memory Providers — Honcho](./memory-providers.md#honcho) for tools, config reference, and details. +See [Memory Providers — Honcho](./memory-providers.md#honcho) for the complete reference. diff --git a/website/docs/user-guide/features/image-generation.md b/website/docs/user-guide/features/image-generation.md index e6c3cd585..a782630b1 100644 --- a/website/docs/user-guide/features/image-generation.md +++ b/website/docs/user-guide/features/image-generation.md @@ -141,10 +141,25 @@ Debug logs are saved to `./logs/image_tools_debug_.json` with detail The image generation tool runs with safety checks disabled by default (`safety_tolerance: 5`, the most permissive setting). This is configured at the code level and is not user-adjustable. 
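The debug logs mentioned above are plain JSON, so a short script can pull up the most recent one for inspection. This is a minimal sketch: only the `./logs/image_tools_debug_` prefix appears in this doc, and the exact filename suffix (session id or timestamp) is an assumption.

```python
import glob
import json
import os

def latest_debug_log(log_dir="./logs"):
    """Return the newest image-generation debug log, or None if none exist."""
    # The "image_tools_debug_" prefix comes from the docs; the suffix
    # of each filename is assumed to vary per run.
    paths = glob.glob(os.path.join(log_dir, "image_tools_debug_*.json"))
    return max(paths, key=os.path.getmtime) if paths else None

if __name__ == "__main__":
    path = latest_debug_log()
    if path:
        with open(path) as f:
            data = json.load(f)
        # Preview the first 2 KB rather than dumping the whole payload.
        print(json.dumps(data, indent=2)[:2000])
    else:
        print("no debug logs found")
```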
+## Platform Delivery + +Generated images are delivered differently depending on the platform: + +| Platform | Delivery method | +|----------|----------------| +| **CLI** | Image URL printed as markdown `![description](url)` — click to open in browser | +| **Telegram** | Image sent as a photo message with the prompt as caption | +| **Discord** | Image embedded in a message | +| **Slack** | Image URL in message (Slack unfurls it) | +| **WhatsApp** | Image sent as a media message | +| **Other platforms** | Image URL in plain text | + +The agent uses `MEDIA:` syntax in its response, which the platform adapter converts to the appropriate format. + ## Limitations - **Requires FAL API key** — image generation incurs API costs on your FAL.ai account - **No image editing** — this is text-to-image only, no inpainting or img2img -- **URL-based delivery** — images are returned as temporary FAL.ai URLs, not saved locally +- **URL-based delivery** — images are returned as temporary FAL.ai URLs, not saved locally. URLs expire after a period (typically hours) - **Upscaling adds latency** — the automatic 2x upscale step adds processing time - **Max 4 images per request** — `num_images` is capped at 4 diff --git a/website/docs/user-guide/features/overview.md b/website/docs/user-guide/features/overview.md index 568797dfc..9d9c7b2c5 100644 --- a/website/docs/user-guide/features/overview.md +++ b/website/docs/user-guide/features/overview.md @@ -31,15 +31,17 @@ Hermes Agent includes a rich set of capabilities that extend far beyond basic ch - **[Browser Automation](browser.md)** — Full browser automation with multiple backends: Browserbase cloud, Browser Use cloud, local Chrome via CDP, or local Chromium. Navigate websites, fill forms, and extract information. - **[Vision & Image Paste](vision.md)** — Multimodal vision support. Paste images from your clipboard into the CLI and ask the agent to analyze, describe, or work with them using any vision-capable model. 
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai's FLUX 2 Pro model with automatic 2x upscaling via the Clarity Upscaler. -- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with four provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, and NeuTTS. +- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with five provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, MiniMax, and NeuTTS. ## Integrations +- **[MCP Integration](mcp.md)** — Connect to any MCP server via stdio or HTTP transport. Access external tools from GitHub, databases, file systems, and internal APIs without writing native Hermes tools. Includes per-server tool filtering and sampling support. - **[Provider Routing](provider-routing.md)** — Fine-grained control over which AI providers handle your requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and priority ordering. - **[Fallback Providers](fallback-providers.md)** — Automatic failover to backup LLM providers when your primary model encounters errors, including independent fallback for auxiliary tasks like vision and compression. +- **[Credential Pools](credential-pools.md)** — Distribute API calls across multiple keys for the same provider. Automatic rotation on rate limits or failures. +- **[Memory Providers](memory-providers.md)** — Plug in external memory backends (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover) for cross-session user modeling and personalization beyond the built-in memory system. - **[API Server](api-server.md)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Connect any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, and more. - **[IDE Integration (ACP)](acp.md)** — Use Hermes inside ACP-compatible editors such as VS Code, Zed, and JetBrains. 
Chat, tool activity, file diffs, and terminal commands render inside your editor. -- **[Honcho Memory](honcho.md)** — AI-native persistent memory for cross-session user modeling and personalization via dialectic reasoning. - **[RL Training](rl-training.md)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning. ## Customization diff --git a/website/docs/user-guide/messaging/webhooks.md b/website/docs/user-guide/messaging/webhooks.md index b804152f2..d13210a45 100644 --- a/website/docs/user-guide/messaging/webhooks.md +++ b/website/docs/user-guide/messaging/webhooks.md @@ -70,7 +70,7 @@ Routes define how different webhook sources are handled. Each route is a named e | `secret` | **Yes** | HMAC secret for signature validation. Falls back to the global `secret` if not set on the route. Set to `"INSECURE_NO_AUTH"` for testing only (skips validation). | | `prompt` | No | Template string with dot-notation payload access (e.g. `{pull_request.title}`). If omitted, the full JSON payload is dumped into the prompt. | | `skills` | No | List of skill names to load for the agent run. | -| `deliver` | No | Where to send the response: `github_comment`, `telegram`, `discord`, `slack`, `signal`, `sms`, or `log` (default). | +| `deliver` | No | Where to send the response: `github_comment`, `telegram`, `discord`, `slack`, `signal`, `matrix`, `mattermost`, `email`, `sms`, `dingtalk`, `feishu`, `wecom`, or `log` (default). | | `deliver_extra` | No | Additional delivery config — keys depend on `deliver` type (e.g. `repo`, `pr_number`, `chat_id`). Values support the same `{dot.notation}` templates as `prompt`. | ### Full example diff --git a/website/docs/user-guide/sessions.md b/website/docs/user-guide/sessions.md index 736ac8a30..a84e1064d 100644 --- a/website/docs/user-guide/sessions.md +++ b/website/docs/user-guide/sessions.md @@ -10,7 +10,7 @@ Hermes Agent automatically saves every conversation as a session. 
Sessions enabl ## How Sessions Work -Every conversation — whether from the CLI, Telegram, Discord, WhatsApp, or Slack — is stored as a session with full message history. Sessions are tracked in two complementary systems: +Every conversation — whether from the CLI, Telegram, Discord, Slack, WhatsApp, Signal, Matrix, or any other messaging platform — is stored as a session with full message history. Sessions are tracked in two complementary systems: 1. **SQLite database** (`~/.hermes/state.db`) — structured session metadata with FTS5 full-text search 2. **JSONL transcripts** (`~/.hermes/sessions/`) — raw conversation transcripts including tool calls (gateway) @@ -34,8 +34,22 @@ Each session is tagged with its source platform: | `cli` | Interactive CLI (`hermes` or `hermes chat`) | | `telegram` | Telegram messenger | | `discord` | Discord server/DM | -| `whatsapp` | WhatsApp messenger | | `slack` | Slack workspace | +| `whatsapp` | WhatsApp messenger | +| `signal` | Signal messenger | +| `matrix` | Matrix rooms and DMs | +| `mattermost` | Mattermost channels | +| `email` | Email (IMAP/SMTP) | +| `sms` | SMS via Twilio | +| `dingtalk` | DingTalk messenger | +| `feishu` | Feishu/Lark messenger | +| `wecom` | WeCom (WeChat Work) | +| `homeassistant` | Home Assistant conversation | +| `webhook` | Incoming webhooks | +| `api-server` | API server requests | +| `acp` | ACP editor integration | +| `cron` | Scheduled cron jobs | +| `batch` | Batch processing runs | ## CLI Session Resume