docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393)

Major changes across 20 documentation pages:

Staleness fixes:
- Fix FAQ: wrong import path (hermes.agent → run_agent)
- Fix FAQ: stale Gemini 2.0 model → Gemini 3 Flash
- Fix integrations/index: missing MiniMax TTS provider
- Fix integrations/index: web_crawl is not a registered tool
- Fix sessions: add all 19 session sources (was only 5)
- Fix cron: add all 18 delivery targets (was only telegram/discord)
- Fix webhooks: add all delivery targets
- Fix overview: add missing MCP, memory providers, credential pools
- Fix all line-number references → use function name searches instead
- Update file size estimates (run_agent ~9200, gateway ~7200, cli ~8500)

Expanded thin pages (< 150 lines → substantial depth):
- honcho.md: 43 → 108 lines — added feature comparison, tools, config, CLI
- overview.md: 49 → 55 lines — added MCP, memory providers, credential pools
- toolsets-reference.md: 57 → 175 lines — added explanations, config examples,
  custom toolsets, wildcards, platform differences table
- optional-skills-catalog.md: 74 → 153 lines — added 25+ missing skills across
  communication, devops, mlops (18!), productivity, research categories
- integrations/index.md: 82 → 115 lines — added messaging, HA, plugins sections
- cron-internals.md: 90 → 195 lines — added job JSON example, lifecycle states,
  tick cycle, delivery targets, script-backed jobs, CLI interface
- gateway-internals.md: 111 → 250 lines — added architecture diagram, message
  flow, two-level guard, platform adapters, token locks, process management
- agent-loop.md: 112 → 235 lines — added entry points, API mode resolution,
  turn lifecycle detail, message alternation rules, tool execution flow,
  callback table, budget tracking, compression details
- architecture.md: 152 → 295 lines — added system overview diagram, data flow
  diagrams, design principles table, dependency chain

Other depth additions:
- context-references.md: added platform availability, compression interaction,
  common patterns sections
- slash-commands.md: added quick commands config example, alias resolution
- image-generation.md: added platform delivery table
- tools-reference.md: added tool counts, MCP tools note
- index.md: updated platform count (5 → 14+), tool count (40+ → 47)
Teknium 2026-04-05 19:45:50 -07:00 committed by GitHub
parent fec58ad99e
commit 43d468cea8
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
20 changed files with 1243 additions and 406 deletions


@@ -6,107 +6,231 @@ description: "Detailed walkthrough of AIAgent execution, API modes, tools, callb
# Agent Loop Internals
The core orchestration engine is `run_agent.py`'s `AIAgent` class — roughly 9,200 lines that handle everything from prompt assembly to tool dispatch to provider failover.
## Core Responsibilities
`AIAgent` is responsible for:
- Assembling the effective system prompt and tool schemas via `prompt_builder.py`
- Selecting the correct provider/API mode (chat_completions, codex_responses, anthropic_messages)
- Making interruptible model calls with cancellation support
- Executing tool calls (sequentially or concurrently via thread pool)
- Maintaining conversation history in OpenAI message format
- Handling compression, retries, and fallback model switching
- Tracking iteration budgets across parent and child agents
- Flushing persistent memory before context is lost
## Two Entry Points
```python
# Simple interface — returns final response string
response = agent.chat("Fix the bug in main.py")

# Full interface — returns dict with messages, metadata, usage stats
result = agent.run_conversation(
    user_message="Fix the bug in main.py",
    system_message=None,          # auto-built if omitted
    conversation_history=None,    # auto-loaded from session if omitted
    task_id="task_abc123"
)
```
`chat()` is a thin wrapper around `run_conversation()` that extracts the `final_response` field from the result dict.
## API Modes
Hermes supports three API execution modes, resolved from provider selection, explicit args, and base URL heuristics:
| API mode | Used for | Client type |
|----------|----------|-------------|
| `chat_completions` | OpenAI-compatible endpoints (OpenRouter, custom, most providers) | `openai.OpenAI` |
| `codex_responses` | OpenAI Codex / Responses API | `openai.OpenAI` with Responses format |
| `anthropic_messages` | Native Anthropic Messages API | `anthropic.Anthropic` via adapter |
The mode determines how messages are formatted, how tool calls are structured, how responses are parsed, and how caching/streaming works. All three converge on the same internal message format (OpenAI-style `role`/`content`/`tool_calls` dicts) before and after API calls.
**Mode resolution order:**
1. Explicit `api_mode` constructor arg (highest priority)
2. Provider-specific detection (e.g., `anthropic` provider → `anthropic_messages`)
3. Base URL heuristics (e.g., `api.anthropic.com` → `anthropic_messages`)
4. Default: `chat_completions`
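As a sketch, this resolution order might look like the following (the function name and signature here are illustrative, not the actual `run_agent.py` internals):

```python
def resolve_api_mode(explicit_mode=None, provider=None, base_url=None):
    """Illustrative sketch of the four-step resolution order."""
    if explicit_mode:                                 # 1. explicit arg wins
        return explicit_mode
    if provider == "anthropic":                       # 2. provider detection
        return "anthropic_messages"
    if base_url and "api.anthropic.com" in base_url:  # 3. base URL heuristic
        return "anthropic_messages"
    return "chat_completions"                         # 4. default
```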
## Turn Lifecycle
Each iteration of the agent loop follows this sequence:
```text
1. Generate task_id if not provided
2. Append user message to conversation history
3. Build or reuse cached system prompt (prompt_builder.py)
4. Check if preflight compression is needed (>50% context)
5. Build API messages from conversation history
   - chat_completions: OpenAI format as-is
   - codex_responses: convert to Responses API input items
   - anthropic_messages: convert via anthropic_adapter.py
6. Inject ephemeral prompt layers (budget warnings, context pressure)
7. Apply prompt caching markers if on Anthropic
8. Make interruptible API call (_api_call_with_interrupt)
9. Parse response:
   - If tool_calls: execute them, append results, loop back to step 5
   - If text response: persist session, flush memory if needed, return
```
### Message Format
All messages use OpenAI-compatible format internally:
```python
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [...]}
{"role": "tool", "tool_call_id": "...", "content": "..."}
```
Reasoning content (from models that support extended thinking) is stored in `assistant_msg["reasoning"]` and optionally displayed via the `reasoning_callback`.
### Message Alternation Rules
The agent loop enforces strict message role alternation:
- After the system message: `User → Assistant → User → Assistant → ...`
- During tool calling: `Assistant (with tool_calls) → Tool → Tool → ... → Assistant`
- **Never** two assistant messages in a row
- **Never** two user messages in a row
- **Only** `tool` role can have consecutive entries (parallel tool results)
Providers validate these sequences and will reject malformed histories.
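These rules can be expressed as a small validator. This is a hedged sketch for illustration; the actual enforcement lives in the loop itself and on the provider side:

```python
def validate_alternation(messages):
    """Check the role-alternation rules described above (illustrative)."""
    prev = None
    for msg in messages:
        role = msg["role"]
        if role == prev and role != "tool":
            return False  # two assistant or two user messages in a row
        if role == "tool" and prev not in ("assistant", "tool"):
            return False  # tool results must follow an assistant tool_call turn
        prev = role
    return True
```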
## Interruptible API Calls
API requests are wrapped in `_api_call_with_interrupt()` which runs the actual HTTP call in a background thread while monitoring an interrupt event:
```text
┌──────────────────────┐     ┌──────────────┐
│ Main thread          │     │ API thread   │
│ wait on:             │────▶│ HTTP POST    │
│  - response ready    │     │ to provider  │
│  - interrupt event   │     └──────────────┘
│  - timeout           │
└──────────────────────┘
```
When interrupted (user sends new message, `/stop` command, or signal):
- The API thread is abandoned (response discarded)
- The agent can process the new input or shut down cleanly
- No partial response is injected into conversation history
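The pattern can be sketched in a few lines. This is a minimal illustration of the thread-plus-event idea, not the real `_api_call_with_interrupt` signature:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def call_with_interrupt(api_call, interrupt, poll=0.05):
    """Run a blocking call in a worker thread, polling an interrupt event.

    Hypothetical sketch: `api_call` is any zero-arg callable, `interrupt`
    is a threading.Event set when the user cancels.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(api_call)
    while not future.done():
        if interrupt.is_set():
            pool.shutdown(wait=False)  # abandon the in-flight call
            return None                # no partial response reaches history
        interrupt.wait(timeout=poll)   # block briefly on the event
    pool.shutdown(wait=False)
    return future.result()
```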
## Tool Execution
### Sequential vs Concurrent
When the model returns tool calls:
- **Single tool call** → executed directly in the main thread
- **Multiple tool calls** → executed concurrently via `ThreadPoolExecutor`
- Exception: tools marked as interactive (e.g., `clarify`) force sequential execution
- Results are reinserted in the original tool call order regardless of completion order
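The ordering guarantee follows naturally from submitting futures in order and collecting them in the same order. A sketch (helper names are illustrative, not the real code):

```python
from concurrent.futures import ThreadPoolExecutor

def run_tool_calls(tool_calls, handler):
    """Execute tool calls concurrently, reinserting results in call order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        futures = [pool.submit(handler, call) for call in tool_calls]
        # The futures list preserves submission order, so results do too,
        # regardless of which call finishes first.
        return [
            {"role": "tool", "tool_call_id": call["id"], "content": f.result()}
            for call, f in zip(tool_calls, futures)
        ]
```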
### Execution Flow
```text
for each tool_call in response.tool_calls:
  1. Resolve handler from tools/registry.py
  2. Fire pre_tool_call plugin hook
  3. Check if dangerous command (tools/approval.py)
     - If dangerous: invoke approval_callback, wait for user
  4. Execute handler with args + task_id
  5. Fire post_tool_call plugin hook
  6. Append {"role": "tool", "content": result} to history
```
### Agent-Level Tools
Some tools are intercepted by `run_agent.py` *before* reaching `handle_function_call()`:
| Tool | Why intercepted |
|------|-----------------|
| `todo` | Reads/writes agent-local task state |
| `memory` | Writes to persistent memory files with character limits |
These tools modify agent state directly and return synthetic tool results without going through the registry.
## Callback Surfaces
`AIAgent` supports platform-specific callbacks that enable real-time progress in the CLI, gateway, and ACP integrations:
| Callback | When fired | Used by |
|----------|-----------|---------|
| `tool_progress_callback` | Before/after each tool execution | CLI spinner, gateway progress messages |
| `thinking_callback` | When model starts/stops thinking | CLI "thinking..." indicator |
| `reasoning_callback` | When model returns reasoning content | CLI reasoning display, gateway reasoning blocks |
| `clarify_callback` | When `clarify` tool is called | CLI input prompt, gateway interactive message |
| `step_callback` | After each complete agent turn | Gateway step tracking, ACP progress |
| `stream_delta_callback` | Each streaming token (when enabled) | CLI streaming display |
| `tool_gen_callback` | When tool call is parsed from stream | CLI tool preview in spinner |
| `status_callback` | State changes (thinking, executing, etc.) | ACP status updates |
## Budget and Fallback Behavior
### Iteration Budget
The agent tracks iterations via `IterationBudget`:
- Default: 90 iterations (configurable via `agent.max_turns`)
- Shared across parent and child agents — a subagent consumes from the parent's budget
- At 70%+ usage, `_get_budget_warning()` appends a `[BUDGET WARNING: ...]` to the last tool result
- At 100%, the agent stops and returns a summary of work done
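The budget-pressure behavior can be sketched as a small function. The warning text and names here are illustrative; only the 70% threshold and the append-to-last-tool-result behavior come from the description above:

```python
def budget_warning(used, limit=90, threshold=0.7):
    """Return the budget-pressure hint for the current iteration count."""
    if used >= limit:
        # Hypothetical wording: at 100% the agent stops and summarizes.
        return "[BUDGET EXHAUSTED: stop and summarize work done]"
    if used / limit >= threshold:
        remaining = limit - used
        return f"[BUDGET WARNING: {remaining} iterations remaining]"
    return None  # below the threshold, no hint is injected
```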
### Fallback Model
When the primary model fails (429 rate limit, 5xx server error, 401/403 auth error):
1. Check `fallback_providers` list in config
2. Try each fallback in order
3. On success, continue the conversation with the new provider
4. On 401/403, attempt credential refresh before failing over
The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain configurable via the `auxiliary.*` config section.
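The failover order sketched below is an illustration of the steps above, with `PermissionError` standing in for 401/403 and generic exceptions standing in for 429/5xx; the real error taxonomy and function names differ:

```python
def call_with_fallback(primary, fallbacks, make_call, refresh_credentials):
    """Try the primary provider, then each fallback in order (illustrative)."""
    for provider in [primary] + list(fallbacks):
        try:
            return make_call(provider)
        except PermissionError:               # stand-in for 401/403
            refresh_credentials(provider)
            try:
                return make_call(provider)    # retry once after refresh
            except Exception:
                continue                      # fail over to next provider
        except Exception:                     # stand-in for 429/5xx
            continue
    raise RuntimeError("all providers failed")
```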
## Compression and Persistence
### When Compression Triggers
- **Preflight** (before API call): If conversation exceeds 50% of model's context window
- **Gateway auto-compression**: If conversation exceeds 85% (more aggressive, runs between turns)
### What Happens During Compression
1. Memory is flushed to disk first (preventing data loss)
2. Middle conversation turns are summarized into a compact summary
3. The last N messages are preserved intact (`compression.protect_last_n`, default: 20)
4. Tool call/result message pairs are kept together (never split)
5. A new session lineage ID is generated (compression creates a "child" session)
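Steps 2-4 can be sketched as follows. The summary message's role and wording are assumptions for illustration; only the protected-tail size and the keep-tool-pairs-together rule come from the list above:

```python
def compress(messages, summarize, protect_last_n=20):
    """Summarize middle turns, keep the tail intact, never split tool pairs."""
    if len(messages) <= protect_last_n:
        return messages
    cut = len(messages) - protect_last_n
    # Never start the protected tail on a `tool` message: walk the cut point
    # back so an assistant tool_call stays with its tool results.
    while cut > 0 and messages[cut]["role"] == "tool":
        cut -= 1
    head, tail = messages[:cut], messages[cut:]
    summary = {"role": "user",
               "content": f"[Summary of earlier turns]\n{summarize(head)}"}
    return [summary] + tail
```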
### Session Persistence
After each turn:
- Messages are saved to the session store (SQLite via `hermes_state.py`)
- Memory changes are flushed to `MEMORY.md` / `USER.md`
- The session can be resumed later via `/resume` or `hermes chat --resume`
## Key Source Files
| File | Purpose |
|------|---------|
| `run_agent.py` | AIAgent class — the complete agent loop (~9,200 lines) |
| `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality |
| `agent/context_compressor.py` | Conversation compression algorithm |
| `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics |
| `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) |
| `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch |
## Related Docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Tools Runtime](./tools-runtime.md)
- [Architecture Overview](./architecture.md)


@@ -1,152 +1,274 @@
---
sidebar_position: 1
title: "Architecture"
description: "Hermes Agent internals — major subsystems, execution paths, data flow, and where to read next"
---
# Architecture
This page is the top-level map of Hermes Agent internals. Use it to orient yourself in the codebase, then dive into subsystem-specific docs for implementation details.
## System Overview
```text
┌──────────────────────────────────────────────────────────────────────┐
│                             Entry Points                             │
│                                                                      │
│  CLI (cli.py)     Gateway (gateway/run.py)     ACP (acp_adapter/)    │
│  Batch Runner     API Server                   Python Library        │
└──────────┬──────────────┬───────────────────────┬────────────────────┘
           │              │                       │
           ▼              ▼                       ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        AIAgent (run_agent.py)                        │
│                                                                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐            │
│  │ Prompt       │    │ Provider     │    │ Tool         │            │
│  │ Builder      │    │ Resolution   │    │ Dispatch     │            │
│  │ (prompt_     │    │ (runtime_    │    │ (model_      │            │
│  │  builder.py) │    │  provider.py)│    │  tools.py)   │            │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘            │
│         │                   │                   │                    │
│  ┌──────┴───────┐    ┌──────┴───────┐    ┌──────┴───────┐            │
│  │ Compression  │    │ 3 API Modes  │    │ Tool Registry│            │
│  │ & Caching    │    │ chat_compl.  │    │ (registry.py)│            │
│  │              │    │ codex_resp.  │    │ 47 tools     │            │
│  │              │    │ anthropic    │    │ 37 toolsets  │            │
│  └──────────────┘    └──────────────┘    └──────────────┘            │
└──────────────────────────────────────────────────────────────────────┘
          │                                 │
          ▼                                 ▼
┌───────────────────┐            ┌──────────────────────┐
│ Session Storage   │            │ Tool Backends        │
│ (SQLite + FTS5)   │            │ Terminal (6 backends)│
│ hermes_state.py   │            │ Browser (5 backends) │
│ gateway/session.py│            │ MCP (dynamic)        │
└───────────────────┘            │ File, Vision, etc.   │
                                 └──────────────────────┘
```
## Directory Structure
```text
hermes-agent/
├── run_agent.py # AIAgent — core conversation loop (~9,200 lines)
├── cli.py # HermesCLI — interactive terminal UI (~8,500 lines)
├── model_tools.py # Tool discovery, schema collection, dispatch
├── toolsets.py # Tool groupings and platform presets
├── hermes_state.py # SQLite session/state database with FTS5
├── hermes_constants.py # HERMES_HOME, profile-aware paths
├── batch_runner.py # Batch trajectory generation
├── agent/ # Agent internals
│ ├── prompt_builder.py # System prompt assembly
│ ├── context_compressor.py # Conversation compression algorithm
│ ├── prompt_caching.py # Anthropic prompt caching
│ ├── auxiliary_client.py # Auxiliary LLM for side tasks (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
│ ├── models_dev.py # models.dev registry integration
│ ├── anthropic_adapter.py # Anthropic Messages API format conversion
│ ├── display.py # KawaiiSpinner, tool preview formatting
│ ├── skill_commands.py # Skill slash commands
│ ├── memory_store.py # Persistent memory read/write
│ └── trajectory.py # Trajectory saving helpers
├── hermes_cli/ # CLI subcommands and setup
│ ├── main.py # Entry point — all `hermes` subcommands (~4,200 lines)
│ ├── config.py # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│ ├── commands.py # COMMAND_REGISTRY — central slash command definitions
│ ├── auth.py # PROVIDER_REGISTRY, credential resolution
│ ├── runtime_provider.py # Provider → api_mode + credentials
│ ├── models.py # Model catalog, provider model lists
│ ├── model_switch.py # /model command logic (CLI + gateway shared)
│ ├── setup.py # Interactive setup wizard (~3,500 lines)
│ ├── skin_engine.py # CLI theming engine
│ ├── skills_config.py # hermes skills — enable/disable per platform
│ ├── skills_hub.py # /skills slash command
│ ├── tools_config.py # hermes tools — enable/disable per platform
│ ├── plugins.py # PluginManager — discovery, loading, hooks
│ ├── callbacks.py # Terminal callbacks (clarify, sudo, approval)
│ └── gateway.py # hermes gateway start/stop
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry
│ ├── approval.py # Dangerous command detection
│ ├── terminal_tool.py # Terminal orchestration
│ ├── process_registry.py # Background process management
│ ├── file_tools.py # read_file, write_file, patch, search_files
│ ├── web_tools.py # web_search, web_extract
│ ├── browser_tool.py # 11 browser automation tools
│ ├── code_execution_tool.py # execute_code sandbox
│ ├── delegate_tool.py # Subagent delegation
│ ├── mcp_tool.py # MCP client (~1,050 lines)
│ ├── credential_files.py # File-based credential passthrough
│ ├── env_passthrough.py # Env var passthrough for sandboxes
│ ├── ansi_strip.py # ANSI escape stripping
│ └── environments/ # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/ # Messaging platform gateway
│ ├── run.py # GatewayRunner — message dispatch (~5,800 lines)
│ ├── session.py # SessionStore — conversation persistence
│ ├── delivery.py # Outbound message delivery
│ ├── pairing.py # DM pairing authorization
│ ├── hooks.py # Hook discovery and lifecycle events
│ ├── mirror.py # Cross-session message mirroring
│ ├── status.py # Token locks, profile-scoped process tracking
│ ├── builtin_hooks/ # Always-registered hooks
│ └── platforms/ # 14 adapters: telegram, discord, slack, whatsapp,
│ # signal, matrix, mattermost, email, sms,
│ # dingtalk, feishu, wecom, homeassistant, webhook
├── acp_adapter/ # ACP server (VS Code / Zed / JetBrains)
├── cron/ # Scheduler (jobs.py, scheduler.py)
├── plugins/memory/ # Memory provider plugins
├── environments/ # RL training environments (Atropos)
├── skills/ # Bundled skills (always available)
├── optional-skills/ # Official optional skills (install explicitly)
├── website/ # Docusaurus documentation site
└── tests/ # Pytest suite (~3,000+ tests)
```
## Data Flow
### CLI Session
```text
User input → HermesCLI.process_input()
→ AIAgent.run_conversation()
→ prompt_builder.build_system_prompt()
→ runtime_provider.resolve_runtime_provider()
→ API call (chat_completions / codex_responses / anthropic_messages)
→ tool_calls? → model_tools.handle_function_call() → loop
→ final response → display → save to SessionDB
```
### Gateway Message
```text
Platform event → Adapter.on_message() → MessageEvent
→ GatewayRunner._handle_message()
→ authorize user
→ resolve session key
→ create AIAgent with session history
→ AIAgent.run_conversation()
→ deliver response back through adapter
```
### Cron Job
```text
Scheduler tick → load due jobs from jobs.json
→ create fresh AIAgent (no history)
→ inject attached skills as context
→ run job prompt
→ deliver response to target platform
→ update job state and next_run
```
## Recommended Reading Order
If you are new to the codebase:
1. **This page** — orient yourself
2. **[Agent Loop Internals](./agent-loop.md)** — how AIAgent works
3. **[Prompt Assembly](./prompt-assembly.md)** — system prompt construction
4. **[Provider Runtime Resolution](./provider-runtime.md)** — how providers are selected
5. **[Adding Providers](./adding-providers.md)** — practical guide to adding a new provider
6. **[Tools Runtime](./tools-runtime.md)** — tool registry, dispatch, environments
7. **[Session Storage](./session-storage.md)** — SQLite schema, FTS5, session lineage
8. **[Gateway Internals](./gateway-internals.md)** — messaging platform gateway
9. **[Context Compression & Prompt Caching](./context-compression-and-caching.md)** — compression and caching
10. **[ACP Internals](./acp-internals.md)** — IDE integration
11. **[Environments, Benchmarks & Data Generation](./environments.md)** — RL training
## Major Subsystems
### Agent Loop
The synchronous orchestration engine (`AIAgent` in `run_agent.py`). Handles provider selection, prompt construction, tool execution, retries, fallback, callbacks, compression, and persistence. Supports three API modes for different provider backends.
→ [Agent Loop Internals](./agent-loop.md)
### Prompt System
Prompt construction and maintenance across the conversation lifecycle:
- **`prompt_builder.py`** — Assembles the system prompt from: personality (SOUL.md), memory (MEMORY.md, USER.md), skills, context files (AGENTS.md, .hermes.md), tool-use guidance, and model-specific instructions
- **`prompt_caching.py`** — Applies Anthropic cache breakpoints for prefix caching
- **`context_compressor.py`** — Summarizes middle conversation turns when context exceeds thresholds
→ [Prompt Assembly](./prompt-assembly.md), [Context Compression & Prompt Caching](./context-compression-and-caching.md)
### Provider Resolution
A shared runtime resolver used by CLI, gateway, cron, ACP, and auxiliary calls. Maps `(provider, model)` tuples to `(api_mode, api_key, base_url)`. Handles 18+ providers, OAuth flows, credential pools, and alias resolution.
→ [Provider Runtime Resolution](./provider-runtime.md)
### Tool System
Central tool registry (`tools/registry.py`) with 47 registered tools across 20 toolsets. Each tool file self-registers at import time. The registry handles schema collection, dispatch, availability checking, and error wrapping. Terminal tools support 6 backends (local, Docker, SSH, Daytona, Modal, Singularity).
→ [Tools Runtime](./tools-runtime.md)
### Session Persistence
SQLite-based session storage with FTS5 full-text search. Sessions have lineage tracking (parent/child across compressions), per-platform isolation, and atomic writes with contention handling.
→ [Session Storage](./session-storage.md)
### Messaging Gateway
Long-running process with 14 platform adapters, unified session routing, user authorization (allowlists + DM pairing), slash command dispatch, hook system, cron ticking, and background maintenance.
→ [Gateway Internals](./gateway-internals.md)
### Plugin System
Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Memory providers are a specialized plugin type under `plugins/memory/`.
→ [Plugin Guide](/docs/guides/build-a-hermes-plugin), [Memory Provider Plugin](./memory-provider-plugin.md)
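As a rough illustration of the registration pattern (every name here is hypothetical; the real PluginManager context API may differ):

```python
# Minimal stand-in for the plugin context a discovered module receives.
class PluginContext:
    def __init__(self):
        self.tools, self.hooks = {}, {}

    def register_tool(self, name, fn):
        self.tools[name] = fn

    def register_hook(self, event, fn):
        self.hooks.setdefault(event, []).append(fn)

# What a plugin module's entry point might look like:
def register(context):
    # Contribute a tool and a lifecycle hook through the context API.
    context.register_tool("shout", lambda text: text.upper())
    context.register_hook("pre_tool_call", lambda call: None)

ctx = PluginContext()
register(ctx)  # the PluginManager would call this after discovery
```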
### Cron
First-class agent tasks (not shell tasks). Jobs are stored in JSON, support multiple schedule formats, can attach skills and scripts, and deliver to any platform.
→ [Cron Internals](./cron-internals.md)
### ACP Integration
Exposes Hermes as an editor-native agent over stdio/JSON-RPC for VS Code, Zed, and JetBrains.
→ [ACP Internals](./acp-internals.md)
### RL / Environments / Trajectories
Full environment framework for evaluation and RL training. Integrates with Atropos, supports multiple tool-call parsers, and generates ShareGPT-format trajectories.
→ [Environments, Benchmarks & Data Generation](./environments.md), [Trajectories & Training Format](./trajectory-format.md)
## Design Principles
| Principle | What it means in practice |
|-----------|--------------------------|
| **Prompt stability** | System prompt doesn't change mid-conversation. No cache-breaking mutations except explicit user actions (`/model`). |
| **Observable execution** | Every tool call is visible to the user via callbacks. Progress updates in CLI (spinner) and gateway (chat messages). |
| **Interruptible** | API calls and tool execution can be cancelled mid-flight by user input or signals. |
| **Platform-agnostic core** | One AIAgent class serves CLI, gateway, ACP, batch, and API server. Platform differences live in the entry point, not the agent. |
| **Loose coupling** | Optional subsystems (MCP, plugins, memory providers, RL environments) use registry patterns and check_fn gating, not hard dependencies. |
| **Profile isolation** | Each profile (`hermes -p <name>`) gets its own HERMES_HOME, config, memory, sessions, and gateway PID. Multiple profiles run concurrently. |
## File Dependency Chain
```text
tools/registry.py (no deps — imported by all tool files)
tools/*.py (each calls registry.register() at import time)
model_tools.py (imports tools/registry + triggers tool discovery)
run_agent.py, cli.py, batch_runner.py, environments/
```
This chain means tool registration happens at import time, before any agent instance is created. Adding a new tool requires an import in `model_tools.py`'s `_discover_tools()` list.
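A minimal sketch of this import-time self-registration pattern (names and the example tool are illustrative, not the actual `registry.py` API):

```python
# registry.py equivalent: a central dict populated as tool modules import.
TOOLS = {}

def register(name, schema=None):
    """Decorator a tool file applies at module import time."""
    def wrap(fn):
        TOOLS[name] = {"handler": fn, "schema": schema or {}}
        return fn
    return wrap

# A tool module: merely importing it registers the tool.
@register("echo", schema={"text": "string"})
def echo(text):
    return text

# Dispatch, conceptually what handle_function_call() does.
def dispatch(name, **kwargs):
    return TOOLS[name]["handler"](**kwargs)
```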


@@ -4,7 +4,7 @@ Hermes Agent uses a dual compression system and Anthropic prompt caching to
manage context window usage efficiently across long conversations.
Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
`gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)
## Dual Compression System
@ -26,7 +26,7 @@ Hermes has two separate compression layers that operate independently:
### 1. Gateway Session Hygiene (85% threshold)
Located in `gateway/run.py` (search for `_maybe_compress_session`). This is a **safety net** that
runs before the agent processes a message. It prevents API failures when sessions
grow too large between turns (e.g., overnight accumulation in Telegram/Discord).


@ -6,85 +6,195 @@ description: "How Hermes stores, schedules, edits, pauses, skill-loads, and deli
# Cron Internals
The cron subsystem provides scheduled task execution — from simple one-shot delays to recurring cron-expression jobs with skill injection and cross-platform delivery.
## Key Files
| File | Purpose |
|------|---------|
| `cron/jobs.py` | Job model, storage, atomic read/write to `jobs.json` |
| `cron/scheduler.py` | Scheduler loop — due-job detection, execution, repeat tracking |
| `tools/cronjob_tools.py` | Model-facing `cronjob` tool registration and handler |
| `gateway/run.py` | Gateway integration — cron ticking in the long-running loop |
| `hermes_cli/cron.py` | CLI `hermes cron` subcommands |
## Scheduling Model
Four schedule formats are supported:
| Format | Example | Behavior |
|--------|---------|----------|
| **Relative delay** | `30m`, `2h`, `1d` | One-shot, fires after the specified duration |
| **Interval** | `every 2h`, `every 30m` | Recurring, fires at regular intervals |
| **Cron expression** | `0 9 * * *` | Standard 5-field cron syntax (minute, hour, day, month, weekday) |
| **ISO timestamp** | `2025-01-15T09:00:00` | One-shot, fires at the exact time |
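A rough classifier for these four formats might look like the following (illustrative only, not the parser Hermes actually ships):

```python
import re
from datetime import datetime

def classify_schedule(spec: str):
    """Classify a schedule string into one of the four formats above.
    (Hypothetical sketch; not the real Hermes parser.)"""
    if m := re.fullmatch(r"every\s+(\d+)([smhd])", spec):
        return ("interval", int(m.group(1)), m.group(2))
    if m := re.fullmatch(r"(\d+)([smhd])", spec):
        return ("delay", int(m.group(1)), m.group(2))
    if re.fullmatch(r"\S+\s+\S+\s+\S+\s+\S+\s+\S+", spec):
        return ("cron", spec)  # five whitespace-separated fields
    return ("timestamp", datetime.fromisoformat(spec))

print(classify_schedule("30m"))        # ('delay', 30, 'm')
print(classify_schedule("every 2h"))   # ('interval', 2, 'h')
print(classify_schedule("0 9 * * *"))  # ('cron', '0 9 * * *')
```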
The model-facing surface is a single `cronjob` tool with action-style operations: `create`, `list`, `update`, `pause`, `resume`, `run`, `remove`.
## Job Storage
Jobs are stored in `~/.hermes/cron/jobs.json` with atomic write semantics (write to temp file, then rename). Each job record contains:
```json
{
"id": "job_abc123",
"name": "Daily briefing",
"prompt": "Summarize today's AI news and funding rounds",
"schedule": "0 9 * * *",
"skills": ["ai-funding-daily-report"],
"deliver": "telegram:-1001234567890",
"repeat": null,
"state": "scheduled",
"next_run": "2025-01-16T09:00:00Z",
"run_count": 42,
"created_at": "2025-01-01T00:00:00Z",
"model": null,
"provider": null,
"script": null
}
```
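The write-to-temp-then-rename pattern can be sketched as follows (a pattern sketch, not Hermes' actual code):

```python
import json
import os
import tempfile

def atomic_write_jobs(path: str, jobs: list) -> None:
    """Write jobs to a temp file in the same directory, then rename it
    over the target, so readers never observe a half-written jobs.json.
    (Sketch of the pattern described above, not the real implementation.)"""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```

The same-directory temp file matters: `os.replace` is only atomic within a single filesystem.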
### Job Lifecycle States
| State | Meaning |
|-------|---------|
| `scheduled` | Active, will fire at next scheduled time |
| `paused` | Suspended — won't fire until resumed |
| `completed` | Repeat count exhausted or one-shot that has fired |
| `running` | Currently executing (transient state) |
### Backward Compatibility
Older jobs may have a single `skill` field instead of the `skills` array. The scheduler normalizes this at load time — single `skill` is promoted to `skills: [skill]`.
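That normalization is tiny; a sketch of the behavior described above (not the real loader code):

```python
def normalize_job(job: dict) -> dict:
    """Promote the legacy single `skill` field to the newer `skills` list.
    (Illustrative sketch mirroring the behavior described above.)"""
    if "skills" not in job:
        legacy = job.pop("skill", None)
        job["skills"] = [legacy] if legacy else []
    return job

print(normalize_job({"id": "job_1", "skill": "daily-report"}))
# {'id': 'job_1', 'skills': ['daily-report']}
```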
## Scheduler Runtime
### Tick Cycle
The scheduler runs on a periodic tick (default: every 60 seconds):
```text
tick()
1. Acquire scheduler lock (prevents overlapping ticks)
2. Load all jobs from jobs.json
3. Filter to due jobs (next_run <= now AND state == "scheduled")
4. For each due job:
a. Set state to "running"
b. Create fresh AIAgent session (no conversation history)
c. Load attached skills in order (injected as user messages)
d. Run the job prompt through the agent
e. Deliver the response to the configured target
f. Update run_count, compute next_run
g. If repeat count exhausted → state = "completed"
h. Otherwise → state = "scheduled"
5. Write updated jobs back to jobs.json
6. Release scheduler lock
```
### Gateway Integration
In gateway mode, the scheduler tick is integrated into the gateway's main event loop. The gateway calls `scheduler.tick()` on its periodic maintenance cycle, which runs alongside message handling.
In CLI mode, cron jobs only fire when `hermes cron` commands are run or during active CLI sessions.
### Fresh Session Isolation
Each cron job runs in a completely fresh agent session:
- No conversation history from previous runs
- No memory of previous cron executions (unless persisted to memory/files)
- The prompt must be self-contained — cron jobs cannot ask clarifying questions
- The `cronjob` toolset is disabled (recursion guard)
## Skill-Backed Jobs
A cron job can attach one or more skills via the `skills` field. At execution time:
1. Skills are loaded in the specified order
2. Each skill's SKILL.md content is injected as context
3. The job's prompt is appended as the task instruction
4. The agent processes the combined skill context + prompt
This enables reusable, tested workflows without pasting full instructions into cron prompts. For example:
```
Create a daily funding report → attach "ai-funding-daily-report" skill
```
### Script-Backed Jobs
Jobs can also attach a Python script via the `script` field. The script runs *before* each agent turn, and its stdout is injected into the prompt as context. This enables data collection and change detection patterns:
```python
# ~/.hermes/scripts/check_competitors.py
import requests, json
# Fetch competitor release notes, diff against last run
# Print summary to stdout — agent analyzes and reports
```
## Delivery Model
Cron job results can be delivered to any supported platform:
| Target | Syntax | Example |
|--------|--------|---------|
| Origin chat | `origin` | Deliver to the chat where the job was created |
| Local file | `local` | Save to `~/.hermes/cron/output/` |
| Telegram | `telegram` or `telegram:<chat_id>` | `telegram:-1001234567890` |
| Discord | `discord` or `discord:#channel` | `discord:#engineering` |
| Slack | `slack` | Deliver to Slack home channel |
| WhatsApp | `whatsapp` | Deliver to WhatsApp home |
| Signal | `signal` | Deliver to Signal |
| Matrix | `matrix` | Deliver to Matrix home room |
| Mattermost | `mattermost` | Deliver to Mattermost home |
| Email | `email` | Deliver via email |
| SMS | `sms` | Deliver via SMS |
| Home Assistant | `homeassistant` | Deliver to HA conversation |
| DingTalk | `dingtalk` | Deliver to DingTalk |
| Feishu | `feishu` | Deliver to Feishu |
| WeCom | `wecom` | Deliver to WeCom |
For Telegram topics, use the format `telegram:<chat_id>:<thread_id>` (e.g., `telegram:-1001234567890:17585`).
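Parsing a target string like these reduces to splitting on `:`. A sketch with a hypothetical helper (not the actual delivery code):

```python
def parse_target(target: str):
    """Split a delivery target like `telegram:-1001234567890:17585` into
    (platform, chat_id, thread_id). Hypothetical helper for illustration."""
    platform, _, rest = target.partition(":")
    if not rest:
        return (platform, None, None)  # bare platform, e.g. "slack"
    chat_id, _, thread_id = rest.partition(":")
    return (platform, chat_id, thread_id or None)

print(parse_target("telegram:-1001234567890:17585"))
# ('telegram', '-1001234567890', '17585')
print(parse_target("discord:#engineering"))
# ('discord', '#engineering', None)
print(parse_target("slack"))
# ('slack', None, None)
```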
### Response Wrapping
By default (`cron.wrap_response: true`), cron deliveries are wrapped with:
- A header identifying the cron job name and task
- A footer noting the agent cannot see the delivered message in conversation
The `[SILENT]` prefix in a cron response suppresses delivery entirely — useful for jobs that only need to write to files or perform side effects.
### Session Isolation
Cron deliveries are NOT mirrored into gateway session conversation history. They exist only in the cron job's own session. This prevents message alternation violations in the target chat's conversation.
## Recursion Guard
Cron-run sessions have the `cronjob` toolset disabled. This prevents:
- A scheduled job from creating new cron jobs
- Recursive scheduling that could explode token usage
- Accidental mutation of the job schedule from within a job
## Locking
The scheduler uses file-based locking to prevent overlapping ticks from executing the same due-job batch twice. This is important in gateway mode where multiple maintenance cycles could overlap if a previous tick takes longer than the tick interval.
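One portable way to implement such a lock is an exclusive lock file. This is a sketch of the pattern; the scheduler's actual locking mechanism may differ:

```python
import os
import tempfile

LOCK_PATH = os.path.join(tempfile.gettempdir(), "hermes-cron-tick.lock")

class TickLock:
    """Advisory lock file so overlapping ticks cannot run concurrently.
    (Pattern sketch only; not the real scheduler code.)"""

    def __init__(self, path: str = LOCK_PATH):
        self.path = path
        self.fd = None

    def __enter__(self):
        # O_EXCL makes creation fail with FileExistsError if the lock is held.
        self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        return self

    def __exit__(self, *exc):
        os.close(self.fd)
        os.unlink(self.path)

with TickLock():
    pass  # load jobs, execute due ones, write jobs.json back
```

One caveat of this approach: a crashed process leaves a stale lock file, so real implementations typically add a staleness check (e.g., writing the holder's PID into the file).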
## CLI Interface
The `hermes cron` CLI provides direct job management:
```bash
hermes cron list # Show all jobs
hermes cron add # Interactive job creation
hermes cron edit <job_id> # Edit job configuration
hermes cron pause <job_id> # Pause a running job
hermes cron resume <job_id> # Resume a paused job
hermes cron run <job_id> # Trigger immediate execution
hermes cron remove <job_id> # Delete a job
```
## Related Docs
- [Cron Feature Guide](/docs/user-guide/features/cron)
- [Gateway Internals](./gateway-internals.md)
- [Agent Loop Internals](./agent-loop.md)


@ -6,106 +6,248 @@ description: "How the messaging gateway boots, authorizes users, routes sessions
# Gateway Internals
The messaging gateway is the long-running process that connects Hermes to 14+ external messaging platforms through a unified architecture.
## Key Files
| File | Purpose |
|------|---------|
| `gateway/run.py` | `GatewayRunner` — main loop, slash commands, message dispatch (~7,200 lines) |
| `gateway/session.py` | `SessionStore` — conversation persistence and session key construction |
| `gateway/delivery.py` | Outbound message delivery to target platforms/channels |
| `gateway/pairing.py` | DM pairing flow for user authorization |
| `gateway/channel_directory.py` | Maps chat IDs to human-readable names for cron delivery |
| `gateway/hooks.py` | Hook discovery, loading, and lifecycle event dispatch |
| `gateway/mirror.py` | Cross-session message mirroring for `send_message` |
| `gateway/status.py` | Token lock management for profile-scoped gateway instances |
| `gateway/builtin_hooks/` | Always-registered hooks (e.g., BOOT.md system prompt hook) |
| `gateway/platforms/` | Platform adapters (one per messaging platform) |
## Architecture Overview
```text
┌─────────────────────────────────────────────────┐
│ GatewayRunner │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Telegram │ │ Discord │ │ Slack │ ... │
│ │ Adapter │ │ Adapter │ │ Adapter │ │
│ └─────┬─────┘ └─────┬────┘ └─────┬────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ ▼ │
│ _handle_message() │
│ │ │
│ ┌────────────┼────────────┐ │
│ ▼ ▼ ▼ │
│ Slash command AIAgent Queue/BG │
│ dispatch creation sessions │
│ │ │
│ ▼ │
│ SessionStore │
│ (SQLite persistence) │
└─────────────────────────────────────────────────┘
```
## Message Flow
When a message arrives from any platform:
1. **Platform adapter** receives raw event, normalizes it into a `MessageEvent`
2. **Base adapter** checks active session guard:
- If agent is running for this session → queue message, set interrupt event
- If `/approve`, `/deny`, `/stop` → bypass guard (dispatched inline)
3. **GatewayRunner._handle_message()** receives the event:
- Resolve session key via `_session_key_for_source()` (format: `agent:main:{platform}:{chat_type}:{chat_id}`)
- Check authorization (see Authorization below)
- Check if it's a slash command → dispatch to command handler
- Check if agent is already running → intercept commands like `/stop`, `/status`
- Otherwise → create `AIAgent` instance and run conversation
4. **Response** is sent back through the platform adapter
### Session Key Format
Session keys encode the full routing context:
```
agent:main:{platform}:{chat_type}:{chat_id}
```
For example: `agent:main:telegram:private:123456789`
Thread-aware platforms (Telegram forum topics, Discord threads, Slack threads) may include thread IDs in the chat_id portion. **Never construct session keys manually** — always use `build_session_key()` from `gateway/session.py`.
### Two-Level Message Guard
When an agent is actively running, incoming messages pass through two sequential guards:
1. **Level 1 — Base adapter** (`gateway/platforms/base.py`): Checks `_active_sessions`. If the session is active, queues the message in `_pending_messages` and sets an interrupt event. This catches messages *before* they reach the gateway runner.
2. **Level 2 — Gateway runner** (`gateway/run.py`): Checks `_running_agents`. Intercepts specific commands (`/stop`, `/new`, `/queue`, `/status`, `/approve`, `/deny`) and routes them appropriately. Everything else triggers `running_agent.interrupt()`.
Commands that must reach the runner while the agent is blocked (like `/approve`) are dispatched **inline** via `await self._message_handler(event)` — they bypass the background task system to avoid race conditions.
## Authorization
The gateway uses a multi-layer authorization check, evaluated in order:
1. **Gateway-wide allow-all** (`GATEWAY_ALLOW_ALL_USERS`) — if set, all users are authorized
2. **Platform allowlist** (e.g., `TELEGRAM_ALLOWED_USERS`) — comma-separated user IDs
3. **DM pairing** — authenticated users can pair new users via a pairing code
4. **Admin escalation** — some commands require admin status beyond basic authorization
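As a sketch, the ordered evaluation looks like this (hypothetical function; the real checks live in `gateway/run.py` and read platform config):

```python
def is_authorized(user_id: str, *, allow_all: bool,
                  allowlist: set[str], paired: set[str]) -> bool:
    """Ordered authorization checks, as a sketch of the layers above.
    (Illustrative; not the gateway's actual code.)"""
    if allow_all:              # 1) gateway-wide allow-all
        return True
    if user_id in allowlist:   # 2) platform allowlist
        return True
    if user_id in paired:      # 3) DM pairing
        return True
    return False               # admin escalation is checked per-command

print(is_authorized("42", allow_all=False, allowlist={"42"}, paired=set()))
# True
```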
### DM Pairing Flow
```text
Admin: /pair
Gateway: "Pairing code: ABC123. Share with the user."
New user: ABC123
Gateway: "Paired! You're now authorized."
```
Pairing state is persisted in `gateway/pairing.py` and survives restarts.
## Slash Command Dispatch
All slash commands in the gateway flow through the same resolution pipeline:
1. `resolve_command()` from `hermes_cli/commands.py` maps input to canonical name (handles aliases, prefix matching)
2. The canonical name is checked against `GATEWAY_KNOWN_COMMANDS`
3. Handler in `_handle_message()` dispatches based on canonical name
4. Some commands are gated on config (`gateway_config_gate` on `CommandDef`)
### Running-Agent Guard
Commands that must NOT execute while the agent is processing are rejected early:
```python
if _quick_key in self._running_agents:
    if canonical == "model":
        return "⏳ Agent is running — wait for it to finish or /stop first."
```
Bypass commands (`/stop`, `/new`, `/approve`, `/deny`, `/queue`, `/status`) have special handling.
## Config Sources
The gateway reads configuration from multiple sources:
| Source | What it provides |
|--------|-----------------|
| `~/.hermes/.env` | API keys, bot tokens, platform credentials |
| `~/.hermes/config.yaml` | Model settings, tool configuration, display options |
| Environment variables | Override any of the above |
Unlike the CLI (which uses `load_cli_config()` with hardcoded defaults), the gateway reads `config.yaml` directly via YAML loader. This means config keys that exist in the CLI's defaults dict but not in the user's config file may behave differently between CLI and gateway.
## Platform Adapters
Each messaging platform has an adapter in `gateway/platforms/`:
```text
gateway/platforms/
├── base.py # BaseAdapter — shared logic for all platforms
├── telegram.py # Telegram Bot API (long polling or webhook)
├── discord.py # Discord bot via discord.py
├── slack.py # Slack Socket Mode
├── whatsapp.py # WhatsApp Business Cloud API
├── signal.py # Signal via signal-cli REST API
├── matrix.py # Matrix via matrix-nio (optional E2EE)
├── mattermost.py # Mattermost WebSocket API
├── email_adapter.py # Email via IMAP/SMTP
├── sms.py # SMS via Twilio
├── dingtalk.py # DingTalk WebSocket
├── feishu.py # Feishu/Lark WebSocket or webhook
├── wecom.py # WeCom (WeChat Work) callback
└── homeassistant.py # Home Assistant conversation integration
```
Adapters implement a common interface:
- `connect()` / `disconnect()` — lifecycle management
- `send_message()` — outbound message delivery
- `on_message()` — inbound message normalization → `MessageEvent`
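In sketch form, with hypothetical signatures (the real interface in `gateway/platforms/base.py` is richer):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class MessageEvent:
    """Normalized inbound message (fields assumed for illustration)."""
    platform: str
    chat_id: str
    text: str

class BaseAdapter(ABC):
    """Shape of the adapter interface listed above (hypothetical)."""

    @abstractmethod
    async def connect(self) -> None: ...

    @abstractmethod
    async def disconnect(self) -> None: ...

    @abstractmethod
    async def send_message(self, chat_id: str, text: str) -> None: ...

    def normalize(self, platform: str, raw: dict) -> MessageEvent:
        # Subclasses map platform-specific payloads into the shared event type.
        return MessageEvent(platform, str(raw["chat_id"]), raw["text"])
```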
### Token Locks
Adapters that connect with unique credentials call `acquire_scoped_lock()` in `connect()` and `release_scoped_lock()` in `disconnect()`. This prevents two profiles from using the same bot token simultaneously.
## Delivery Path
Outgoing deliveries (`gateway/delivery.py`) handle:
- **Direct reply** — send response back to the originating chat
- **Home channel delivery** — route cron job outputs and background results to a configured home channel
- **Explicit target delivery** — `send_message` tool specifying `telegram:-1001234567890`
- **Cross-platform delivery** — deliver to a different platform than the originating message
Cron job deliveries are NOT mirrored into gateway session history — they live in their own cron session only. This is a deliberate design choice to avoid message alternation violations.
## Hooks
Gateway hooks are Python modules that respond to lifecycle events:
### Gateway Hook Events
| Event | When fired |
|-------|-----------|
| `gateway:startup` | Gateway process starts |
| `session:start` | New conversation session begins |
| `session:end` | Session completes or times out |
| `session:reset` | User resets session with `/new` |
| `agent:start` | Agent begins processing a message |
| `agent:step` | Agent completes one tool-calling iteration |
| `agent:end` | Agent finishes and returns response |
| `command:*` | Any slash command is executed |
Hooks are discovered from `gateway/builtin_hooks/` (always active) and `~/.hermes/hooks/` (user-installed). Each hook is a directory with a `HOOK.yaml` manifest and `handler.py`.
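A minimal `handler.py` might look like the following. The structure is inferred from the event table above; the actual hook entry-point API may differ:

```python
# Hypothetical ~/.hermes/hooks/audit/handler.py
# Structure inferred from the docs; the real hook API may differ.
from datetime import datetime, timezone

def on_event(event: str, payload: dict) -> None:
    """Called by the gateway for each lifecycle event."""
    if event.startswith("session:"):
        stamp = datetime.now(timezone.utc).isoformat()
        # A real hook might append this to an audit log file instead.
        print(f"{stamp} {event} session={payload.get('session_id')}")

on_event("session:start", {"session_id": "agent:main:telegram:private:1"})
```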
## Memory Provider Integration
When a memory provider plugin (e.g., Honcho) is enabled:
1. Gateway creates an `AIAgent` per message with the session ID
2. The `MemoryManager` initializes the provider with the session context
3. Provider tools (e.g., `honcho_profile`, `viking_search`) are routed through:
```text
AIAgent._invoke_tool()
→ self._memory_manager.handle_tool_call(name, args)
→ provider.handle_tool_call(name, args)
```
4. On session end/reset, `on_session_end()` fires for cleanup and final data flush
### Memory Flush Lifecycle
When a session is reset, resumed, or expires:
1. Built-in memories are flushed to disk
2. Memory provider's `on_session_end()` hook fires
3. A temporary `AIAgent` runs a memory-only conversation turn
4. Context is then discarded or archived
## Background Maintenance
The gateway runs periodic maintenance alongside message handling:
- **Cron ticking** — checks job schedules and fires due jobs
- **Session expiry** — cleans up abandoned sessions after timeout
- **Memory flush** — proactively flushes memory before session expiry
- **Cache refresh** — refreshes model lists and provider status
## Process Management
The gateway runs as a long-lived process, managed via:
- `hermes gateway start` / `hermes gateway stop` — manual control
- `systemctl` (Linux) or `launchctl` (macOS) — service management
- PID file at `~/.hermes/gateway.pid` — profile-scoped process tracking
**Profile-scoped vs global**: `start_gateway()` uses profile-scoped PID files. `hermes gateway stop` stops only the current profile's gateway. `hermes gateway stop --all` uses global `ps aux` scanning to kill all gateway processes (used during updates).
## Related Docs
- [Session Storage](./session-storage.md)
- [Cron Internals](./cron-internals.md)
- [ACP Internals](./acp-internals.md)
- [Agent Loop Internals](./agent-loop.md)
- [Messaging Gateway (User Guide)](/docs/user-guide/messaging)


@ -3,7 +3,7 @@
Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format
for use as training data, debugging artifacts, and reinforcement learning datasets.
Source files: `agent/trajectory.py`, `run_agent.py` (search for `_save_trajectory`), `batch_runner.py`
## File Naming Convention