mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393)
Major changes across 20 documentation pages:

Staleness fixes:
- Fix FAQ: wrong import path (hermes.agent → run_agent)
- Fix FAQ: stale Gemini 2.0 model → Gemini 3 Flash
- Fix integrations/index: missing MiniMax TTS provider
- Fix integrations/index: web_crawl is not a registered tool
- Fix sessions: add all 19 session sources (was only 5)
- Fix cron: add all 18 delivery targets (was only telegram/discord)
- Fix webhooks: add all delivery targets
- Fix overview: add missing MCP, memory providers, credential pools
- Fix all line-number references → use function name searches instead
- Update file size estimates (run_agent ~9200, gateway ~7200, cli ~8500)

Expanded thin pages (< 150 lines → substantial depth):
- honcho.md: 43 → 108 lines — added feature comparison, tools, config, CLI
- overview.md: 49 → 55 lines — added MCP, memory providers, credential pools
- toolsets-reference.md: 57 → 175 lines — added explanations, config examples, custom toolsets, wildcards, platform differences table
- optional-skills-catalog.md: 74 → 153 lines — added 25+ missing skills across communication, devops, mlops (18!), productivity, research categories
- integrations/index.md: 82 → 115 lines — added messaging, HA, plugins sections
- cron-internals.md: 90 → 195 lines — added job JSON example, lifecycle states, tick cycle, delivery targets, script-backed jobs, CLI interface
- gateway-internals.md: 111 → 250 lines — added architecture diagram, message flow, two-level guard, platform adapters, token locks, process management
- agent-loop.md: 112 → 235 lines — added entry points, API mode resolution, turn lifecycle detail, message alternation rules, tool execution flow, callback table, budget tracking, compression details
- architecture.md: 152 → 295 lines — added system overview diagram, data flow diagrams, design principles table, dependency chain

Other depth additions:
- context-references.md: added platform availability, compression interaction, common patterns sections
- slash-commands.md: added quick commands config example, alias resolution
- image-generation.md: added platform delivery table
- tools-reference.md: added tool counts, MCP tools note
- index.md: updated platform count (5 → 14+), tool count (40+ → 47)
parent fec58ad99e
commit 43d468cea8
20 changed files with 1243 additions and 406 deletions
@@ -6,107 +6,231 @@ description: "Detailed walkthrough of AIAgent execution, API modes, tools, callb
# Agent Loop Internals

The core orchestration engine is `run_agent.py`'s `AIAgent` class — roughly 9,200 lines that handle everything from prompt assembly to tool dispatch to provider failover.
## Core Responsibilities

`AIAgent` is responsible for:

- Assembling the effective system prompt and tool schemas via `prompt_builder.py`
- Selecting the correct provider/API mode (chat_completions, codex_responses, anthropic_messages)
- Making interruptible model calls with cancellation support
- Executing tool calls (sequentially or concurrently via thread pool)
- Maintaining conversation history in OpenAI message format
- Handling compression, retries, and fallback model switching
- Tracking iteration budgets across parent and child agents
- Flushing persistent memory before context is lost
## Two Entry Points

```python
# Simple interface — returns final response string
response = agent.chat("Fix the bug in main.py")

# Full interface — returns dict with messages, metadata, usage stats
result = agent.run_conversation(
    user_message="Fix the bug in main.py",
    system_message=None,          # auto-built if omitted
    conversation_history=None,    # auto-loaded from session if omitted
    task_id="task_abc123"
)
```

`chat()` is a thin wrapper around `run_conversation()` that extracts the `final_response` field from the result dict.
## API Modes

Hermes supports three API execution modes, resolved from provider selection, explicit args, and base URL heuristics:

| API mode | Used for | Client type |
|----------|----------|-------------|
| `chat_completions` | OpenAI-compatible endpoints (OpenRouter, custom, most providers) | `openai.OpenAI` |
| `codex_responses` | OpenAI Codex / Responses API | `openai.OpenAI` with Responses format |
| `anthropic_messages` | Native Anthropic Messages API | `anthropic.Anthropic` via adapter |

The mode determines how messages are formatted, how tool calls are structured, how responses are parsed, and how caching/streaming works. All three converge on the same internal message format (OpenAI-style `role`/`content`/`tool_calls` dicts) before and after API calls.

**Mode resolution order:**

1. Explicit `api_mode` constructor arg (highest priority)
2. Provider-specific detection (e.g., `anthropic` provider → `anthropic_messages`)
3. Base URL heuristics (e.g., `api.anthropic.com` → `anthropic_messages`)
4. Default: `chat_completions`
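The resolution chain can be sketched as a simple fall-through; the provider map and function name below are illustrative, not the actual implementation in `run_agent.py`:

```python
# Hypothetical sketch of the four-step resolution order; the real logic
# lives in run_agent.py / runtime_provider.py and differs in detail.
PROVIDER_MODES = {"anthropic": "anthropic_messages"}

def resolve_api_mode(api_mode=None, provider=None, base_url=None):
    if api_mode:                                      # 1. explicit arg wins
        return api_mode
    if provider in PROVIDER_MODES:                    # 2. provider detection
        return PROVIDER_MODES[provider]
    if base_url and "api.anthropic.com" in base_url:  # 3. base URL heuristic
        return "anthropic_messages"
    return "chat_completions"                         # 4. default
```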
## Turn Lifecycle

Each iteration of the agent loop follows this sequence:

```text
run_conversation()
  1. Generate task_id if not provided
  2. Append user message to conversation history
  3. Build or reuse cached system prompt (prompt_builder.py)
  4. Check if preflight compression is needed (>50% context)
  5. Build API messages from conversation history
     - chat_completions: OpenAI format as-is
     - codex_responses: convert to Responses API input items
     - anthropic_messages: convert via anthropic_adapter.py
  6. Inject ephemeral prompt layers (budget warnings, context pressure)
  7. Apply prompt caching markers if on Anthropic
  8. Make interruptible API call (_api_call_with_interrupt)
  9. Parse response:
     - If tool_calls: execute them, append results, loop back to step 5
     - If text response: persist session, flush memory if needed, return
```
### Message Format

All messages use OpenAI-compatible format internally:

```python
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [...]}
{"role": "tool", "tool_call_id": "...", "content": "..."}
```

Reasoning content (from models that support extended thinking) is stored in `assistant_msg["reasoning"]` and optionally displayed via the `reasoning_callback`.
### Message Alternation Rules

The agent loop enforces strict message role alternation:

- After the system message: `User → Assistant → User → Assistant → ...`
- During tool calling: `Assistant (with tool_calls) → Tool → Tool → ... → Assistant`
- **Never** two assistant messages in a row
- **Never** two user messages in a row
- **Only** `tool` role can have consecutive entries (parallel tool results)

Providers validate these sequences and will reject malformed histories.
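A minimal checker for these constraints might look like the following (a hypothetical helper, not part of the codebase; it validates only the consecutive-role rules):

```python
# Hypothetical validator for the alternation rules above.
def check_alternation(messages):
    prev = None
    for msg in messages:
        role = msg["role"]
        if role == prev and role in ("user", "assistant"):
            return False  # consecutive user or assistant messages
        if role == "tool" and prev not in ("assistant", "tool"):
            return False  # tool results must follow an assistant tool call
        prev = role
    return True
```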
## Interruptible API Calls

API requests are wrapped in `_api_call_with_interrupt()`, which runs the actual HTTP call in a background thread while monitoring an interrupt event:

```text
┌──────────────────────┐     ┌──────────────┐
│ Main thread          │     │ API thread   │
│  wait on:            │────▶│  HTTP POST   │
│  - response ready    │     │  to provider │
│  - interrupt event   │     └──────────────┘
│  - timeout           │
└──────────────────────┘
```

When interrupted (user sends new message, `/stop` command, or signal):

- The API thread is abandoned (response discarded)
- The agent can process the new input or shut down cleanly
- No partial response is injected into conversation history
## Tool Execution

### Sequential vs Concurrent

When the model returns tool calls:

- **Single tool call** → executed directly in the main thread
- **Multiple tool calls** → executed concurrently via `ThreadPoolExecutor`
- Exception: tools marked as interactive (e.g., `clarify`) force sequential execution
- Results are reinserted in the original tool call order regardless of completion order
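The order-preserving behavior falls out of submitting futures in call order and collecting results in that same order, as in this sketch (the handler-registry shape is hypothetical):

```python
# Hypothetical sketch of order-preserving concurrent tool execution.
from concurrent.futures import ThreadPoolExecutor

def run_tool_calls(tool_calls, handlers):
    # tool_calls: list of (tool_name, kwargs) in model-emitted order
    if len(tool_calls) == 1:                 # single call: stay on main thread
        name, args = tool_calls[0]
        return [handlers[name](**args)]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(handlers[name], **args) for name, args in tool_calls]
        return [f.result() for f in futures]  # call order, not completion order
```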
### Execution Flow

```text
for each tool_call in response.tool_calls:
  1. Resolve handler from tools/registry.py
  2. Fire pre_tool_call plugin hook
  3. Check if dangerous command (tools/approval.py)
     - If dangerous: invoke approval_callback, wait for user
  4. Execute handler with args + task_id
  5. Fire post_tool_call plugin hook
  6. Append {"role": "tool", "content": result} to history
```
### Agent-Level Tools

Some tools are intercepted by `run_agent.py` *before* reaching `handle_function_call()`:

| Tool | Why intercepted |
|------|-----------------|
| `todo` | Reads/writes agent-local task state |
| `memory` | Writes to persistent memory files with character limits |

These tools modify agent state directly and return synthetic tool results without going through the registry.
## Callback Surfaces

`AIAgent` supports platform-specific callbacks that enable real-time progress in the CLI, gateway, and ACP integrations:

| Callback | When fired | Used by |
|----------|-----------|---------|
| `tool_progress_callback` | Before/after each tool execution | CLI spinner, gateway progress messages |
| `thinking_callback` | When model starts/stops thinking | CLI "thinking..." indicator |
| `reasoning_callback` | When model returns reasoning content | CLI reasoning display, gateway reasoning blocks |
| `clarify_callback` | When `clarify` tool is called | CLI input prompt, gateway interactive message |
| `step_callback` | After each complete agent turn | Gateway step tracking, ACP progress |
| `stream_delta_callback` | Each streaming token (when enabled) | CLI streaming display |
| `tool_gen_callback` | When tool call is parsed from stream | CLI tool preview in spinner |
| `status_callback` | State changes (thinking, executing, etc.) | ACP status updates |
## Budget and Fallback Behavior

### Iteration Budget

The agent tracks iterations via `IterationBudget`:

- Default: 90 iterations (configurable via `agent.max_turns`)
- Shared across parent and child agents — a subagent consumes from the parent's budget
- At 70%+ usage, `_get_budget_warning()` appends a `[BUDGET WARNING: ...]` to the last tool result
- At 100%, the agent stops and returns a summary of work done
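A minimal sketch of the shared-budget idea (the real `IterationBudget` differs in detail; thresholds and message text here follow the description above):

```python
# Hypothetical sketch: one budget object shared by parent and subagents.
class IterationBudget:
    def __init__(self, max_turns=90):
        self.max_turns = max_turns
        self.used = 0

    def consume(self):
        # Called by parent and subagents alike, so children draw down
        # the same shared pool.
        self.used += 1

    def warning(self):
        if self.used >= self.max_turns:
            return "[BUDGET EXHAUSTED]"
        if self.used / self.max_turns >= 0.7:
            return f"[BUDGET WARNING: {self.max_turns - self.used} iterations left]"
        return None
```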
### Fallback Model

When the primary model fails (429 rate limit, 5xx server error, 401/403 auth error):

1. Check `fallback_providers` list in config
2. Try each fallback in order
3. On success, continue the conversation with the new provider
4. On 401/403, attempt credential refresh before failing over

The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain, configurable via the `auxiliary.*` config section.
## Compression and Persistence

### When Compression Triggers

- **Preflight** (before API call): If conversation exceeds 50% of the model's context window
- **Gateway auto-compression**: If conversation exceeds 85% (more aggressive, runs between turns)

### What Happens During Compression

1. Memory is flushed to disk first (preventing data loss)
2. Middle conversation turns are summarized into a compact summary
3. The last N messages are preserved intact (`compression.protect_last_n`, default: 20)
4. Tool call/result message pairs are kept together (never split)
5. A new session lineage ID is generated (compression creates a "child" session)
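The protect-last-N split can be sketched as follows; this is a simplification of `context_compressor.py`, and the summary-message shape is an assumption:

```python
# Hypothetical sketch: split history for compression, walking the cut
# point back so a tool result is never separated from its tool call.
def split_for_compression(messages, protect_last_n=20):
    cut = max(0, len(messages) - protect_last_n)
    while cut > 0 and messages[cut]["role"] == "tool":
        cut -= 1  # keep the assistant tool-call message with its results
    head, tail = messages[:cut], messages[cut:]
    if not head:
        return tail
    summary = {"role": "user",
               "content": f"[summary of {len(head)} earlier messages]"}
    return [summary] + tail
```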
### Session Persistence

After each turn:

- Messages are saved to the session store (SQLite via `hermes_state.py`)
- Memory changes are flushed to `MEMORY.md` / `USER.md`
- The session can be resumed later via `/resume` or `hermes chat --resume`
## Key Source Files

| File | Purpose |
|------|---------|
| `run_agent.py` | AIAgent class — the complete agent loop (~9,200 lines) |
| `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality |
| `agent/context_compressor.py` | Conversation compression algorithm |
| `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics |
| `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) |
| `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch |

## Related Docs

- [Provider Runtime Resolution](./provider-runtime.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Tools Runtime](./tools-runtime.md)
- [Architecture Overview](./architecture.md)
@@ -1,152 +1,274 @@
---
sidebar_position: 1
title: "Architecture"
description: "Hermes Agent internals — major subsystems, execution paths, data flow, and where to read next"
---

# Architecture

This page is the top-level map of Hermes Agent internals. Use it to orient yourself in the codebase, then dive into subsystem-specific docs for implementation details.
## System Overview

```text
┌─────────────────────────────────────────────────────────────────────┐
│                            Entry Points                             │
│                                                                     │
│   CLI (cli.py)    Gateway (gateway/run.py)    ACP (acp_adapter/)    │
│   Batch Runner    API Server                  Python Library        │
└──────────┬──────────────┬───────────────────────┬───────────────────┘
           │              │                       │
           ▼              ▼                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       AIAgent (run_agent.py)                        │
│                                                                     │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐             │
│  │ Prompt       │   │ Provider     │   │ Tool         │             │
│  │ Builder      │   │ Resolution   │   │ Dispatch     │             │
│  │ (prompt_     │   │ (runtime_    │   │ (model_      │             │
│  │  builder.py) │   │  provider.py)│   │  tools.py)   │             │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘             │
│         │                  │                  │                     │
│  ┌──────┴───────┐   ┌──────┴───────┐   ┌──────┴───────┐             │
│  │ Compression  │   │ 3 API Modes  │   │ Tool Registry│             │
│  │ & Caching    │   │ chat_compl.  │   │ (registry.py)│             │
│  │              │   │ codex_resp.  │   │ 47 tools     │             │
│  │              │   │ anthropic    │   │ 37 toolsets  │             │
│  └──────────────┘   └──────────────┘   └──────────────┘             │
└─────────────────────────────────────────────────────────────────────┘
          │                                    │
          ▼                                    ▼
┌───────────────────┐              ┌───────────────────────┐
│ Session Storage   │              │ Tool Backends         │
│ (SQLite + FTS5)   │              │ Terminal (6 backends) │
│ hermes_state.py   │              │ Browser (5 backends)  │
│ gateway/session.py│              │ Web (4 backends)      │
└───────────────────┘              │ MCP (dynamic)         │
                                   │ File, Vision, etc.    │
                                   └───────────────────────┘
```
## Directory Structure

```text
hermes-agent/
├── run_agent.py           # AIAgent — core conversation loop (~9,200 lines)
├── cli.py                 # HermesCLI — interactive terminal UI (~8,500 lines)
├── model_tools.py         # Tool discovery, schema collection, dispatch
├── toolsets.py            # Tool groupings and platform presets
├── hermes_state.py        # SQLite session/state database with FTS5
├── hermes_constants.py    # HERMES_HOME, profile-aware paths
├── batch_runner.py        # Batch trajectory generation
│
├── agent/                 # Agent internals
│   ├── prompt_builder.py      # System prompt assembly
│   ├── context_compressor.py  # Conversation compression algorithm
│   ├── prompt_caching.py      # Anthropic prompt caching
│   ├── auxiliary_client.py    # Auxiliary LLM for side tasks (vision, summarization)
│   ├── model_metadata.py      # Model context lengths, token estimation
│   ├── models_dev.py          # models.dev registry integration
│   ├── anthropic_adapter.py   # Anthropic Messages API format conversion
│   ├── display.py             # KawaiiSpinner, tool preview formatting
│   ├── skill_commands.py      # Skill slash commands
│   ├── memory_store.py        # Persistent memory read/write
│   └── trajectory.py          # Trajectory saving helpers
│
├── hermes_cli/            # CLI subcommands and setup
│   ├── main.py                # Entry point — all `hermes` subcommands (~4,200 lines)
│   ├── config.py              # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│   ├── commands.py            # COMMAND_REGISTRY — central slash command definitions
│   ├── auth.py                # PROVIDER_REGISTRY, credential resolution
│   ├── runtime_provider.py    # Provider → api_mode + credentials
│   ├── models.py              # Model catalog, provider model lists
│   ├── model_switch.py        # /model command logic (CLI + gateway shared)
│   ├── setup.py               # Interactive setup wizard (~3,500 lines)
│   ├── skin_engine.py         # CLI theming engine
│   ├── skills_config.py       # hermes skills — enable/disable per platform
│   ├── skills_hub.py          # /skills slash command
│   ├── tools_config.py        # hermes tools — enable/disable per platform
│   ├── plugins.py             # PluginManager — discovery, loading, hooks
│   ├── callbacks.py           # Terminal callbacks (clarify, sudo, approval)
│   └── gateway.py             # hermes gateway start/stop
│
├── tools/                 # Tool implementations (one file per tool)
│   ├── registry.py            # Central tool registry
│   ├── approval.py            # Dangerous command detection
│   ├── terminal_tool.py       # Terminal orchestration
│   ├── process_registry.py    # Background process management
│   ├── file_tools.py          # read_file, write_file, patch, search_files
│   ├── web_tools.py           # web_search, web_extract
│   ├── browser_tool.py        # 11 browser automation tools
│   ├── code_execution_tool.py # execute_code sandbox
│   ├── delegate_tool.py       # Subagent delegation
│   ├── mcp_tool.py            # MCP client (~1,050 lines)
│   ├── credential_files.py    # File-based credential passthrough
│   ├── env_passthrough.py     # Env var passthrough for sandboxes
│   ├── ansi_strip.py          # ANSI escape stripping
│   └── environments/          # Terminal backends (local, docker, ssh, modal, daytona, singularity)
│
├── gateway/               # Messaging platform gateway
│   ├── run.py                 # GatewayRunner — message dispatch (~5,800 lines)
│   ├── session.py             # SessionStore — conversation persistence
│   ├── delivery.py            # Outbound message delivery
│   ├── pairing.py             # DM pairing authorization
│   ├── hooks.py               # Hook discovery and lifecycle events
│   ├── mirror.py              # Cross-session message mirroring
│   ├── status.py              # Token locks, profile-scoped process tracking
│   ├── builtin_hooks/         # Always-registered hooks
│   └── platforms/             # 14 adapters: telegram, discord, slack, whatsapp,
│                              #   signal, matrix, mattermost, email, sms,
│                              #   dingtalk, feishu, wecom, homeassistant, webhook
│
├── acp_adapter/           # ACP server (VS Code / Zed / JetBrains)
├── cron/                  # Scheduler (jobs.py, scheduler.py)
├── plugins/memory/        # Memory provider plugins
├── environments/          # RL training environments (Atropos)
├── skills/                # Bundled skills (always available)
├── optional-skills/       # Official optional skills (install explicitly)
├── website/               # Docusaurus documentation site
└── tests/                 # Pytest suite (~3,000+ tests)
```
## Data Flow

### CLI Session

```text
User input → HermesCLI.process_input()
  → AIAgent.run_conversation()
      → prompt_builder.build_system_prompt()
      → runtime_provider.resolve_runtime_provider()
      → API call (chat_completions / codex_responses / anthropic_messages)
      → tool_calls? → model_tools.handle_function_call() → loop
  → final response → display → save to SessionDB
```
### Gateway Message

```text
Platform event → Adapter.on_message() → MessageEvent
  → GatewayRunner._handle_message()
      → authorize user
      → resolve session key
      → create AIAgent with session history
      → AIAgent.run_conversation()
  → deliver response back through adapter
```

### Cron Job

```text
Scheduler tick → load due jobs from jobs.json
  → create fresh AIAgent (no history)
  → inject attached skills as context
  → run job prompt
  → deliver response to target platform
  → update job state and next_run
```
## Recommended Reading Order

If you are new to the codebase:

1. **This page** — orient yourself
2. **[Agent Loop Internals](./agent-loop.md)** — how AIAgent works
3. **[Prompt Assembly](./prompt-assembly.md)** — system prompt construction
4. **[Provider Runtime Resolution](./provider-runtime.md)** — how providers are selected
5. **[Adding Providers](./adding-providers.md)** — practical guide to adding a new provider
6. **[Tools Runtime](./tools-runtime.md)** — tool registry, dispatch, environments
7. **[Session Storage](./session-storage.md)** — SQLite schema, FTS5, session lineage
8. **[Gateway Internals](./gateway-internals.md)** — messaging platform gateway
9. **[Context Compression & Prompt Caching](./context-compression-and-caching.md)** — compression and caching
10. **[ACP Internals](./acp-internals.md)** — IDE integration
11. **[Environments, Benchmarks & Data Generation](./environments.md)** — RL training
## Major Subsystems

### Agent Loop

The synchronous orchestration engine (`AIAgent` in `run_agent.py`). Handles provider selection, prompt construction, tool execution, retries, fallback, callbacks, compression, and persistence. Supports three API modes for different provider backends.

→ [Agent Loop Internals](./agent-loop.md)
### Prompt System

Prompt construction and maintenance across the conversation lifecycle:

- **`prompt_builder.py`** — Assembles the system prompt from: personality (SOUL.md), memory (MEMORY.md, USER.md), skills, context files (AGENTS.md, .hermes.md), tool-use guidance, and model-specific instructions
- **`prompt_caching.py`** — Applies Anthropic cache breakpoints for prefix caching
- **`context_compressor.py`** — Summarizes middle conversation turns when context exceeds thresholds

→ [Prompt Assembly](./prompt-assembly.md), [Context Compression & Prompt Caching](./context-compression-and-caching.md)
### Provider Resolution

A shared runtime resolver used by CLI, gateway, cron, ACP, and auxiliary calls. Maps `(provider, model)` tuples to `(api_mode, api_key, base_url)`. Handles 18+ providers, OAuth flows, credential pools, and alias resolution.

→ [Provider Runtime Resolution](./provider-runtime.md)
### Tool System

Central tool registry (`tools/registry.py`) with 47 registered tools across 20 toolsets. Each tool file self-registers at import time. The registry handles schema collection, dispatch, availability checking, and error wrapping. Terminal tools support 6 backends (local, Docker, SSH, Daytona, Modal, Singularity).

→ [Tools Runtime](./tools-runtime.md)
### Session Persistence

SQLite-based session storage with FTS5 full-text search. Sessions have lineage tracking (parent/child across compressions), per-platform isolation, and atomic writes with contention handling.

→ [Session Storage](./session-storage.md)
### Messaging Gateway

Long-running process with 14 platform adapters, unified session routing, user authorization (allowlists + DM pairing), slash command dispatch, hook system, cron ticking, and background maintenance.

→ [Gateway Internals](./gateway-internals.md)
### Plugin System
|
||||
|
||||
Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Memory providers are a specialized plugin type under `plugins/memory/`.
|
||||
|
||||
→ [Plugin Guide](/docs/guides/build-a-hermes-plugin), [Memory Provider Plugin](./memory-provider-plugin.md)
|
||||
|
||||
### Cron
|
||||
|
||||
Cron jobs are implemented as first-class agent tasks, not just shell tasks.
|
||||
First-class agent tasks (not shell tasks). Jobs store in JSON, support multiple schedule formats, can attach skills and scripts, and deliver to any platform.
|
||||
|
||||
See [Cron Internals](./cron-internals.md).
|
||||
→ [Cron Internals](./cron-internals.md)
|
||||
|
||||
### RL / environments / trajectories
|
||||
### ACP Integration
|
||||
|
||||
Hermes ships a full environment framework for evaluation, RL integration, and SFT data generation.
|
||||
Exposes Hermes as an editor-native agent over stdio/JSON-RPC for VS Code, Zed, and JetBrains.
|
||||
|
||||
See:
|
||||
→ [ACP Internals](./acp-internals.md)
|
||||
|
||||
- [Environments, Benchmarks & Data Generation](./environments.md)
|
||||
- [Trajectories & Training Format](./trajectory-format.md)
|
||||
### RL / Environments / Trajectories
|
||||
|
||||
## Design themes
|
||||
Full environment framework for evaluation and RL training. Integrates with Atropos, supports multiple tool-call parsers, and generates ShareGPT-format trajectories.
|
||||
|
||||
Several cross-cutting design themes appear throughout the codebase:
|
||||
→ [Environments, Benchmarks & Data Generation](./environments.md), [Trajectories & Training Format](./trajectory-format.md)
|
||||
|
||||
- prompt stability matters
|
||||
- tool execution must be observable and interruptible
|
||||
- session persistence must survive long-running use
|
||||
- platform frontends should share one agent core
|
||||
- optional subsystems should remain loosely coupled where possible
|
||||
## Design Principles
|
||||
|
||||
## Implementation notes
|
||||
| Principle | What it means in practice |
|
||||
|-----------|--------------------------|
|
||||
| **Prompt stability** | System prompt doesn't change mid-conversation. No cache-breaking mutations except explicit user actions (`/model`). |
|
||||
| **Observable execution** | Every tool call is visible to the user via callbacks. Progress updates in CLI (spinner) and gateway (chat messages). |
|
||||
| **Interruptible** | API calls and tool execution can be cancelled mid-flight by user input or signals. |
|
||||
| **Platform-agnostic core** | One AIAgent class serves CLI, gateway, ACP, batch, and API server. Platform differences live in the entry point, not the agent. |
|
||||
| **Loose coupling** | Optional subsystems (MCP, plugins, memory providers, RL environments) use registry patterns and check_fn gating, not hard dependencies. |
|
||||
| **Profile isolation** | Each profile (`hermes -p <name>`) gets its own HERMES_HOME, config, memory, sessions, and gateway PID. Multiple profiles run concurrently. |
|
||||
|
||||
The older mental model of Hermes as “one OpenAI-compatible chat loop plus some tools” is no longer sufficient. Current Hermes includes:
|
||||
## File Dependency Chain
|
||||
|
||||
- multiple API modes
|
||||
- auxiliary model routing
|
||||
- ACP editor integration
|
||||
- gateway-specific session and delivery semantics
|
||||
- RL environment infrastructure
|
||||
- prompt-caching and compression logic with lineage-aware persistence
|
||||
```text
|
||||
tools/registry.py (no deps — imported by all tool files)
|
||||
↑
|
||||
tools/*.py (each calls registry.register() at import time)
|
||||
↑
|
||||
model_tools.py (imports tools/registry + triggers tool discovery)
|
||||
↑
|
||||
run_agent.py, cli.py, batch_runner.py, environments/
|
||||
```
|
||||
|
||||
Use this page as the map, then dive into subsystem-specific docs for the real implementation details.
|
||||
This chain means tool registration happens at import time, before any agent instance is created. Adding a new tool requires an import in `model_tools.py`'s `_discover_tools()` list.
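The import-time self-registration pattern this chain relies on can be sketched as follows — a minimal, hypothetical model only (the real `tools/registry.py` API, names, and schema shapes may differ):

```python
# Hypothetical sketch of import-time tool registration; not the actual
# hermes-agent registry API.
REGISTRY: dict[str, dict] = {}

def register(name: str, schema: dict, handler):
    """Record a tool at import time so schema collection and dispatch see it."""
    REGISTRY[name] = {"schema": schema, "handler": handler}

# A tool module (e.g. tools/echo_tool.py) registers as a side effect of import:
def _echo(args: dict) -> str:
    return args.get("text", "")

register(
    "echo",
    {"name": "echo", "parameters": {"text": {"type": "string"}}},
    _echo,
)
```

In this model, `model_tools.py` only needs to import the tool module for `"echo"` to appear in the registry before any agent instance exists.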
@@ -4,7 +4,7 @@ Hermes Agent uses a dual compression system and Anthropic prompt caching to
manage context window usage efficiently across long conversations.

Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
`gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)

## Dual Compression System

@@ -26,7 +26,7 @@ Hermes has two separate compression layers that operate independently:

### 1. Gateway Session Hygiene (85% threshold)

Located in `gateway/run.py` (search for `_maybe_compress_session`). This is a **safety net** that
runs before the agent processes a message. It prevents API failures when sessions
grow too large between turns (e.g., overnight accumulation in Telegram/Discord).
@@ -6,85 +6,195 @@ description: "How Hermes stores, schedules, edits, pauses, skill-loads, and deli
# Cron Internals

The cron subsystem provides scheduled task execution — from simple one-shot delays to recurring cron-expression jobs with skill injection and cross-platform delivery.

## Key Files

| File | Purpose |
|------|---------|
| `cron/jobs.py` | Job model, storage, atomic read/write to `jobs.json` |
| `cron/scheduler.py` | Scheduler loop — due-job detection, execution, repeat tracking |
| `tools/cronjob_tools.py` | Model-facing `cronjob` tool registration and handler |
| `gateway/run.py` | Gateway integration — cron ticking in the long-running loop |
| `hermes_cli/cron.py` | CLI `hermes cron` subcommands |

## Scheduling Model

Four schedule formats are supported:

| Format | Example | Behavior |
|--------|---------|----------|
| **Relative delay** | `30m`, `2h`, `1d` | One-shot, fires after the specified duration |
| **Interval** | `every 2h`, `every 30m` | Recurring, fires at regular intervals |
| **Cron expression** | `0 9 * * *` | Standard 5-field cron syntax (minute, hour, day, month, weekday) |
| **ISO timestamp** | `2025-01-15T09:00:00` | One-shot, fires at the exact time |
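The first two formats above could be distinguished and converted to a fire time roughly like this — a hypothetical helper, not the actual `cron/jobs.py` parsing logic:

```python
import re
from datetime import datetime, timedelta

# Hypothetical parser for the relative-delay and interval formats;
# the real cron subsystem parsing may differ.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def next_fire(spec: str, now: datetime) -> tuple[datetime, bool]:
    """Return (next fire time, recurring?) for '30m'-style and 'every 2h'-style specs."""
    recurring = spec.startswith("every ")
    body = spec.removeprefix("every ").strip()
    match = re.fullmatch(r"(\d+)([mhd])", body)
    if not match:
        raise ValueError(f"unsupported schedule: {spec!r}")
    amount, unit = int(match.group(1)), _UNITS[match.group(2)]
    return now + timedelta(**{unit: amount}), recurring

now = datetime(2025, 1, 15, 9, 0)
print(next_fire("30m", now))       # one-shot at 09:30
print(next_fire("every 2h", now))  # recurring, next at 11:00
```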
The model-facing surface is a single `cronjob` tool with action-style operations: `create`, `list`, `update`, `pause`, `resume`, `run`, `remove`.
## Job Storage

Jobs are stored in `~/.hermes/cron/jobs.json` with atomic write semantics (write to temp file, then rename). Each job record contains:

```json
{
  "id": "job_abc123",
  "name": "Daily briefing",
  "prompt": "Summarize today's AI news and funding rounds",
  "schedule": "0 9 * * *",
  "skills": ["ai-funding-daily-report"],
  "deliver": "telegram:-1001234567890",
  "repeat": null,
  "state": "scheduled",
  "next_run": "2025-01-16T09:00:00Z",
  "run_count": 42,
  "created_at": "2025-01-01T00:00:00Z",
  "model": null,
  "provider": null,
  "script": null
}
```
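The write-temp-then-rename pattern behind the atomic write semantics can be sketched as follows — a minimal sketch only; the real `cron/jobs.py` implementation may differ in details:

```python
import json
import os
import tempfile

def write_jobs_atomic(path: str, jobs: list) -> None:
    """Write to a temp file in the same directory, then rename over the target.

    os.replace() is atomic on POSIX, so readers never observe a half-written
    jobs.json even if the process dies mid-write.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(jobs, fh, indent=2)
        os.replace(tmp_path, path)  # atomic swap into place
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file on failure
        raise
```

Writing to a temp file in the *same* directory matters: `os.replace()` is only atomic within a single filesystem.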
### Job Lifecycle States

| State | Meaning |
|-------|---------|
| `scheduled` | Active, will fire at next scheduled time |
| `paused` | Suspended — won't fire until resumed |
| `completed` | Repeat count exhausted or one-shot that has fired |
| `running` | Currently executing (transient state) |

### Backward Compatibility

Older jobs may have a single `skill` field instead of the `skills` array. The scheduler normalizes this at load time — single `skill` is promoted to `skills: [skill]`.

## Scheduler Runtime

### Tick Cycle

The scheduler runs on a periodic tick (default: every 60 seconds):

```text
tick()
  1. Acquire scheduler lock (prevents overlapping ticks)
  2. Load all jobs from jobs.json
  3. Filter to due jobs (next_run <= now AND state == "scheduled")
  4. For each due job:
     a. Set state to "running"
     b. Create fresh AIAgent session (no conversation history)
     c. Load attached skills in order (injected as user messages)
     d. Run the job prompt through the agent
     e. Deliver the response to the configured target
     f. Update run_count, compute next_run
     g. If repeat count exhausted → state = "completed"
     h. Otherwise → state = "scheduled"
  5. Write updated jobs back to jobs.json
  6. Release scheduler lock
```
### Gateway Integration

In gateway mode, the scheduler tick is integrated into the gateway's main event loop. The gateway calls `scheduler.tick()` on its periodic maintenance cycle, which runs alongside message handling.

In CLI mode, cron jobs only fire when `hermes cron` commands are run or during active CLI sessions.

### Fresh Session Isolation

Each cron job runs in a completely fresh agent session:

- No conversation history from previous runs
- No memory of previous cron executions (unless persisted to memory/files)
- The prompt must be self-contained — cron jobs cannot ask clarifying questions
- The `cronjob` toolset is disabled (recursion guard)

## Skill-Backed Jobs

A cron job can attach one or more skills via the `skills` field. At execution time:

1. Skills are loaded in the specified order
2. Each skill's SKILL.md content is injected as context
3. The job's prompt is appended as the task instruction
4. The agent processes the combined skill context + prompt

This enables reusable, tested workflows without pasting full instructions into cron prompts. For example:

```text
Create a daily funding report → attach "ai-funding-daily-report" skill
```
### Script-Backed Jobs

Jobs can also attach a Python script via the `script` field. The script runs *before* each agent turn, and its stdout is injected into the prompt as context. This enables data collection and change detection patterns:

```python
# ~/.hermes/scripts/check_competitors.py
import json
import requests

# Fetch competitor release notes, diff against last run,
# then print a summary to stdout — the agent analyzes and reports.
```
## Delivery Model

Cron job results can be delivered to any supported platform:

| Target | Syntax | Example |
|--------|--------|---------|
| Origin chat | `origin` | Deliver to the chat where the job was created |
| Local file | `local` | Save to `~/.hermes/cron/output/` |
| Telegram | `telegram` or `telegram:<chat_id>` | `telegram:-1001234567890` |
| Discord | `discord` or `discord:#channel` | `discord:#engineering` |
| Slack | `slack` | Deliver to Slack home channel |
| WhatsApp | `whatsapp` | Deliver to WhatsApp home |
| Signal | `signal` | Deliver to Signal |
| Matrix | `matrix` | Deliver to Matrix home room |
| Mattermost | `mattermost` | Deliver to Mattermost home |
| Email | `email` | Deliver via email |
| SMS | `sms` | Deliver via SMS |
| Home Assistant | `homeassistant` | Deliver to HA conversation |
| DingTalk | `dingtalk` | Deliver to DingTalk |
| Feishu | `feishu` | Deliver to Feishu |
| WeCom | `wecom` | Deliver to WeCom |

For Telegram topics, use the format `telegram:<chat_id>:<thread_id>` (e.g., `telegram:-1001234567890:17585`).
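The `platform[:chat_id[:thread_id]]` target syntax splits cleanly on colons. A hypothetical parser (not the actual `gateway/delivery.py` code) might look like:

```python
# Hypothetical parser for the delivery-target syntax above;
# the real delivery resolution may differ.
def parse_target(target: str) -> dict:
    parts = target.split(":")
    platform = parts[0]
    # Telegram chat IDs can be negative, e.g. telegram:-1001234567890:17585 —
    # splitting on ":" keeps the minus sign attached to the chat ID.
    chat_id = parts[1] if len(parts) > 1 else None
    thread_id = parts[2] if len(parts) > 2 else None
    return {"platform": platform, "chat_id": chat_id, "thread_id": thread_id}

print(parse_target("telegram:-1001234567890:17585"))
# {'platform': 'telegram', 'chat_id': '-1001234567890', 'thread_id': '17585'}
```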
### Response Wrapping

By default (`cron.wrap_response: true`), cron deliveries are wrapped with:

- A header identifying the cron job name and task
- A footer noting the agent cannot see the delivered message in conversation

The `[SILENT]` prefix in a cron response suppresses delivery entirely — useful for jobs that only need to write to files or perform side effects.

### Session Isolation

Cron deliveries are NOT mirrored into gateway session conversation history. They exist only in the cron job's own session. This prevents message alternation violations in the target chat's conversation.

## Recursion Guard

Cron-run sessions have the `cronjob` toolset disabled. This prevents:

- A scheduled job from creating new cron jobs
- Recursive scheduling that could explode token usage
- Accidental mutation of the job schedule from within a job

## Locking

The scheduler uses file-based locking to prevent overlapping ticks from executing the same due-job batch twice. This is important in gateway mode where multiple maintenance cycles could overlap if a previous tick takes longer than the tick interval.
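One common way to implement this kind of non-blocking file lock on POSIX is `flock` — a minimal sketch under that assumption; the actual lock mechanism in `cron/scheduler.py` may differ:

```python
import fcntl
import os

# Sketch of tick-level file locking (POSIX only); hypothetical, not the
# actual scheduler implementation.
def try_tick(lock_path: str) -> bool:
    """Run a tick only if no other tick holds the lock; skip otherwise."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        # Non-blocking exclusive lock: raises BlockingIOError if already held
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return False  # another tick is in progress — skip this cycle
    try:
        # ... load jobs, execute the due batch, write jobs.json ...
        return True
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```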
## CLI Interface

The `hermes cron` CLI provides direct job management:

```bash
hermes cron list              # Show all jobs
hermes cron add               # Interactive job creation
hermes cron edit <job_id>     # Edit job configuration
hermes cron pause <job_id>    # Pause a running job
hermes cron resume <job_id>   # Resume a paused job
hermes cron run <job_id>      # Trigger immediate execution
hermes cron remove <job_id>   # Delete a job
```

## Related Docs

- [Cron Feature Guide](/docs/user-guide/features/cron)
- [Gateway Internals](./gateway-internals.md)
- [Agent Loop Internals](./agent-loop.md)
@@ -6,106 +6,248 @@ description: "How the messaging gateway boots, authorizes users, routes sessions
# Gateway Internals

The messaging gateway is the long-running process that connects Hermes to 14+ external messaging platforms through a unified architecture.

## Key Files

| File | Purpose |
|------|---------|
| `gateway/run.py` | `GatewayRunner` — main loop, slash commands, message dispatch (~7,200 lines) |
| `gateway/session.py` | `SessionStore` — conversation persistence and session key construction |
| `gateway/delivery.py` | Outbound message delivery to target platforms/channels |
| `gateway/pairing.py` | DM pairing flow for user authorization |
| `gateway/channel_directory.py` | Maps chat IDs to human-readable names for cron delivery |
| `gateway/hooks.py` | Hook discovery, loading, and lifecycle event dispatch |
| `gateway/mirror.py` | Cross-session message mirroring for `send_message` |
| `gateway/status.py` | Token lock management for profile-scoped gateway instances |
| `gateway/builtin_hooks/` | Always-registered hooks (e.g., BOOT.md system prompt hook) |
| `gateway/platforms/` | Platform adapters (one per messaging platform) |

## Architecture Overview

```text
┌─────────────────────────────────────────────────┐
│                  GatewayRunner                  │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │ Telegram │  │ Discord  │  │  Slack   │ ...   │
│  │ Adapter  │  │ Adapter  │  │ Adapter  │       │
│  └─────┬────┘  └─────┬────┘  └─────┬────┘       │
│        │             │             │            │
│        └─────────────┼─────────────┘            │
│                      ▼                          │
│               _handle_message()                 │
│                      │                          │
│         ┌────────────┼────────────┐             │
│         ▼            ▼            ▼             │
│   Slash command   AIAgent      Queue/BG         │
│     dispatch      creation     sessions         │
│                      │                          │
│                      ▼                          │
│                SessionStore                     │
│            (SQLite persistence)                 │
└─────────────────────────────────────────────────┘
```
## Message Flow

When a message arrives from any platform:

1. **Platform adapter** receives raw event, normalizes it into a `MessageEvent`
2. **Base adapter** checks active session guard:
   - If agent is running for this session → queue message, set interrupt event
   - If `/approve`, `/deny`, `/stop` → bypass guard (dispatched inline)
3. **GatewayRunner._handle_message()** receives the event:
   - Resolve session key via `_session_key_for_source()` (format: `agent:main:{platform}:{chat_type}:{chat_id}`)
   - Check authorization (see Authorization below)
   - Check if it's a slash command → dispatch to command handler
   - Check if agent is already running → intercept commands like `/stop`, `/status`
   - Otherwise → create `AIAgent` instance and run conversation
4. **Response** is sent back through the platform adapter

### Session Key Format

Session keys encode the full routing context:

```text
agent:main:{platform}:{chat_type}:{chat_id}
```

For example: `agent:main:telegram:private:123456789`

Thread-aware platforms (Telegram forum topics, Discord threads, Slack threads) may include thread IDs in the chat_id portion. **Never construct session keys manually** — always use `build_session_key()` from `gateway/session.py`.
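For intuition only, a session-key builder along these lines would produce the format shown — a hedged sketch; the real `build_session_key()` in `gateway/session.py` is the source of truth and should always be used instead:

```python
# Hypothetical sketch of session-key construction; use the real
# build_session_key() from gateway/session.py in actual code.
def build_session_key(platform, chat_type, chat_id, thread_id=None):
    # Thread-aware platforms fold the thread ID into the chat_id portion
    chat = f"{chat_id}:{thread_id}" if thread_id else chat_id
    return f"agent:main:{platform}:{chat_type}:{chat}"

print(build_session_key("telegram", "private", "123456789"))
# agent:main:telegram:private:123456789
```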
### Two-Level Message Guard

When an agent is actively running, incoming messages pass through two sequential guards:

1. **Level 1 — Base adapter** (`gateway/platforms/base.py`): Checks `_active_sessions`. If the session is active, queues the message in `_pending_messages` and sets an interrupt event. This catches messages *before* they reach the gateway runner.

2. **Level 2 — Gateway runner** (`gateway/run.py`): Checks `_running_agents`. Intercepts specific commands (`/stop`, `/new`, `/queue`, `/status`, `/approve`, `/deny`) and routes them appropriately. Everything else triggers `running_agent.interrupt()`.

Commands that must reach the runner while the agent is blocked (like `/approve`) are dispatched **inline** via `await self._message_handler(event)` — they bypass the background task system to avoid race conditions.

## Authorization

The gateway uses a multi-layer authorization check, evaluated in order:

1. **Gateway-wide allow-all** (`GATEWAY_ALLOW_ALL_USERS`) — if set, all users are authorized
2. **Platform allowlist** (e.g., `TELEGRAM_ALLOWED_USERS`) — comma-separated user IDs
3. **DM pairing** — authenticated users can pair new users via a pairing code
4. **Admin escalation** — some commands require admin status beyond basic authorization
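The ordered checks above condense to something like the following — a hypothetical sketch; the actual function names and pairing-store API in the gateway differ:

```python
import os

# Hypothetical condensation of the gateway's ordered authorization checks.
def is_authorized(platform: str, user_id: str, paired_users: set) -> bool:
    if os.environ.get("GATEWAY_ALLOW_ALL_USERS"):  # 1. gateway-wide allow-all
        return True
    allowlist = os.environ.get(f"{platform.upper()}_ALLOWED_USERS", "")
    ids = {u.strip() for u in allowlist.split(",") if u.strip()}
    if user_id in ids:                             # 2. platform allowlist
        return True
    return user_id in paired_users                 # 3. DM pairing
```

Admin escalation (step 4) would layer on top of this per-command, not per-user.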
### DM Pairing Flow

```text
Admin:    /pair
Gateway:  "Pairing code: ABC123. Share with the user."
New user: ABC123
Gateway:  "Paired! You're now authorized."
```

Pairing state is persisted in `gateway/pairing.py` and survives restarts.

## Slash Command Dispatch

All slash commands in the gateway flow through the same resolution pipeline:

1. `resolve_command()` from `hermes_cli/commands.py` maps input to canonical name (handles aliases, prefix matching)
2. The canonical name is checked against `GATEWAY_KNOWN_COMMANDS`
3. Handler in `_handle_message()` dispatches based on canonical name
4. Some commands are gated on config (`gateway_config_gate` on `CommandDef`)

### Running-Agent Guard

Commands that must NOT execute while the agent is processing are rejected early:

```python
if _quick_key in self._running_agents:
    if canonical == "model":
        return "⏳ Agent is running — wait for it to finish or /stop first."
```

Bypass commands (`/stop`, `/new`, `/approve`, `/deny`, `/queue`, `/status`) have special handling.
## Config Sources

The gateway reads configuration from multiple sources:

| Source | What it provides |
|--------|-----------------|
| `~/.hermes/.env` | API keys, bot tokens, platform credentials |
| `~/.hermes/config.yaml` | Model settings, tool configuration, display options |
| Environment variables | Override any of the above |

Unlike the CLI (which uses `load_cli_config()` with hardcoded defaults), the gateway reads `config.yaml` directly via YAML loader. This means config keys that exist in the CLI's defaults dict but not in the user's config file may behave differently between CLI and gateway.

## Platform Adapters

Each messaging platform has an adapter in `gateway/platforms/`:

```text
gateway/platforms/
├── base.py           # BaseAdapter — shared logic for all platforms
├── telegram.py       # Telegram Bot API (long polling or webhook)
├── discord.py        # Discord bot via discord.py
├── slack.py          # Slack Socket Mode
├── whatsapp.py       # WhatsApp Business Cloud API
├── signal.py         # Signal via signal-cli REST API
├── matrix.py         # Matrix via matrix-nio (optional E2EE)
├── mattermost.py     # Mattermost WebSocket API
├── email_adapter.py  # Email via IMAP/SMTP
├── sms.py            # SMS via Twilio
├── dingtalk.py       # DingTalk WebSocket
├── feishu.py         # Feishu/Lark WebSocket or webhook
├── wecom.py          # WeCom (WeChat Work) callback
└── homeassistant.py  # Home Assistant conversation integration
```

Adapters implement a common interface:

- `connect()` / `disconnect()` — lifecycle management
- `send_message()` — outbound message delivery
- `on_message()` — inbound message normalization → `MessageEvent`
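The adapter contract can be pictured as an abstract base class — a hedged sketch; the real `BaseAdapter` in `gateway/platforms/base.py` carries much more state (session guards, pending-message queues, token locks):

```python
import abc
from dataclasses import dataclass

# Hypothetical shape of the adapter interface listed above.
@dataclass
class MessageEvent:
    platform: str
    chat_id: str
    user_id: str
    text: str

class BaseAdapter(abc.ABC):
    @abc.abstractmethod
    async def connect(self) -> None: ...

    @abc.abstractmethod
    async def disconnect(self) -> None: ...

    @abc.abstractmethod
    async def send_message(self, chat_id: str, text: str) -> None: ...

    def normalize(self, platform, chat_id, user_id, text) -> MessageEvent:
        """on_message() implementations funnel raw payloads through here."""
        return MessageEvent(platform, chat_id, user_id, text)
```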
### Token Locks

Adapters that connect with unique credentials call `acquire_scoped_lock()` in `connect()` and `release_scoped_lock()` in `disconnect()`. This prevents two profiles from using the same bot token simultaneously.

## Delivery Path

Outgoing deliveries (`gateway/delivery.py`) handle:

- **Direct reply** — send response back to the originating chat
- **Home channel delivery** — route cron job outputs and background results to a configured home channel
- **Explicit target delivery** — `send_message` tool specifying `telegram:-1001234567890`
- **Cross-platform delivery** — deliver to a different platform than the originating message

Cron job deliveries are NOT mirrored into gateway session history — they live in their own cron session only. This is a deliberate design choice to avoid message alternation violations.

## Hooks

Gateway hooks are Python modules that respond to lifecycle events:

### Gateway Hook Events

| Event | When fired |
|-------|-----------|
| `gateway:startup` | Gateway process starts |
| `session:start` | New conversation session begins |
| `session:end` | Session completes or times out |
| `session:reset` | User resets session with `/new` |
| `agent:start` | Agent begins processing a message |
| `agent:step` | Agent completes one tool-calling iteration |
| `agent:end` | Agent finishes and returns response |
| `command:*` | Any slash command is executed |

Hooks are discovered from `gateway/builtin_hooks/` (always active) and `~/.hermes/hooks/` (user-installed). Each hook is a directory with a `HOOK.yaml` manifest and `handler.py`.
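A handler for one of the events above might take roughly this shape — hypothetical names only; check an existing hook under `gateway/builtin_hooks/` for the real contract enforced by `gateway/hooks.py`:

```python
# ~/.hermes/hooks/greeter/handler.py — hypothetical handler shape; the
# actual hook entry-point signature may differ.
SUBSCRIBED_EVENTS = {"session:start", "session:end"}

def handle(event: str, payload: dict) -> None:
    """Called by the gateway when a subscribed lifecycle event fires."""
    if event == "session:start":
        print(f"session started for chat {payload.get('chat_id')}")
```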
## Memory Provider Integration

When a memory provider plugin (e.g., Honcho) is enabled:

1. Gateway creates an `AIAgent` per message with the session ID
2. The `MemoryManager` initializes the provider with the session context
3. Provider tools (e.g., `honcho_profile`, `viking_search`) are routed through:

```text
AIAgent._invoke_tool()
  → self._memory_manager.handle_tool_call(name, args)
  → provider.handle_tool_call(name, args)
```

4. On session end/reset, `on_session_end()` fires for cleanup and final data flush

### Memory Flush Lifecycle

When a session is reset, resumed, or expires:

1. Built-in memories are flushed to disk
2. Memory provider's `on_session_end()` hook fires
3. A temporary `AIAgent` runs a memory-only conversation turn
4. Context is then discarded or archived

## Background Maintenance

The gateway runs periodic maintenance alongside message handling:

- **Cron ticking** — checks job schedules and fires due jobs
- **Session expiry** — cleans up abandoned sessions after timeout
- **Memory flush** — proactively flushes memory before session expiry
- **Cache refresh** — refreshes model lists and provider status

## Process Management

The gateway runs as a long-lived process, managed via:

- `hermes gateway start` / `hermes gateway stop` — manual control
- `systemctl` (Linux) or `launchctl` (macOS) — service management
- PID file at `~/.hermes/gateway.pid` — profile-scoped process tracking

**Profile-scoped vs global**: `start_gateway()` uses profile-scoped PID files. `hermes gateway stop` stops only the current profile's gateway. `hermes gateway stop --all` uses global `ps aux` scanning to kill all gateway processes (used during updates).

## Related Docs

- [Session Storage](./session-storage.md)
- [Cron Internals](./cron-internals.md)
- [ACP Internals](./acp-internals.md)
- [Agent Loop Internals](./agent-loop.md)
- [Messaging Gateway (User Guide)](/docs/user-guide/messaging)
@@ -3,7 +3,7 @@
Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format
for use as training data, debugging artifacts, and reinforcement learning datasets.

Source files: `agent/trajectory.py`, `run_agent.py` (search for `_save_trajectory`), `batch_runner.py`

## File Naming Convention