docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393)

Major changes across 20 documentation pages:

Staleness fixes:
- Fix FAQ: wrong import path (hermes.agent → run_agent)
- Fix FAQ: stale Gemini 2.0 model → Gemini 3 Flash
- Fix integrations/index: missing MiniMax TTS provider
- Fix integrations/index: web_crawl is not a registered tool
- Fix sessions: add all 19 session sources (was only 5)
- Fix cron: add all 18 delivery targets (was only telegram/discord)
- Fix webhooks: add all delivery targets
- Fix overview: add missing MCP, memory providers, credential pools
- Fix all line-number references → use function name searches instead
- Update file size estimates (run_agent ~9200, gateway ~7200, cli ~8500)

Expanded thin pages (< 150 lines → substantial depth):
- honcho.md: 43 → 108 lines — added feature comparison, tools, config, CLI
- overview.md: 49 → 55 lines — added MCP, memory providers, credential pools
- toolsets-reference.md: 57 → 175 lines — added explanations, config examples,
  custom toolsets, wildcards, platform differences table
- optional-skills-catalog.md: 74 → 153 lines — added 25+ missing skills across
  communication, devops, mlops (18!), productivity, research categories
- integrations/index.md: 82 → 115 lines — added messaging, HA, plugins sections
- cron-internals.md: 90 → 195 lines — added job JSON example, lifecycle states,
  tick cycle, delivery targets, script-backed jobs, CLI interface
- gateway-internals.md: 111 → 250 lines — added architecture diagram, message
  flow, two-level guard, platform adapters, token locks, process management
- agent-loop.md: 112 → 235 lines — added entry points, API mode resolution,
  turn lifecycle detail, message alternation rules, tool execution flow,
  callback table, budget tracking, compression details
- architecture.md: 152 → 295 lines — added system overview diagram, data flow
  diagrams, design principles table, dependency chain

Other depth additions:
- context-references.md: added platform availability, compression interaction,
  common patterns sections
- slash-commands.md: added quick commands config example, alias resolution
- image-generation.md: added platform delivery table
- tools-reference.md: added tool counts, MCP tools note
- index.md: updated platform count (5 → 14+), tool count (40+ → 47)
Teknium 2026-04-05 19:45:50 -07:00 committed by GitHub
parent fec58ad99e
commit 43d468cea8
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
20 changed files with 1243 additions and 406 deletions


@@ -6,107 +6,231 @@ description: "Detailed walkthrough of AIAgent execution, API modes, tools, callb
# Agent Loop Internals
The core orchestration engine is `run_agent.py`'s `AIAgent` class — roughly 9,200 lines that handle everything from prompt assembly to tool dispatch to provider failover.
## Core Responsibilities
`AIAgent` is responsible for:
- Assembling the effective system prompt and tool schemas via `prompt_builder.py`
- Selecting the correct provider/API mode (chat_completions, codex_responses, anthropic_messages)
- Making interruptible model calls with cancellation support
- Executing tool calls (sequentially or concurrently via thread pool)
- Maintaining conversation history in OpenAI message format
- Handling compression, retries, and fallback model switching
- Tracking iteration budgets across parent and child agents
- Flushing persistent memory before context is lost
## Two Entry Points
```python
# Simple interface — returns final response string
response = agent.chat("Fix the bug in main.py")

# Full interface — returns dict with messages, metadata, usage stats
result = agent.run_conversation(
    user_message="Fix the bug in main.py",
    system_message=None,          # auto-built if omitted
    conversation_history=None,    # auto-loaded from session if omitted
    task_id="task_abc123"
)
```
`chat()` is a thin wrapper around `run_conversation()` that extracts the `final_response` field from the result dict.
## API Modes
Hermes supports three API execution modes, resolved from provider selection, explicit args, and base URL heuristics:
| API mode | Used for | Client type |
|----------|----------|-------------|
| `chat_completions` | OpenAI-compatible endpoints (OpenRouter, custom, most providers) | `openai.OpenAI` |
| `codex_responses` | OpenAI Codex / Responses API | `openai.OpenAI` with Responses format |
| `anthropic_messages` | Native Anthropic Messages API | `anthropic.Anthropic` via adapter |
The mode determines how messages are formatted, how tool calls are structured, how responses are parsed, and how caching/streaming works. All three converge on the same internal message format (OpenAI-style `role`/`content`/`tool_calls` dicts) before and after API calls.
**Mode resolution order:**
1. Explicit `api_mode` constructor arg (highest priority)
2. Provider-specific detection (e.g., `anthropic` provider → `anthropic_messages`)
3. Base URL heuristics (e.g., `api.anthropic.com` → `anthropic_messages`)
4. Default: `chat_completions`
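As a sketch, this resolution order might look like the following (the function name and signature here are illustrative, not the actual `run_agent.py` internals):

```python
def resolve_api_mode(explicit_mode=None, provider=None, base_url=None):
    """Illustrative sketch of the four-step resolution order."""
    if explicit_mode:                                 # 1. explicit arg wins
        return explicit_mode
    if provider == "anthropic":                       # 2. provider detection
        return "anthropic_messages"
    if base_url and "api.anthropic.com" in base_url:  # 3. base URL heuristic
        return "anthropic_messages"
    return "chat_completions"                         # 4. default
```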
## Turn Lifecycle
Each iteration of the agent loop follows this sequence:
```text
1. Generate task_id if not provided
2. Append user message to conversation history
3. Build or reuse cached system prompt (prompt_builder.py)
4. Check if preflight compression is needed (>50% context)
5. Build API messages from conversation history
   - chat_completions: OpenAI format as-is
   - codex_responses: convert to Responses API input items
   - anthropic_messages: convert via anthropic_adapter.py
6. Inject ephemeral prompt layers (budget warnings, context pressure)
7. Apply prompt caching markers if on Anthropic
8. Make interruptible API call (_api_call_with_interrupt)
9. Parse response:
   - If tool_calls: execute them, append results, loop back to step 5
   - If text response: persist session, flush memory if needed, return
```
### Message Format
All messages use OpenAI-compatible format internally:
```python
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [...]}
{"role": "tool", "tool_call_id": "...", "content": "..."}
```
Reasoning content (from models that support extended thinking) is stored in `assistant_msg["reasoning"]` and optionally displayed via the `reasoning_callback`.
### Message Alternation Rules
The agent loop enforces strict message role alternation:
- After the system message: `User → Assistant → User → Assistant → ...`
- During tool calling: `Assistant (with tool_calls) → Tool → Tool → ... → Assistant`
- **Never** two assistant messages in a row
- **Never** two user messages in a row
- **Only** `tool` role can have consecutive entries (parallel tool results)
Providers validate these sequences and will reject malformed histories.
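These rules can be expressed as a small validator. This is a hedged sketch for illustration; the actual enforcement lives in the loop itself and on the provider side:

```python
def validate_alternation(messages):
    """Check the role-alternation rules described above (illustrative)."""
    prev = None
    for msg in messages:
        role = msg["role"]
        if role == prev and role != "tool":
            return False  # two assistant or two user messages in a row
        if role == "tool" and prev not in ("assistant", "tool"):
            return False  # tool results must follow an assistant tool_call turn
        prev = role
    return True
```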
## Interruptible API Calls
API requests are wrapped in `_api_call_with_interrupt()` which runs the actual HTTP call in a background thread while monitoring an interrupt event:
```text
┌──────────────────────┐     ┌──────────────┐
│ Main thread          │     │ API thread   │
│ wait on:             │────▶│ HTTP POST    │
│  - response ready    │     │ to provider  │
│  - interrupt event   │     └──────────────┘
│  - timeout           │
└──────────────────────┘
```
When interrupted (user sends new message, `/stop` command, or signal):
- The API thread is abandoned (response discarded)
- The agent can process the new input or shut down cleanly
- No partial response is injected into conversation history
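The pattern can be sketched in a few lines. This is a minimal illustration of the thread-plus-event idea, not the real `_api_call_with_interrupt` signature:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def call_with_interrupt(api_call, interrupt, poll=0.05):
    """Run a blocking call in a worker thread, polling an interrupt event.

    Hypothetical sketch: `api_call` is any zero-arg callable, `interrupt`
    is a threading.Event set when the user cancels.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(api_call)
    while not future.done():
        if interrupt.is_set():
            pool.shutdown(wait=False)  # abandon the in-flight call
            return None                # no partial response reaches history
        interrupt.wait(timeout=poll)   # block briefly on the event
    pool.shutdown(wait=False)
    return future.result()
```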
## Tool Execution
### Sequential vs Concurrent
When the model returns tool calls:
- **Single tool call** → executed directly in the main thread
- **Multiple tool calls** → executed concurrently via `ThreadPoolExecutor`
- Exception: tools marked as interactive (e.g., `clarify`) force sequential execution
- Results are reinserted in the original tool call order regardless of completion order
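The ordering guarantee follows naturally from submitting futures in order and collecting them in the same order. A sketch (helper names are illustrative, not the real code):

```python
from concurrent.futures import ThreadPoolExecutor

def run_tool_calls(tool_calls, handler):
    """Execute tool calls concurrently, reinserting results in call order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        futures = [pool.submit(handler, call) for call in tool_calls]
        # The futures list preserves submission order, so results do too,
        # regardless of which call finishes first.
        return [
            {"role": "tool", "tool_call_id": call["id"], "content": f.result()}
            for call, f in zip(tool_calls, futures)
        ]
```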
### Execution Flow
```text
for each tool_call in response.tool_calls:
  1. Resolve handler from tools/registry.py
  2. Fire pre_tool_call plugin hook
  3. Check if dangerous command (tools/approval.py)
     - If dangerous: invoke approval_callback, wait for user
  4. Execute handler with args + task_id
  5. Fire post_tool_call plugin hook
  6. Append {"role": "tool", "content": result} to history
```
### Agent-Level Tools
Some tools are intercepted by `run_agent.py` *before* reaching `handle_function_call()`:
| Tool | Why intercepted |
|------|-----------------|
| `todo` | Reads/writes agent-local task state |
| `memory` | Writes to persistent memory files with character limits |
These tools modify agent state directly and return synthetic tool results without going through the registry.
## Callback Surfaces
`AIAgent` supports platform-specific callbacks that enable real-time progress in the CLI, gateway, and ACP integrations:
| Callback | When fired | Used by |
|----------|-----------|---------|
| `tool_progress_callback` | Before/after each tool execution | CLI spinner, gateway progress messages |
| `thinking_callback` | When model starts/stops thinking | CLI "thinking..." indicator |
| `reasoning_callback` | When model returns reasoning content | CLI reasoning display, gateway reasoning blocks |
| `clarify_callback` | When `clarify` tool is called | CLI input prompt, gateway interactive message |
| `step_callback` | After each complete agent turn | Gateway step tracking, ACP progress |
| `stream_delta_callback` | Each streaming token (when enabled) | CLI streaming display |
| `tool_gen_callback` | When tool call is parsed from stream | CLI tool preview in spinner |
| `status_callback` | State changes (thinking, executing, etc.) | ACP status updates |
## Budget and Fallback Behavior
### Iteration Budget
The agent tracks iterations via `IterationBudget`:
- Default: 90 iterations (configurable via `agent.max_turns`)
- Shared across parent and child agents — a subagent consumes from the parent's budget
- At 70%+ usage, `_get_budget_warning()` appends a `[BUDGET WARNING: ...]` to the last tool result
- At 100%, the agent stops and returns a summary of work done
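The budget-pressure behavior can be sketched as a small function. The warning text and names here are illustrative; only the 70% threshold and the append-to-last-tool-result behavior come from the description above:

```python
def budget_warning(used, limit=90, threshold=0.7):
    """Return the budget-pressure hint for the current iteration count."""
    if used >= limit:
        # Hypothetical wording: at 100% the agent stops and summarizes.
        return "[BUDGET EXHAUSTED: stop and summarize work done]"
    if used / limit >= threshold:
        remaining = limit - used
        return f"[BUDGET WARNING: {remaining} iterations remaining]"
    return None  # below the threshold, no hint is injected
```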
### Fallback Model
When the primary model fails (429 rate limit, 5xx server error, 401/403 auth error):
1. Check `fallback_providers` list in config
2. Try each fallback in order
3. On success, continue the conversation with the new provider
4. On 401/403, attempt credential refresh before failing over
The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain configurable via the `auxiliary.*` config section.
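The failover order sketched below is an illustration of the steps above, with `PermissionError` standing in for 401/403 and generic exceptions standing in for 429/5xx; the real error taxonomy and function names differ:

```python
def call_with_fallback(primary, fallbacks, make_call, refresh_credentials):
    """Try the primary provider, then each fallback in order (illustrative)."""
    for provider in [primary] + list(fallbacks):
        try:
            return make_call(provider)
        except PermissionError:               # stand-in for 401/403
            refresh_credentials(provider)
            try:
                return make_call(provider)    # retry once after refresh
            except Exception:
                continue                      # fail over to next provider
        except Exception:                     # stand-in for 429/5xx
            continue
    raise RuntimeError("all providers failed")
```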
## Compression and Persistence
### When Compression Triggers
- **Preflight** (before API call): If conversation exceeds 50% of model's context window
- **Gateway auto-compression**: If conversation exceeds 85% (more aggressive, runs between turns)
### What Happens During Compression
1. Memory is flushed to disk first (preventing data loss)
2. Middle conversation turns are summarized into a compact summary
3. The last N messages are preserved intact (`compression.protect_last_n`, default: 20)
4. Tool call/result message pairs are kept together (never split)
5. A new session lineage ID is generated (compression creates a "child" session)
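Steps 2-4 can be sketched as follows. The summary message's role and wording are assumptions for illustration; only the protected-tail size and the keep-tool-pairs-together rule come from the list above:

```python
def compress(messages, summarize, protect_last_n=20):
    """Summarize middle turns, keep the tail intact, never split tool pairs."""
    if len(messages) <= protect_last_n:
        return messages
    cut = len(messages) - protect_last_n
    # Never start the protected tail on a `tool` message: walk the cut point
    # back so an assistant tool_call stays with its tool results.
    while cut > 0 and messages[cut]["role"] == "tool":
        cut -= 1
    head, tail = messages[:cut], messages[cut:]
    summary = {"role": "user",
               "content": f"[Summary of earlier turns]\n{summarize(head)}"}
    return [summary] + tail
```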
### Session Persistence
After each turn:
- Messages are saved to the session store (SQLite via `hermes_state.py`)
- Memory changes are flushed to `MEMORY.md` / `USER.md`
- The session can be resumed later via `/resume` or `hermes chat --resume`
## Key Source Files
| File | Purpose |
|------|---------|
| `run_agent.py` | AIAgent class — the complete agent loop (~9,200 lines) |
| `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality |
| `agent/context_compressor.py` | Conversation compression algorithm |
| `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics |
| `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) |
| `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch |
## Related Docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Tools Runtime](./tools-runtime.md)
- [Architecture Overview](./architecture.md)


@@ -1,152 +1,274 @@
---
sidebar_position: 1
title: "Architecture"
description: "Hermes Agent internals — major subsystems, execution paths, data flow, and where to read next"
---
# Architecture
This page is the top-level map of Hermes Agent internals. Use it to orient yourself in the codebase, then dive into subsystem-specific docs for implementation details.
## System Overview
```text
┌──────────────────────────────────────────────────────────────────────┐
│                             Entry Points                             │
│                                                                      │
│  CLI (cli.py)     Gateway (gateway/run.py)     ACP (acp_adapter/)    │
│  Batch Runner     API Server                   Python Library        │
└──────────┬──────────────┬───────────────────────┬────────────────────┘
           │              │                       │
           ▼              ▼                       ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        AIAgent (run_agent.py)                        │
│                                                                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐            │
│  │ Prompt       │    │ Provider     │    │ Tool         │            │
│  │ Builder      │    │ Resolution   │    │ Dispatch     │            │
│  │ (prompt_     │    │ (runtime_    │    │ (model_      │            │
│  │  builder.py) │    │  provider.py)│    │  tools.py)   │            │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘            │
│         │                   │                   │                    │
│  ┌──────┴───────┐    ┌──────┴───────┐    ┌──────┴───────┐            │
│  │ Compression  │    │ 3 API Modes  │    │ Tool Registry│            │
│  │ & Caching    │    │ chat_compl.  │    │ (registry.py)│            │
│  │              │    │ codex_resp.  │    │ 47 tools     │            │
│  │              │    │ anthropic    │    │ 37 toolsets  │            │
│  └──────────────┘    └──────────────┘    └──────────────┘            │
└──────────────────────────────────────────────────────────────────────┘
          │                                 │
          ▼                                 ▼
┌───────────────────┐            ┌──────────────────────┐
│ Session Storage   │            │ Tool Backends        │
│ (SQLite + FTS5)   │            │ Terminal (6 backends)│
│ hermes_state.py   │            │ Browser (5 backends) │
│ gateway/session.py│            │ MCP (dynamic)        │
└───────────────────┘            │ File, Vision, etc.   │
                                 └──────────────────────┘
```
## Directory Structure
```text
hermes-agent/
├── run_agent.py # AIAgent — core conversation loop (~9,200 lines)
├── cli.py # HermesCLI — interactive terminal UI (~8,500 lines)
├── model_tools.py # Tool discovery, schema collection, dispatch
├── toolsets.py # Tool groupings and platform presets
├── hermes_state.py # SQLite session/state database with FTS5
├── hermes_constants.py # HERMES_HOME, profile-aware paths
├── batch_runner.py # Batch trajectory generation
├── agent/ # Agent internals
│ ├── prompt_builder.py # System prompt assembly
│ ├── context_compressor.py # Conversation compression algorithm
│ ├── prompt_caching.py # Anthropic prompt caching
│ ├── auxiliary_client.py # Auxiliary LLM for side tasks (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
│ ├── models_dev.py # models.dev registry integration
│ ├── anthropic_adapter.py # Anthropic Messages API format conversion
│ ├── display.py # KawaiiSpinner, tool preview formatting
│ ├── skill_commands.py # Skill slash commands
│ ├── memory_store.py # Persistent memory read/write
│ └── trajectory.py # Trajectory saving helpers
├── hermes_cli/ # CLI subcommands and setup
│ ├── main.py # Entry point — all `hermes` subcommands (~4,200 lines)
│ ├── config.py # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│ ├── commands.py # COMMAND_REGISTRY — central slash command definitions
│ ├── auth.py # PROVIDER_REGISTRY, credential resolution
│ ├── runtime_provider.py # Provider → api_mode + credentials
│ ├── models.py # Model catalog, provider model lists
│ ├── model_switch.py # /model command logic (CLI + gateway shared)
│ ├── setup.py # Interactive setup wizard (~3,500 lines)
│ ├── skin_engine.py # CLI theming engine
│ ├── skills_config.py # hermes skills — enable/disable per platform
│ ├── skills_hub.py # /skills slash command
│ ├── tools_config.py # hermes tools — enable/disable per platform
│ ├── plugins.py # PluginManager — discovery, loading, hooks
│ ├── callbacks.py # Terminal callbacks (clarify, sudo, approval)
│ └── gateway.py # hermes gateway start/stop
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry
│ ├── approval.py # Dangerous command detection
│ ├── terminal_tool.py # Terminal orchestration
│ ├── process_registry.py # Background process management
│ ├── file_tools.py # read_file, write_file, patch, search_files
│ ├── web_tools.py # web_search, web_extract
│ ├── browser_tool.py # 11 browser automation tools
│ ├── code_execution_tool.py # execute_code sandbox
│ ├── delegate_tool.py # Subagent delegation
│ ├── mcp_tool.py # MCP client (~1,050 lines)
│ ├── credential_files.py # File-based credential passthrough
│ ├── env_passthrough.py # Env var passthrough for sandboxes
│ ├── ansi_strip.py # ANSI escape stripping
│ └── environments/ # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/ # Messaging platform gateway
│ ├── run.py # GatewayRunner — message dispatch (~5,800 lines)
│ ├── session.py # SessionStore — conversation persistence
│ ├── delivery.py # Outbound message delivery
│ ├── pairing.py # DM pairing authorization
│ ├── hooks.py # Hook discovery and lifecycle events
│ ├── mirror.py # Cross-session message mirroring
│ ├── status.py # Token locks, profile-scoped process tracking
│ ├── builtin_hooks/ # Always-registered hooks
│ └── platforms/ # 14 adapters: telegram, discord, slack, whatsapp,
│ # signal, matrix, mattermost, email, sms,
│ # dingtalk, feishu, wecom, homeassistant, webhook
├── acp_adapter/ # ACP server (VS Code / Zed / JetBrains)
├── cron/ # Scheduler (jobs.py, scheduler.py)
├── plugins/memory/ # Memory provider plugins
├── environments/ # RL training environments (Atropos)
├── skills/ # Bundled skills (always available)
├── optional-skills/ # Official optional skills (install explicitly)
├── website/ # Docusaurus documentation site
└── tests/ # Pytest suite (~3,000+ tests)
```
## Data Flow
### CLI Session
```text
User input → HermesCLI.process_input()
→ AIAgent.run_conversation()
→ prompt_builder.build_system_prompt()
→ runtime_provider.resolve_runtime_provider()
→ API call (chat_completions / codex_responses / anthropic_messages)
→ tool_calls? → model_tools.handle_function_call() → loop
→ final response → display → save to SessionDB
```
### Gateway Message
```text
Platform event → Adapter.on_message() → MessageEvent
→ GatewayRunner._handle_message()
→ authorize user
→ resolve session key
→ create AIAgent with session history
→ AIAgent.run_conversation()
→ deliver response back through adapter
```
### Cron Job
```text
Scheduler tick → load due jobs from jobs.json
→ create fresh AIAgent (no history)
→ inject attached skills as context
→ run job prompt
→ deliver response to target platform
→ update job state and next_run
```
## Recommended Reading Order
If you are new to the codebase:
1. **This page** — orient yourself
2. **[Agent Loop Internals](./agent-loop.md)** — how AIAgent works
3. **[Prompt Assembly](./prompt-assembly.md)** — system prompt construction
4. **[Provider Runtime Resolution](./provider-runtime.md)** — how providers are selected
5. **[Adding Providers](./adding-providers.md)** — practical guide to adding a new provider
6. **[Tools Runtime](./tools-runtime.md)** — tool registry, dispatch, environments
7. **[Session Storage](./session-storage.md)** — SQLite schema, FTS5, session lineage
8. **[Gateway Internals](./gateway-internals.md)** — messaging platform gateway
9. **[Context Compression & Prompt Caching](./context-compression-and-caching.md)** — compression and caching
10. **[ACP Internals](./acp-internals.md)** — IDE integration
11. **[Environments, Benchmarks & Data Generation](./environments.md)** — RL training
## Major Subsystems
### Agent Loop
The synchronous orchestration engine (`AIAgent` in `run_agent.py`). Handles provider selection, prompt construction, tool execution, retries, fallback, callbacks, compression, and persistence. Supports three API modes for different provider backends.
→ [Agent Loop Internals](./agent-loop.md)
### Prompt System
Prompt construction and maintenance across the conversation lifecycle:
- **`prompt_builder.py`** — Assembles the system prompt from: personality (SOUL.md), memory (MEMORY.md, USER.md), skills, context files (AGENTS.md, .hermes.md), tool-use guidance, and model-specific instructions
- **`prompt_caching.py`** — Applies Anthropic cache breakpoints for prefix caching
- **`context_compressor.py`** — Summarizes middle conversation turns when context exceeds thresholds
→ [Prompt Assembly](./prompt-assembly.md), [Context Compression & Prompt Caching](./context-compression-and-caching.md)
### Provider Resolution
A shared runtime resolver used by CLI, gateway, cron, ACP, and auxiliary calls. Maps `(provider, model)` tuples to `(api_mode, api_key, base_url)`. Handles 18+ providers, OAuth flows, credential pools, and alias resolution.
→ [Provider Runtime Resolution](./provider-runtime.md)
### Tool System
Central tool registry (`tools/registry.py`) with 47 registered tools across 20 toolsets. Each tool file self-registers at import time. The registry handles schema collection, dispatch, availability checking, and error wrapping. Terminal tools support 6 backends (local, Docker, SSH, Daytona, Modal, Singularity).
→ [Tools Runtime](./tools-runtime.md)
### Session Persistence
SQLite-based session storage with FTS5 full-text search. Sessions have lineage tracking (parent/child across compressions), per-platform isolation, and atomic writes with contention handling.
→ [Session Storage](./session-storage.md)
### Messaging Gateway
Long-running process with 14 platform adapters, unified session routing, user authorization (allowlists + DM pairing), slash command dispatch, hook system, cron ticking, and background maintenance.
→ [Gateway Internals](./gateway-internals.md)
### Plugin System
Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Memory providers are a specialized plugin type under `plugins/memory/`.
→ [Plugin Guide](/docs/guides/build-a-hermes-plugin), [Memory Provider Plugin](./memory-provider-plugin.md)
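As a rough illustration of the registration pattern (every name here is hypothetical; the real PluginManager context API may differ):

```python
# Minimal stand-in for the plugin context a discovered module receives.
class PluginContext:
    def __init__(self):
        self.tools, self.hooks = {}, {}

    def register_tool(self, name, fn):
        self.tools[name] = fn

    def register_hook(self, event, fn):
        self.hooks.setdefault(event, []).append(fn)

# What a plugin module's entry point might look like:
def register(context):
    # Contribute a tool and a lifecycle hook through the context API.
    context.register_tool("shout", lambda text: text.upper())
    context.register_hook("pre_tool_call", lambda call: None)

ctx = PluginContext()
register(ctx)  # the PluginManager would call this after discovery
```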
### Cron
First-class agent tasks (not shell tasks). Jobs are stored in JSON, support multiple schedule formats, can attach skills and scripts, and deliver to any platform.
→ [Cron Internals](./cron-internals.md)
### ACP Integration
Exposes Hermes as an editor-native agent over stdio/JSON-RPC for VS Code, Zed, and JetBrains.
→ [ACP Internals](./acp-internals.md)
### RL / Environments / Trajectories
Full environment framework for evaluation and RL training. Integrates with Atropos, supports multiple tool-call parsers, and generates ShareGPT-format trajectories.
→ [Environments, Benchmarks & Data Generation](./environments.md), [Trajectories & Training Format](./trajectory-format.md)
## Design Principles
| Principle | What it means in practice |
|-----------|--------------------------|
| **Prompt stability** | System prompt doesn't change mid-conversation. No cache-breaking mutations except explicit user actions (`/model`). |
| **Observable execution** | Every tool call is visible to the user via callbacks. Progress updates in CLI (spinner) and gateway (chat messages). |
| **Interruptible** | API calls and tool execution can be cancelled mid-flight by user input or signals. |
| **Platform-agnostic core** | One AIAgent class serves CLI, gateway, ACP, batch, and API server. Platform differences live in the entry point, not the agent. |
| **Loose coupling** | Optional subsystems (MCP, plugins, memory providers, RL environments) use registry patterns and check_fn gating, not hard dependencies. |
| **Profile isolation** | Each profile (`hermes -p <name>`) gets its own HERMES_HOME, config, memory, sessions, and gateway PID. Multiple profiles run concurrently. |
## File Dependency Chain
```text
tools/registry.py (no deps — imported by all tool files)
tools/*.py (each calls registry.register() at import time)
model_tools.py (imports tools/registry + triggers tool discovery)
run_agent.py, cli.py, batch_runner.py, environments/
```
This chain means tool registration happens at import time, before any agent instance is created. Adding a new tool requires an import in `model_tools.py`'s `_discover_tools()` list.
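A minimal sketch of this import-time self-registration pattern (names and the example tool are illustrative, not the actual `registry.py` API):

```python
# registry.py equivalent: a central dict populated as tool modules import.
TOOLS = {}

def register(name, schema=None):
    """Decorator a tool file applies at module import time."""
    def wrap(fn):
        TOOLS[name] = {"handler": fn, "schema": schema or {}}
        return fn
    return wrap

# A tool module: merely importing it registers the tool.
@register("echo", schema={"text": "string"})
def echo(text):
    return text

# Dispatch, conceptually what handle_function_call() does.
def dispatch(name, **kwargs):
    return TOOLS[name]["handler"](**kwargs)
```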


@@ -4,7 +4,7 @@ Hermes Agent uses a dual compression system and Anthropic prompt caching to
manage context window usage efficiently across long conversations.
Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
`gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)
## Dual Compression System
@ -26,7 +26,7 @@ Hermes has two separate compression layers that operate independently:
### 1. Gateway Session Hygiene (85% threshold)
Located in `gateway/run.py` (search for `_maybe_compress_session`). This is a **safety net** that
runs before the agent processes a message. It prevents API failures when sessions
grow too large between turns (e.g., overnight accumulation in Telegram/Discord).


@ -6,85 +6,195 @@ description: "How Hermes stores, schedules, edits, pauses, skill-loads, and deli
# Cron Internals
The cron subsystem provides scheduled task execution — from simple one-shot delays to recurring cron-expression jobs with skill injection and cross-platform delivery.
## Key Files
| File | Purpose |
|------|---------|
| `cron/jobs.py` | Job model, storage, atomic read/write to `jobs.json` |
| `cron/scheduler.py` | Scheduler loop — due-job detection, execution, repeat tracking |
| `tools/cronjob_tools.py` | Model-facing `cronjob` tool registration and handler |
| `gateway/run.py` | Gateway integration — cron ticking in the long-running loop |
| `hermes_cli/cron.py` | CLI `hermes cron` subcommands |
## Scheduling Model
Four schedule formats are supported:
| Format | Example | Behavior |
|--------|---------|----------|
| **Relative delay** | `30m`, `2h`, `1d` | One-shot, fires after the specified duration |
| **Interval** | `every 2h`, `every 30m` | Recurring, fires at regular intervals |
| **Cron expression** | `0 9 * * *` | Standard 5-field cron syntax (minute, hour, day, month, weekday) |
| **ISO timestamp** | `2025-01-15T09:00:00` | One-shot, fires at the exact time |
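A rough classifier for these four formats might look like the following (illustrative only, not the parser Hermes actually ships):

```python
import re
from datetime import datetime

def classify_schedule(spec: str):
    """Classify a schedule string into one of the four formats above.
    (Hypothetical sketch; not the real Hermes parser.)"""
    if m := re.fullmatch(r"every\s+(\d+)([smhd])", spec):
        return ("interval", int(m.group(1)), m.group(2))
    if m := re.fullmatch(r"(\d+)([smhd])", spec):
        return ("delay", int(m.group(1)), m.group(2))
    if re.fullmatch(r"\S+\s+\S+\s+\S+\s+\S+\s+\S+", spec):
        return ("cron", spec)  # five whitespace-separated fields
    return ("timestamp", datetime.fromisoformat(spec))

print(classify_schedule("30m"))        # ('delay', 30, 'm')
print(classify_schedule("every 2h"))   # ('interval', 2, 'h')
print(classify_schedule("0 9 * * *"))  # ('cron', '0 9 * * *')
```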
The model-facing surface is a single `cronjob` tool with action-style operations: `create`, `list`, `update`, `pause`, `resume`, `run`, `remove`.
## Job Storage
Jobs are stored in `~/.hermes/cron/jobs.json` with atomic write semantics (write to temp file, then rename). Each job record contains:
```json
{
"id": "job_abc123",
"name": "Daily briefing",
"prompt": "Summarize today's AI news and funding rounds",
"schedule": "0 9 * * *",
"skills": ["ai-funding-daily-report"],
"deliver": "telegram:-1001234567890",
"repeat": null,
"state": "scheduled",
"next_run": "2025-01-16T09:00:00Z",
"run_count": 42,
"created_at": "2025-01-01T00:00:00Z",
"model": null,
"provider": null,
"script": null
}
```
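The write-to-temp-then-rename pattern can be sketched as follows (a pattern sketch, not Hermes' actual code):

```python
import json
import os
import tempfile

def atomic_write_jobs(path: str, jobs: list) -> None:
    """Write jobs to a temp file in the same directory, then rename it
    over the target, so readers never observe a half-written jobs.json.
    (Sketch of the pattern described above, not the real implementation.)"""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```

The same-directory temp file matters: `os.replace` is only atomic within a single filesystem.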
### Job Lifecycle States
| State | Meaning |
|-------|---------|
| `scheduled` | Active, will fire at next scheduled time |
| `paused` | Suspended — won't fire until resumed |
| `completed` | Repeat count exhausted or one-shot that has fired |
| `running` | Currently executing (transient state) |
### Backward Compatibility
Older jobs may have a single `skill` field instead of the `skills` array. The scheduler normalizes this at load time — single `skill` is promoted to `skills: [skill]`.
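That normalization is tiny; a sketch of the behavior described above (not the real loader code):

```python
def normalize_job(job: dict) -> dict:
    """Promote the legacy single `skill` field to the newer `skills` list.
    (Illustrative sketch mirroring the behavior described above.)"""
    if "skills" not in job:
        legacy = job.pop("skill", None)
        job["skills"] = [legacy] if legacy else []
    return job

print(normalize_job({"id": "job_1", "skill": "daily-report"}))
# {'id': 'job_1', 'skills': ['daily-report']}
```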
## Scheduler Runtime
### Tick Cycle
The scheduler runs on a periodic tick (default: every 60 seconds):
```text
tick()
1. Acquire scheduler lock (prevents overlapping ticks)
2. Load all jobs from jobs.json
3. Filter to due jobs (next_run <= now AND state == "scheduled")
4. For each due job:
a. Set state to "running"
b. Create fresh AIAgent session (no conversation history)
c. Load attached skills in order (injected as user messages)
d. Run the job prompt through the agent
e. Deliver the response to the configured target
f. Update run_count, compute next_run
g. If repeat count exhausted → state = "completed"
h. Otherwise → state = "scheduled"
5. Write updated jobs back to jobs.json
6. Release scheduler lock
```
### Gateway Integration
In gateway mode, the scheduler tick is integrated into the gateway's main event loop. The gateway calls `scheduler.tick()` on its periodic maintenance cycle, which runs alongside message handling.
In CLI mode, cron jobs only fire when `hermes cron` commands are run or during active CLI sessions.
### Fresh Session Isolation
Each cron job runs in a completely fresh agent session:
- No conversation history from previous runs
- No memory of previous cron executions (unless persisted to memory/files)
- The prompt must be self-contained — cron jobs cannot ask clarifying questions
- The `cronjob` toolset is disabled (recursion guard)
## Skill-Backed Jobs
A cron job can attach one or more skills via the `skills` field. At execution time:
1. Skills are loaded in the specified order
2. Each skill's SKILL.md content is injected as context
3. The job's prompt is appended as the task instruction
4. The agent processes the combined skill context + prompt
This enables reusable, tested workflows without pasting full instructions into cron prompts. For example:
```
Create a daily funding report → attach "ai-funding-daily-report" skill
```
### Script-Backed Jobs
Jobs can also attach a Python script via the `script` field. The script runs *before* each agent turn, and its stdout is injected into the prompt as context. This enables data collection and change detection patterns:
```python
# ~/.hermes/scripts/check_competitors.py
import requests, json
# Fetch competitor release notes, diff against last run
# Print summary to stdout — agent analyzes and reports
```
## Delivery Model
Cron job results can be delivered to any supported platform:
| Target | Syntax | Example |
|--------|--------|---------|
| Origin chat | `origin` | Deliver to the chat where the job was created |
| Local file | `local` | Save to `~/.hermes/cron/output/` |
| Telegram | `telegram` or `telegram:<chat_id>` | `telegram:-1001234567890` |
| Discord | `discord` or `discord:#channel` | `discord:#engineering` |
| Slack | `slack` | Deliver to Slack home channel |
| WhatsApp | `whatsapp` | Deliver to WhatsApp home |
| Signal | `signal` | Deliver to Signal |
| Matrix | `matrix` | Deliver to Matrix home room |
| Mattermost | `mattermost` | Deliver to Mattermost home |
| Email | `email` | Deliver via email |
| SMS | `sms` | Deliver via SMS |
| Home Assistant | `homeassistant` | Deliver to HA conversation |
| DingTalk | `dingtalk` | Deliver to DingTalk |
| Feishu | `feishu` | Deliver to Feishu |
| WeCom | `wecom` | Deliver to WeCom |
For Telegram topics, use the format `telegram:<chat_id>:<thread_id>` (e.g., `telegram:-1001234567890:17585`).
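Parsing a target string like these reduces to splitting on `:`. A sketch with a hypothetical helper (not the actual delivery code):

```python
def parse_target(target: str):
    """Split a delivery target like `telegram:-1001234567890:17585` into
    (platform, chat_id, thread_id). Hypothetical helper for illustration."""
    platform, _, rest = target.partition(":")
    if not rest:
        return (platform, None, None)  # bare platform, e.g. "slack"
    chat_id, _, thread_id = rest.partition(":")
    return (platform, chat_id, thread_id or None)

print(parse_target("telegram:-1001234567890:17585"))
# ('telegram', '-1001234567890', '17585')
print(parse_target("discord:#engineering"))
# ('discord', '#engineering', None)
print(parse_target("slack"))
# ('slack', None, None)
```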
### Response Wrapping
By default (`cron.wrap_response: true`), cron deliveries are wrapped with:
- A header identifying the cron job name and task
- A footer noting the agent cannot see the delivered message in conversation
The `[SILENT]` prefix in a cron response suppresses delivery entirely — useful for jobs that only need to write to files or perform side effects.
### Session Isolation
Cron deliveries are NOT mirrored into gateway session conversation history. They exist only in the cron job's own session. This prevents message alternation violations in the target chat's conversation.
## Recursion Guard
Cron-run sessions have the `cronjob` toolset disabled. This prevents:
- A scheduled job from creating new cron jobs
- Recursive scheduling that could explode token usage
- Accidental mutation of the job schedule from within a job
## Locking
The scheduler uses file-based locking to prevent overlapping ticks from executing the same due-job batch twice. This is important in gateway mode where multiple maintenance cycles could overlap if a previous tick takes longer than the tick interval.
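One portable way to implement such a lock is an exclusive lock file. This is a sketch of the pattern; the scheduler's actual locking mechanism may differ:

```python
import os
import tempfile

LOCK_PATH = os.path.join(tempfile.gettempdir(), "hermes-cron-tick.lock")

class TickLock:
    """Advisory lock file so overlapping ticks cannot run concurrently.
    (Pattern sketch only; not the real scheduler code.)"""

    def __init__(self, path: str = LOCK_PATH):
        self.path = path
        self.fd = None

    def __enter__(self):
        # O_EXCL makes creation fail with FileExistsError if the lock is held.
        self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        return self

    def __exit__(self, *exc):
        os.close(self.fd)
        os.unlink(self.path)

with TickLock():
    pass  # load jobs, execute due ones, write jobs.json back
```

One caveat of this approach: a crashed process leaves a stale lock file, so real implementations typically add a staleness check (e.g., writing the holder's PID into the file).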
## CLI Interface
The `hermes cron` CLI provides direct job management:
```bash
hermes cron list # Show all jobs
hermes cron add # Interactive job creation
hermes cron edit <job_id> # Edit job configuration
hermes cron pause <job_id> # Pause a running job
hermes cron resume <job_id> # Resume a paused job
hermes cron run <job_id> # Trigger immediate execution
hermes cron remove <job_id> # Delete a job
```
## Related Docs
- [Cron Feature Guide](/docs/user-guide/features/cron)
- [Gateway Internals](./gateway-internals.md)
- [Agent Loop Internals](./agent-loop.md)


@ -6,106 +6,248 @@ description: "How the messaging gateway boots, authorizes users, routes sessions
# Gateway Internals
The messaging gateway is the long-running process that connects Hermes to 14+ external messaging platforms through a unified architecture.
## Key Files
| File | Purpose |
|------|---------|
| `gateway/run.py` | `GatewayRunner` — main loop, slash commands, message dispatch (~7,200 lines) |
| `gateway/session.py` | `SessionStore` — conversation persistence and session key construction |
| `gateway/delivery.py` | Outbound message delivery to target platforms/channels |
| `gateway/pairing.py` | DM pairing flow for user authorization |
| `gateway/channel_directory.py` | Maps chat IDs to human-readable names for cron delivery |
| `gateway/hooks.py` | Hook discovery, loading, and lifecycle event dispatch |
| `gateway/mirror.py` | Cross-session message mirroring for `send_message` |
| `gateway/status.py` | Token lock management for profile-scoped gateway instances |
| `gateway/builtin_hooks/` | Always-registered hooks (e.g., BOOT.md system prompt hook) |
| `gateway/platforms/` | Platform adapters (one per messaging platform) |
## Architecture Overview
```text
┌─────────────────────────────────────────────────┐
│ GatewayRunner │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Telegram │ │ Discord │ │ Slack │ ... │
│ │ Adapter │ │ Adapter │ │ Adapter │ │
│ └─────┬─────┘ └─────┬────┘ └─────┬────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ ▼ │
│ _handle_message() │
│ │ │
│ ┌────────────┼────────────┐ │
│ ▼ ▼ ▼ │
│ Slash command AIAgent Queue/BG │
│ dispatch creation sessions │
│ │ │
│ ▼ │
│ SessionStore │
│ (SQLite persistence) │
└─────────────────────────────────────────────────┘
```
## Message Flow
When a message arrives from any platform:
1. **Platform adapter** receives raw event, normalizes it into a `MessageEvent`
2. **Base adapter** checks active session guard:
- If agent is running for this session → queue message, set interrupt event
- If `/approve`, `/deny`, `/stop` → bypass guard (dispatched inline)
3. **GatewayRunner._handle_message()** receives the event:
- Resolve session key via `_session_key_for_source()` (format: `agent:main:{platform}:{chat_type}:{chat_id}`)
- Check authorization (see Authorization below)
- Check if it's a slash command → dispatch to command handler
- Check if agent is already running → intercept commands like `/stop`, `/status`
- Otherwise → create `AIAgent` instance and run conversation
4. **Response** is sent back through the platform adapter
### Session Key Format
Session keys encode the full routing context:
```
agent:main:{platform}:{chat_type}:{chat_id}
```
For example: `agent:main:telegram:private:123456789`
Thread-aware platforms (Telegram forum topics, Discord threads, Slack threads) may include thread IDs in the chat_id portion. **Never construct session keys manually** — always use `build_session_key()` from `gateway/session.py`.
### Two-Level Message Guard
When an agent is actively running, incoming messages pass through two sequential guards:
1. **Level 1 — Base adapter** (`gateway/platforms/base.py`): Checks `_active_sessions`. If the session is active, queues the message in `_pending_messages` and sets an interrupt event. This catches messages *before* they reach the gateway runner.
2. **Level 2 — Gateway runner** (`gateway/run.py`): Checks `_running_agents`. Intercepts specific commands (`/stop`, `/new`, `/queue`, `/status`, `/approve`, `/deny`) and routes them appropriately. Everything else triggers `running_agent.interrupt()`.
Commands that must reach the runner while the agent is blocked (like `/approve`) are dispatched **inline** via `await self._message_handler(event)` — they bypass the background task system to avoid race conditions.
## Authorization
The gateway uses a multi-layer authorization check, evaluated in order:
1. **Gateway-wide allow-all** (`GATEWAY_ALLOW_ALL_USERS`) — if set, all users are authorized
2. **Platform allowlist** (e.g., `TELEGRAM_ALLOWED_USERS`) — comma-separated user IDs
3. **DM pairing** — authenticated users can pair new users via a pairing code
4. **Admin escalation** — some commands require admin status beyond basic authorization
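As a sketch, the ordered evaluation looks like this (hypothetical function; the real checks live in `gateway/run.py` and read platform config):

```python
def is_authorized(user_id: str, *, allow_all: bool,
                  allowlist: set[str], paired: set[str]) -> bool:
    """Ordered authorization checks, as a sketch of the layers above.
    (Illustrative; not the gateway's actual code.)"""
    if allow_all:              # 1) gateway-wide allow-all
        return True
    if user_id in allowlist:   # 2) platform allowlist
        return True
    if user_id in paired:      # 3) DM pairing
        return True
    return False               # admin escalation is checked per-command

print(is_authorized("42", allow_all=False, allowlist={"42"}, paired=set()))
# True
```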
### DM Pairing Flow
```text
Admin: /pair
Gateway: "Pairing code: ABC123. Share with the user."
New user: ABC123
Gateway: "Paired! You're now authorized."
```
Pairing state is persisted in `gateway/pairing.py` and survives restarts.
## Slash Command Dispatch
All slash commands in the gateway flow through the same resolution pipeline:
1. `resolve_command()` from `hermes_cli/commands.py` maps input to canonical name (handles aliases, prefix matching)
2. The canonical name is checked against `GATEWAY_KNOWN_COMMANDS`
3. Handler in `_handle_message()` dispatches based on canonical name
4. Some commands are gated on config (`gateway_config_gate` on `CommandDef`)
### Running-Agent Guard
Commands that must NOT execute while the agent is processing are rejected early:
```python
if _quick_key in self._running_agents:
    if canonical == "model":
        return "⏳ Agent is running — wait for it to finish or /stop first."
```
Bypass commands (`/stop`, `/new`, `/approve`, `/deny`, `/queue`, `/status`) have special handling.
## Config Sources
The gateway reads configuration from multiple sources:
| Source | What it provides |
|--------|-----------------|
| `~/.hermes/.env` | API keys, bot tokens, platform credentials |
| `~/.hermes/config.yaml` | Model settings, tool configuration, display options |
| Environment variables | Override any of the above |
Unlike the CLI (which uses `load_cli_config()` with hardcoded defaults), the gateway reads `config.yaml` directly via YAML loader. This means config keys that exist in the CLI's defaults dict but not in the user's config file may behave differently between CLI and gateway.
## Platform Adapters
Each messaging platform has an adapter in `gateway/platforms/`:
```text
gateway/platforms/
├── base.py # BaseAdapter — shared logic for all platforms
├── telegram.py # Telegram Bot API (long polling or webhook)
├── discord.py # Discord bot via discord.py
├── slack.py # Slack Socket Mode
├── whatsapp.py # WhatsApp Business Cloud API
├── signal.py # Signal via signal-cli REST API
├── matrix.py # Matrix via matrix-nio (optional E2EE)
├── mattermost.py # Mattermost WebSocket API
├── email_adapter.py # Email via IMAP/SMTP
├── sms.py # SMS via Twilio
├── dingtalk.py # DingTalk WebSocket
├── feishu.py # Feishu/Lark WebSocket or webhook
├── wecom.py # WeCom (WeChat Work) callback
└── homeassistant.py # Home Assistant conversation integration
```
Adapters implement a common interface:
- `connect()` / `disconnect()` — lifecycle management
- `send_message()` — outbound message delivery
- `on_message()` — inbound message normalization → `MessageEvent`
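In sketch form, with hypothetical signatures (the real interface in `gateway/platforms/base.py` is richer):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class MessageEvent:
    """Normalized inbound message (fields assumed for illustration)."""
    platform: str
    chat_id: str
    text: str

class BaseAdapter(ABC):
    """Shape of the adapter interface listed above (hypothetical)."""

    @abstractmethod
    async def connect(self) -> None: ...

    @abstractmethod
    async def disconnect(self) -> None: ...

    @abstractmethod
    async def send_message(self, chat_id: str, text: str) -> None: ...

    def normalize(self, platform: str, raw: dict) -> MessageEvent:
        # Subclasses map platform-specific payloads into the shared event type.
        return MessageEvent(platform, str(raw["chat_id"]), raw["text"])
```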
### Token Locks
Adapters that connect with unique credentials call `acquire_scoped_lock()` in `connect()` and `release_scoped_lock()` in `disconnect()`. This prevents two profiles from using the same bot token simultaneously.
## Delivery Path
Outgoing deliveries (`gateway/delivery.py`) handle:
- **Direct reply** — send response back to the originating chat
- **Home channel delivery** — route cron job outputs and background results to a configured home channel
- **Explicit target delivery** — `send_message` tool specifying `telegram:-1001234567890`
- **Cross-platform delivery** — deliver to a different platform than the originating message
Cron job deliveries are NOT mirrored into gateway session history — they live in their own cron session only. This is a deliberate design choice to avoid message alternation violations.
## Hooks
Gateway hooks are Python modules that respond to lifecycle events:
### Gateway Hook Events
| Event | When fired |
|-------|-----------|
| `gateway:startup` | Gateway process starts |
| `session:start` | New conversation session begins |
| `session:end` | Session completes or times out |
| `session:reset` | User resets session with `/new` |
| `agent:start` | Agent begins processing a message |
| `agent:step` | Agent completes one tool-calling iteration |
| `agent:end` | Agent finishes and returns response |
| `command:*` | Any slash command is executed |
Hooks are discovered from `gateway/builtin_hooks/` (always active) and `~/.hermes/hooks/` (user-installed). Each hook is a directory with a `HOOK.yaml` manifest and `handler.py`.
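A minimal `handler.py` might look like the following. The structure is inferred from the event table above; the actual hook entry-point API may differ:

```python
# Hypothetical ~/.hermes/hooks/audit/handler.py
# Structure inferred from the docs; the real hook API may differ.
from datetime import datetime, timezone

def on_event(event: str, payload: dict) -> None:
    """Called by the gateway for each lifecycle event."""
    if event.startswith("session:"):
        stamp = datetime.now(timezone.utc).isoformat()
        # A real hook might append this to an audit log file instead.
        print(f"{stamp} {event} session={payload.get('session_id')}")

on_event("session:start", {"session_id": "agent:main:telegram:private:1"})
```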
## Memory Provider Integration
When a memory provider plugin (e.g., Honcho) is enabled:
1. Gateway creates an `AIAgent` per message with the session ID
2. The `MemoryManager` initializes the provider with the session context
3. Provider tools (e.g., `honcho_profile`, `viking_search`) are routed through:
```text
AIAgent._invoke_tool()
→ self._memory_manager.handle_tool_call(name, args)
→ provider.handle_tool_call(name, args)
```
4. On session end/reset, `on_session_end()` fires for cleanup and final data flush
### Memory Flush Lifecycle
When a session is reset, resumed, or expires:
1. Built-in memories are flushed to disk
2. Memory provider's `on_session_end()` hook fires
3. A temporary `AIAgent` runs a memory-only conversation turn
4. Context is then discarded or archived
## Background Maintenance
The gateway runs periodic maintenance alongside message handling:
- **Cron ticking** — checks job schedules and fires due jobs
- **Session expiry** — cleans up abandoned sessions after timeout
- **Memory flush** — proactively flushes memory before session expiry
- **Cache refresh** — refreshes model lists and provider status
## Process Management
The gateway runs as a long-lived process, managed via:
- `hermes gateway start` / `hermes gateway stop` — manual control
- `systemctl` (Linux) or `launchctl` (macOS) — service management
- PID file at `~/.hermes/gateway.pid` — profile-scoped process tracking
**Profile-scoped vs global**: `start_gateway()` uses profile-scoped PID files. `hermes gateway stop` stops only the current profile's gateway. `hermes gateway stop --all` uses global `ps aux` scanning to kill all gateway processes (used during updates).
## Related Docs
- [Session Storage](./session-storage.md)
- [Cron Internals](./cron-internals.md)
- [ACP Internals](./acp-internals.md)
- [Agent Loop Internals](./agent-loop.md)
- [Messaging Gateway (User Guide)](/docs/user-guide/messaging)


@ -3,7 +3,7 @@
Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format
for use as training data, debugging artifacts, and reinforcement learning datasets.
Source files: `agent/trajectory.py`, `run_agent.py` (search for `_save_trajectory`), `batch_runner.py`
## File Naming Convention