Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/,
guides/, and integrations/ against the live registries and gateway code.
messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)');
Google Chat slug is hermes-google_chat (underscore — plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such
extra); list the actual deps (google-cloud-pubsub, google-api-python-client,
google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is
silently ignored by the adapter); QQ_STT_BASE_URL is not read directly —
baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline
plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default
SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys — write them to
per-profile .env, not 'hermes config set' (same pattern fixed in
api-server.md last round). Also bumped example ports to 8650+ to dodge the
default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646)
collision.
developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for
run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py,
gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway
platform tree updated (qqbot is a sub-package, not qqbot.py; added
yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/
(always active)' was wrong — it's an empty extension point and
_register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-
callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry,
NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama
Cloud, LM Studio, Tencent TokenHub. Fallback section described only the
legacy single-pair model — corrected to the canonical list-form
fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31
alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py
which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule — note that
'git submodule update --init' is required if cloning without
--recurse-submodules.
guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong — schedule is
positional (not --schedule), the script-only flag is --no-agent (not
--script-only), and there's no --command flag. Replaced with a real example
that creates the script under ~/.hermes/scripts/ and uses the actual flags.
Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work —
the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST
rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently
fails because --region isn't registered on the auth-add argparse spec.
Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for
China-region access.
- cron-script-only.md: 'hermes send' is fictional — replaced the comparison-
table mention with a webhook-subscription pointer; also fixed the dead link
to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed
at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right
knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response
and messages — task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in
one quoted string, so cmd.exe gets a single arg instead of the multi-token
command line it needs. Removed the surrounding quotes — argparse nargs='*'
collects each token correctly.
integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist);
actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG
and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 ->
api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai)
refreshed. Fallback section rewritten to lead with the canonical
fallback_providers list form (was leading with the legacy fallback_model
single dict); supported-providers list extended to include azure-foundry,
alibaba-coding-plan, lmstudio.
index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with
integrations/index.md ('19+') and undercounted — bumped to 20+ and added
Weixin/QQ Bot/Yuanbao/Google Chat to the list.
Validation: 'npm run build' clean (exit 0); broken-link count unchanged at
155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.
| sidebar_position | title | description |
|---|---|---|
| 3 | Agent Loop Internals | Detailed walkthrough of AIAgent execution, API modes, tools, callbacks, and fallback behavior |
Agent Loop Internals
The core orchestration engine is run_agent.py's AIAgent class — a large file (15k+ lines) that handles everything from prompt assembly to tool dispatch to provider failover.
Core Responsibilities
AIAgent is responsible for:
- Assembling the effective system prompt and tool schemas via prompt_builder.py
- Selecting the correct provider/API mode (chat_completions, codex_responses, anthropic_messages)
- Making interruptible model calls with cancellation support
- Executing tool calls (sequentially or concurrently via thread pool)
- Maintaining conversation history in OpenAI message format
- Handling compression, retries, and fallback model switching
- Tracking iteration budgets across parent and child agents
- Flushing persistent memory before context is lost
Two Entry Points
# Simple interface: returns the final response string
response = agent.chat("Fix the bug in main.py")
# Full interface: returns a dict that includes final_response and messages
result = agent.run_conversation(
user_message="Fix the bug in main.py",
system_message=None, # auto-built if omitted
conversation_history=None, # auto-loaded from session if omitted
task_id="task_abc123"
)
chat() is a thin wrapper around run_conversation() that extracts the final_response field from the result dict.
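The wrapper relationship can be illustrated with a minimal sketch. SketchAgent is a hypothetical stand-in, not the real AIAgent: the model call is replaced by an echo, and only the final_response and messages keys of the result dict are modeled.

```python
from typing import Any, Dict, List, Optional


class SketchAgent:
    """Hypothetical stand-in for AIAgent; only the wrapper relationship is modeled."""

    def run_conversation(
        self,
        user_message: str,
        system_message: Optional[str] = None,
        conversation_history: Optional[List[Dict[str, Any]]] = None,
        task_id: Optional[str] = None,
    ) -> Dict[str, Any]:
        messages = list(conversation_history or [])
        messages.append({"role": "user", "content": user_message})
        # A real agent would call the model here; this stub just echoes the input.
        reply = f"echo: {user_message}"
        messages.append({"role": "assistant", "content": reply})
        return {"final_response": reply, "messages": messages}

    def chat(self, user_message: str) -> str:
        # Thin wrapper: delegate, then extract the final_response field.
        return self.run_conversation(user_message)["final_response"]
```

Callers that only need the reply text can use chat(); callers that need the updated message list should call run_conversation() directly.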
API Modes
Hermes supports three API execution modes, resolved from provider selection, explicit args, and base URL heuristics:
| API mode | Used for | Client type |
|---|---|---|
| chat_completions | OpenAI-compatible endpoints (OpenRouter, custom, most providers) | openai.OpenAI |
| codex_responses | OpenAI Codex / Responses API | openai.OpenAI with Responses format |
| anthropic_messages | Native Anthropic Messages API | anthropic.Anthropic via adapter |
The mode determines how messages are formatted, how tool calls are structured, how responses are parsed, and how caching/streaming works. All three converge on the same internal message format (OpenAI-style role/content/tool_calls dicts) before and after API calls.
Mode resolution order:
- Explicit api_mode constructor arg (highest priority)
- Provider-specific detection (e.g., anthropic provider → anthropic_messages)
- Base URL heuristics (e.g., api.anthropic.com → anthropic_messages)
- Default: chat_completions
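The resolution order above can be sketched as a small function. The function name resolve_api_mode and the two lookup tables are assumptions made for illustration; only the anthropic entries are taken from the examples in the list, and the real detection logic may cover more providers and hosts.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical lookup tables; only the anthropic entries come from the docs above.
PROVIDER_MODES = {"anthropic": "anthropic_messages"}
HOST_MODES = {"api.anthropic.com": "anthropic_messages"}


def resolve_api_mode(
    api_mode: Optional[str] = None,
    provider: Optional[str] = None,
    base_url: Optional[str] = None,
) -> str:
    # 1. An explicit constructor argument always wins.
    if api_mode:
        return api_mode
    # 2. Provider-specific detection.
    if provider in PROVIDER_MODES:
        return PROVIDER_MODES[provider]
    # 3. Base URL heuristics, matched on the hostname only.
    if base_url:
        host = urlparse(base_url).hostname or ""
        if host in HOST_MODES:
            return HOST_MODES[host]
    # 4. Default.
    return "chat_completions"
```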
Turn Lifecycle
Each iteration of the agent loop follows this sequence:
run_conversation()
1. Generate task_id if not provided
2. Append user message to conversation history
3. Build or reuse cached system prompt (prompt_builder.py)
4. Check if preflight compression is needed (>50% context)
5. Build API messages from conversation history
- chat_completions: OpenAI format as-is
- codex_responses: convert to Responses API input items
- anthropic_messages: convert via anthropic_adapter.py
6. Inject ephemeral prompt layers (budget warnings, context pressure)
7. Apply prompt caching markers if on Anthropic
8. Make interruptible API call (_interruptible_api_call)
9. Parse response:
- If tool_calls: execute them, append results, loop back to step 5
- If text response: persist session, flush memory if needed, return
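The core of this lifecycle (steps 5, 8, and 9) can be sketched as a loop over a pluggable model function. This is a simplified illustration, not the code in run_agent.py: run_turns, call_model, and the scripted model are hypothetical names, and compression, caching, and persistence are left out.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_turns(
    history: List[Message],
    call_model: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[..., str]],
    max_iterations: int = 10,
) -> str:
    for _ in range(max_iterations):
        assistant = call_model(history)  # stands in for the provider API call
        history.append(assistant)
        tool_calls = assistant.get("tool_calls") or []
        if not tool_calls:
            # Plain text response: the turn is complete.
            return assistant.get("content") or ""
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn["arguments"] or "{}")
            result = tools[fn["name"]](**args)
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
        # Tool results appended: loop back and call the model again.
    return "(iteration budget exhausted)"


def scripted_model(history: List[Message]) -> Message:
    """Fake model: requests one tool call, then answers from its result."""
    if history[-1]["role"] == "user":
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'},
                }
            ],
        }
    return {"role": "assistant", "content": f"The sum is {history[-1]['content']}"}


history: List[Message] = [{"role": "user", "content": "What is 2 + 3?"}]
answer = run_turns(history, scripted_model, {"add": lambda a, b: str(a + b)})
```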
Message Format
All messages use OpenAI-compatible format internally:
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [...]}
{"role": "tool", "tool_call_id": "...", "content": "..."}
Reasoning content (from models that support extended thinking) is stored in assistant_msg["reasoning"] and optionally displayed via the reasoning_callback.
Message Alternation Rules
The agent loop enforces strict message role alternation:
- After the system message: User → Assistant → User → Assistant → ...
- During tool calling: Assistant (with tool_calls) → Tool → Tool → ... → Assistant
- Never two assistant messages in a row
- Never two user messages in a row
- Only tool role can have consecutive entries (parallel tool results)
Providers validate these sequences and will reject malformed histories.
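A history can be checked against these rules before it is sent. The validator below is a hypothetical helper, not part of Hermes; it covers only the rules listed above and skips system messages.

```python
from typing import Any, Dict, List, Optional


def validate_alternation(messages: List[Dict[str, Any]]) -> List[str]:
    """Return a list of rule violations (empty when the history is well-formed)."""
    problems: List[str] = []
    prev: Optional[Dict[str, Any]] = None  # previous non-system message
    for i, msg in enumerate(messages):
        role = msg["role"]
        if role == "system":
            continue
        prev_role = prev["role"] if prev else None
        if role in ("user", "assistant") and role == prev_role:
            problems.append(f"message {i}: two {role} messages in a row")
        if role == "tool":
            # A tool result must follow a tool-calling assistant or another tool result.
            follows_call = prev_role == "assistant" and bool(prev.get("tool_calls"))
            if not (follows_call or prev_role == "tool"):
                problems.append(f"message {i}: tool result without a preceding tool call")
        prev = msg
    return problems
```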
Interruptible API Calls
API requests are wrapped in _interruptible_api_call() which runs the actual HTTP call in a background thread while monitoring an interrupt event:
┌────────────────────────────────────────────────────┐
│ Main thread API thread │
│ │
│ wait on: HTTP POST │
│ - response ready ───▶ to provider │
│ - interrupt event │
│ - timeout │
└────────────────────────────────────────────────────┘
When interrupted (user sends new message, /stop command, or signal):
- The API thread is abandoned (response discarded)
- The agent can process the new input or shut down cleanly
- No partial response is injected into conversation history
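The pattern can be sketched with the standard threading module. This is an illustration of the general technique, not the body of _interruptible_api_call(); the function name, the Interrupted exception, and the polling interval are assumptions.

```python
import threading
import time
from typing import Any, Callable, Dict


class Interrupted(Exception):
    """Raised in the main thread when the interrupt event fires."""


def interruptible_call(
    fn: Callable[[], Any],
    interrupt: threading.Event,
    timeout: float = 30.0,
    poll: float = 0.05,
) -> Any:
    box: Dict[str, Any] = {}
    done = threading.Event()

    def worker() -> None:
        try:
            box["value"] = fn()  # the blocking HTTP call would go here
        except Exception as exc:
            box["error"] = exc
        finally:
            done.set()

    # Daemon thread: an abandoned call never blocks interpreter shutdown.
    threading.Thread(target=worker, daemon=True).start()
    deadline = time.monotonic() + timeout
    while not done.wait(poll):
        if interrupt.is_set():
            # The worker keeps running, but its result is never read.
            raise Interrupted("call abandoned; any late response is discarded")
        if time.monotonic() >= deadline:
            raise TimeoutError("model call timed out")
    if "error" in box:
        raise box["error"]
    return box["value"]
```

Because the caller raises before touching the result, nothing from an interrupted call reaches the conversation history.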
Tool Execution
Sequential vs Concurrent
When the model returns tool calls:
- Single tool call → executed directly in the main thread
- Multiple tool calls → executed concurrently via ThreadPoolExecutor
  - Exception: tools marked as interactive (e.g., clarify) force sequential execution
  - Results are reinserted in the original tool call order regardless of completion order
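A minimal sketch of this dispatch rule follows. The INTERACTIVE_TOOLS set and the call shape (a dict with name and args) are assumptions for illustration; how Hermes actually marks a tool as interactive is not shown here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

INTERACTIVE_TOOLS = {"clarify"}  # assumption: how interactive tools are flagged


def run_tool_calls(
    calls: List[Dict[str, Any]],
    handlers: Dict[str, Callable[..., str]],
    max_workers: int = 8,
) -> List[str]:
    def run_one(call: Dict[str, Any]) -> str:
        return handlers[call["name"]](**call["args"])

    sequential = len(calls) <= 1 or any(
        c["name"] in INTERACTIVE_TOOLS for c in calls
    )
    if sequential:
        return [run_one(c) for c in calls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(run_one, calls))


def slow() -> str:
    time.sleep(0.1)
    return "slow"


def fast() -> str:
    return "fast"


results = run_tool_calls(
    [{"name": "slow", "args": {}}, {"name": "fast", "args": {}}],
    {"slow": slow, "fast": fast},
)
```

Here fast finishes first, but results still lists the slow result first because it was requested first.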
Execution Flow
for each tool_call in response.tool_calls:
1. Resolve handler from tools/registry.py
2. Fire pre_tool_call plugin hook
3. Check if dangerous command (tools/approval.py)
- If dangerous: invoke approval_callback, wait for user
4. Execute handler with args + task_id
5. Fire post_tool_call plugin hook
6. Append {"role": "tool", "content": result} to history
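The flow above can be sketched as a single function. All names and signatures here are hypothetical (the real hook and approval interfaces are not documented in this section), and the choice to still fire the post hook after a denial is an assumption.

```python
from typing import Any, Callable, Dict, List


def execute_tool_call(
    call: Dict[str, Any],
    registry: Dict[str, Callable[..., str]],
    hooks: Dict[str, List[Callable[..., None]]],
    is_dangerous: Callable[[Dict[str, Any]], bool],
    approval_callback: Callable[[Dict[str, Any]], bool],
    task_id: str,
) -> Dict[str, Any]:
    handler = registry[call["name"]]              # 1. resolve handler
    for hook in hooks.get("pre_tool_call", []):   # 2. pre hook
        hook(call)
    if is_dangerous(call) and not approval_callback(call):  # 3. approval gate
        result = "denied by user"
    else:
        result = handler(task_id=task_id, **call["args"])   # 4. execute
    for hook in hooks.get("post_tool_call", []):  # 5. post hook
        hook(call, result)
    # 6. the caller appends this message to the history
    return {"role": "tool", "tool_call_id": call["id"], "content": result}


events: List[str] = []
demo_hooks = {
    "pre_tool_call": [lambda call: events.append("pre")],
    "post_tool_call": [lambda call, result: events.append("post")],
}


def terminal(command: str, task_id: str) -> str:
    return f"ran {command}"


denied = execute_tool_call(
    {"id": "call_1", "name": "terminal", "args": {"command": "rm -rf /tmp/x"}},
    {"terminal": terminal},
    demo_hooks,
    is_dangerous=lambda call: "rm -rf" in call["args"].get("command", ""),
    approval_callback=lambda call: False,  # simulate the user declining
    task_id="task_abc123",
)
```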
Agent-Level Tools
Some tools are intercepted by run_agent.py before reaching handle_function_call():
| Tool | Why intercepted |
|---|---|
| todo | Reads/writes agent-local task state |
| memory | Writes to persistent memory files with character limits |
| session_search | Queries session history via the agent's session DB |
| delegate_task | Spawns subagent(s) with isolated context |
These tools modify agent state directly and return synthetic tool results without going through the registry.
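The interception pattern can be illustrated with a toy agent that handles todo itself and passes everything else to the registry. InterceptingAgent, invoke_tool, and the shape of the todo arguments are hypothetical; the sketch shows only the control flow, not the real tool schemas.

```python
import json
from typing import Any, Callable, Dict, List


class InterceptingAgent:
    """Hypothetical sketch; the real interception logic lives in run_agent.py."""

    def __init__(self, registry_dispatch: Callable[[str, Dict[str, Any]], str]):
        self.todos: List[str] = []
        self._registry_dispatch = registry_dispatch

    def invoke_tool(self, name: str, args: Dict[str, Any]) -> str:
        if name == "todo":
            # Agent-level tool: mutate local state and return a synthetic result.
            self.todos.extend(args.get("add", []))
            return json.dumps({"todos": self.todos})
        # Everything else goes through the normal registry dispatch.
        return self._registry_dispatch(name, args)
```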
Callback Surfaces
AIAgent supports platform-specific callbacks that enable real-time progress in the CLI, gateway, and ACP integrations:
| Callback | When fired | Used by |
|---|---|---|
| tool_progress_callback | Before/after each tool execution | CLI spinner, gateway progress messages |
| thinking_callback | When model starts/stops thinking | CLI "thinking..." indicator |
| reasoning_callback | When model returns reasoning content | CLI reasoning display, gateway reasoning blocks |
| clarify_callback | When clarify tool is called | CLI input prompt, gateway interactive message |
| step_callback | After each complete agent turn | Gateway step tracking, ACP progress |
| stream_delta_callback | Each streaming token (when enabled) | CLI streaming display |
| tool_gen_callback | When tool call is parsed from stream | CLI tool preview in spinner |
| status_callback | State changes (thinking, executing, etc.) | ACP status updates |
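As an illustration of how an optional callback surface behaves, the sketch below fires a progress callback before and after a tool runs. The (event, tool_name) argument shape is an assumption; the actual callback signatures are not specified in this section.

```python
from typing import Callable, List, Optional, Tuple

ProgressCallback = Callable[[str, str], None]  # hypothetical: (event, tool_name)


def run_tool_with_progress(
    name: str,
    handler: Callable[[], str],
    tool_progress_callback: Optional[ProgressCallback] = None,
) -> str:
    # Callbacks are optional: a platform that does not need them passes None.
    if tool_progress_callback:
        tool_progress_callback("start", name)
    result = handler()
    if tool_progress_callback:
        tool_progress_callback("end", name)
    return result


seen: List[Tuple[str, str]] = []
output = run_tool_with_progress(
    "read_file", lambda: "contents", lambda event, tool: seen.append((event, tool))
)
```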
Budget and Fallback Behavior
Iteration Budget
The agent tracks iterations via IterationBudget:
- Default: 90 iterations (configurable via agent.max_turns)
- Each agent gets its own budget. Subagents get independent budgets capped at delegation.max_iterations (default 50), so total iterations across parent + subagents can exceed the parent's cap
- At 100%, the agent stops and returns a summary of work done
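The budget arithmetic can be made concrete with a small counter. IterationBudgetSketch is a hypothetical model written for this example, not the real IterationBudget class; it only demonstrates that parent and child budgets are independent.

```python
class IterationBudgetSketch:
    """Hypothetical counter modeling the per-agent budget described above."""

    def __init__(self, max_iterations: int = 90):
        self.max_iterations = max_iterations
        self.used = 0

    def consume(self) -> bool:
        """Record one iteration; return False once the budget is exhausted."""
        if self.used >= self.max_iterations:
            return False
        self.used += 1
        return True

    @property
    def remaining(self) -> int:
        return self.max_iterations - self.used

    def child(self, cap: int = 50) -> "IterationBudgetSketch":
        # Subagents get a fresh, independent budget rather than a share of this one.
        return IterationBudgetSketch(cap)


parent = IterationBudgetSketch()
sub = parent.child()
while parent.consume():
    pass
while sub.consume():
    pass
total_used = parent.used + sub.used
```

With the defaults, a parent plus one subagent can spend 90 + 50 = 140 iterations in total, which is more than the parent's own cap.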
Fallback Model
When the primary model fails (429 rate limit, 5xx server error, 401/403 auth error):
- Check fallback_providers list in config
- Try each fallback in order
- On success, continue the conversation with the new provider
- On 401/403, attempt credential refresh before failing over
The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain configurable via the auxiliary.* config section.
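The primary-model failover steps can be sketched as a loop over a provider chain. Everything here is illustrative: ProviderError, the provider dict shape (name plus a call function), and refresh_credentials are hypothetical, and the real implementation may order or retry things differently.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Provider = Dict[str, Any]  # hypothetical shape: {"name": str, "call": callable}


class ProviderError(Exception):
    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def should_fail_over(status: int) -> bool:
    # Rate limits, server errors, and auth errors trigger failover.
    return status in (401, 403, 429) or 500 <= status < 600


def call_with_fallback(
    primary: Provider,
    fallback_providers: List[Provider],
    request: Any,
    refresh_credentials: Optional[Callable[[Provider], bool]] = None,
) -> Tuple[str, Any]:
    last_error: Optional[ProviderError] = None
    for provider in [primary, *fallback_providers]:
        try:
            return provider["name"], provider["call"](request)
        except ProviderError as exc:
            error = exc
        # On auth errors, try one credential refresh before moving on.
        if error.status in (401, 403) and refresh_credentials and refresh_credentials(provider):
            try:
                return provider["name"], provider["call"](request)
            except ProviderError as exc:
                error = exc
        if not should_fail_over(error.status):
            raise error  # e.g. a 400 is a caller bug, not a provider outage
        last_error = error
    assert last_error is not None
    raise last_error
```

The function returns the name of the provider that answered, so the caller can continue the conversation with that provider.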
Compression and Persistence
When Compression Triggers
- Preflight (before API call): If conversation exceeds 50% of model's context window
- Gateway auto-compression: If conversation exceeds 85% (more aggressive, runs between turns)
What Happens During Compression
- Memory is flushed to disk first (preventing data loss)
- Middle conversation turns are summarized into a compact summary
- The last N messages are preserved intact (compression.protect_last_n, default: 20)
- Tool call/result message pairs are kept together (never split)
- A new session lineage ID is generated (compression creates a "child" session)
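The trigger check and the "protect the tail, keep tool pairs together" rule can be sketched as follows. This is a simplified model, not the algorithm in agent/context_compressor.py: the function names are hypothetical, summarize is supplied by the caller, the role given to the summary message is a placeholder, and memory flushing and session lineage are not modeled.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def needs_compression(used_tokens: int, context_window: int, threshold: float = 0.5) -> bool:
    # 0.5 matches the preflight trigger; the gateway variant would pass 0.85.
    return used_tokens > context_window * threshold


def compress(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],
    protect_last_n: int = 20,
) -> List[Message]:
    head = [m for m in messages[:1] if m["role"] == "system"]
    body = messages[len(head):]
    if len(body) <= protect_last_n:
        return list(messages)
    cut = len(body) - protect_last_n
    # Move the cut earlier so the tail never starts with an orphaned tool result.
    while cut > 0 and body[cut]["role"] == "tool":
        cut -= 1
    middle, tail = body[:cut], body[cut:]
    if not middle:
        return list(messages)
    summary = {
        "role": "user",  # placeholder role; an assumption of this sketch
        "content": "[Summary of earlier turns] " + summarize(middle),
    }
    return head + [summary] + tail
```

When the cut would land on a tool result, it is moved back to the assistant message that issued the call, so slightly more than protect_last_n messages may be preserved.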
Session Persistence
After each turn:
- Messages are saved to the session store (SQLite via hermes_state.py)
- Memory changes are flushed to MEMORY.md/USER.md
- The session can be resumed later via /resume or hermes chat --resume
Key Source Files
| File | Purpose |
|---|---|
| run_agent.py | AIAgent class: the complete agent loop |
| agent/prompt_builder.py | System prompt assembly from memory, skills, context files, personality |
| agent/context_engine.py | ContextEngine ABC: pluggable context management |
| agent/context_compressor.py | Default engine: lossy summarization algorithm |
| agent/prompt_caching.py | Anthropic prompt caching markers and cache metrics |
| agent/auxiliary_client.py | Auxiliary LLM client for side tasks (vision, summarization) |
| model_tools.py | Tool schema collection, handle_function_call() dispatch |