# System Patterns: Hermes-Agent

## Architecture Overview
```
┌────────────────────────────────────────────────────────────┐
│                        CLI (cli.py)                        │
│ - Rich welcome banner with caduceus                        │
│ - prompt_toolkit for input with history                    │
│ - Kawaii-style feedback and personalities                  │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                   AIAgent (run_agent.py)                   │
│ - Conversation loop with tool calling                      │
│ - KawaiiSpinner for animated feedback                      │
│ - Retry logic with exponential backoff                     │
│ - Session logging to logs/ directory                       │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                Tool Routing (model_tools.py)               │
│ - get_tool_definitions() - returns tools for API calls     │
│ - handle_function_call() - dispatches to tool handlers     │
│ - Toolset filtering (enabled/disabled)                     │
└─────────────────────────────┬──────────────────────────────┘
                              │
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
     ┌───────────┐     ┌───────────┐     ┌───────────┐
     │ Web Tools │     │ Terminal  │     │  Browser  │
     │(Firecrawl)│     │(mini-swe) │     │(agent-brw)│
     └───────────┘     └───────────┘     └───────────┘
            │                 │                 │
            └─────────────────┼─────────────────┘
                              ▼
                      ┌───────────────┐
                      │   Toolsets    │
                      │ (toolsets.py) │
                      │  Composition  │
                      └───────────────┘
```
## Key Design Patterns

### 1. Toolset Composition Pattern

Toolsets can include other toolsets, allowing flexible composition:

```python
TOOLSETS = {
    "web": {"tools": ["web_search", "web_extract"], "includes": []},
    "debugging": {"tools": ["terminal"], "includes": ["web"]},
    "full_stack": {"tools": [], "includes": ["web", "terminal", "vision", "browser"]},
}
```
Resolution is recursive with cycle detection.
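A minimal sketch of what recursive resolution with cycle detection can look like. The trimmed-down table and the helper below are illustrative, not the actual `toolsets.py` code:

```python
# Trimmed-down example table; the real TOOLSETS lives in toolsets.py.
TOOLSETS = {
    "web": {"tools": ["web_search", "web_extract"], "includes": []},
    "debugging": {"tools": ["terminal"], "includes": ["web"]},
    "full_stack": {"tools": [], "includes": ["web", "debugging"]},
}

def resolve_toolset(name, _path=None):
    """Return every tool in a toolset, following `includes` recursively."""
    path = _path or set()
    if name in path:  # cycle detection: this toolset is already on the include path
        raise ValueError(f"toolset include cycle at {name!r}")
    spec = TOOLSETS[name]
    tools = list(spec["tools"])
    for inc in spec["includes"]:
        for tool in resolve_toolset(inc, path | {name}):
            if tool not in tools:  # de-duplicate while preserving order
                tools.append(tool)
    return tools
```

Passing a per-path `seen` set (rather than one shared set) lets diamond-shaped includes resolve normally while still catching true cycles.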
### 2. Graceful Degradation Pattern

Each tool module has a `check_*_requirements()` function:
- Tools are only loaded if requirements are met
- Missing API keys disable tools, not crash the system
- Import errors are caught and tools marked unavailable
```python
try:
    from tools.web_tools import web_search_tool, check_firecrawl_api_key
except ModuleNotFoundError:
    web_search_tool = None
    def check_firecrawl_api_key(): return False
```
### 3. Session Isolation Pattern (`task_id`)

Stateful tools (terminal, browser) use `task_id` to isolate concurrent sessions:

- Each batch worker gets a unique `task_id`
- VMs and browser sessions are tracked per `task_id`
- Cleanup functions release resources: `cleanup_vm(task_id)`, `cleanup_browser(task_id)`
### 4. Trajectory Format Pattern

Conversations are saved in ShareGPT format for training:

```
{"from": "system", "value": "System prompt with <tools>...</tools>"}
{"from": "human", "value": "User message"}
{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
{"from": "gpt", "value": "Final response"}
```
### 5. Ephemeral System Prompt Pattern

Guides model behavior during data collection without saving to trajectories:

- `ephemeral_system_prompt` influences execution
- Only the standard tool-calling system prompt is saved to trajectories
- Keeps training data clean
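The core trick can be sketched in a few lines: the ephemeral text is merged into the system message only for the API call, while the saved history keeps the clean prompt. The helper name is assumed, not the actual implementation:

```python
def build_api_messages(history, ephemeral_system_prompt=None):
    """Return messages for the API call; the saved `history` is untouched."""
    if not ephemeral_system_prompt:
        return list(history)
    system, *rest = history
    merged = dict(system)  # copy, so the trajectory keeps the clean prompt
    merged["content"] = system["content"] + "\n\n" + ephemeral_system_prompt
    return [merged, *rest]
```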
### 6. Retry with Validation Pattern

The agent validates responses before accepting them:

- Check tool names against the `valid_tool_names` set
- Validate that JSON arguments can be parsed
- Check for content after `<think>` blocks
- Roll back to the last valid state on persistent failures
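The first two checks can be sketched as a single validator that the retry loop consults before accepting a response (function name and error strings are illustrative):

```python
import json

def validate_tool_calls(tool_calls, valid_tool_names):
    """Return None if the model's tool calls pass the checks above,
    otherwise a short error string for the retry loop to log."""
    for call in tool_calls:
        if call["name"] not in valid_tool_names:
            return f"unknown tool: {call['name']}"
        try:
            json.loads(call["arguments"])  # arguments must be parseable JSON
        except (TypeError, ValueError):
            return f"unparseable arguments for: {call['name']}"
    return None
```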
## Component Relationships

### AIAgent Class
- Central orchestrator for conversations
- Manages conversation history
- Calls OpenAI-compatible API
- Routes tool calls to handlers
- Provides animated feedback (KawaiiSpinner)
### Tool Modules (`tools/*.py`)
- Self-contained tool implementations
- Export: handler function + check function + schema
- Return JSON strings (never raw dicts)
- Accept an optional `task_id` for stateful tools
### Toolsets System (`toolsets.py`)

- Defines logical groupings of tools
- Supports composition via `includes`
- `resolve_toolset()` recursively resolves all tools
- `validate_toolset()` checks if a name is valid
### Model Tools (`model_tools.py`)
- Aggregates all tool definitions
- Routes function calls to correct handlers
- Filters tools based on enabled/disabled toolsets
- Bridge between agent and tool implementations
## Critical Implementation Paths

### Tool Execution Flow
- AIAgent receives `tool_calls` from the API response
- Validates tool names against `valid_tool_names`
- Validates that JSON arguments can be parsed
- Calls `handle_function_call()` with tool name, args, and `task_id`
- `handle_function_call()` routes to the appropriate handler
- Tool executes, returns a JSON string
- Result is added to the conversation as a tool message
- Loop continues until a natural-language response
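The flow above condenses to a short driver loop. This is a sketch, not the real `AIAgent` code; `call_api` and `handle_function_call` stand in for the actual API client and router:

```python
def run_tool_loop(call_api, handle_function_call, messages, task_id, max_steps=8):
    """Drive the conversation until the model answers in natural language."""
    for _ in range(max_steps):
        reply = call_api(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):        # plain answer: we're done
            return reply["content"]
        for call in reply["tool_calls"]:       # execute and feed results back
            result = handle_function_call(call["name"], call["arguments"], task_id)
            messages.append({"role": "tool", "content": result})
    raise RuntimeError("tool loop did not converge")
```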
### Configuration Loading Flow

- `cli.py` calls `load_cli_config()`
- Loads `cli-config.yaml`, merges with defaults
- Sets environment variables for terminal config
- `AIAgent` reads env vars when initializing the terminal tool
- Terminal tool creates the appropriate backend based on `TERMINAL_ENV`
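The merge-then-export step might look like this. The defaults dict and keys are assumptions for illustration; only the `TERMINAL_ENV` variable name comes from the text above:

```python
import os

# Assumed default values, for illustration only.
DEFAULTS = {"terminal": {"env": "docker", "timeout": 60}}

def load_cli_config(user_cfg: dict) -> dict:
    """Merge a parsed cli-config.yaml dict over defaults, then export the
    terminal settings as env vars for AIAgent to read at init time."""
    terminal = {**DEFAULTS["terminal"], **user_cfg.get("terminal", {})}
    os.environ["TERMINAL_ENV"] = terminal["env"]
    return {**DEFAULTS, **user_cfg, "terminal": terminal}
```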
## RL Training Architecture (Consolidated)

### Environment System (`environments/`)

The canonical way to build agentic RL environments in Hermes-Agent:

```
environments/
├── agent_loop.py        ← HermesAgentLoop: OpenAI-spec tool calling
├── hermes_base_env.py   ← HermesAgentBaseEnv: base class for all envs
├── tool_context.py      ← ToolContext: reward function tool access
├── tool_call_parsers/   ← 11+ model parsers (hermes, qwen, deepseek, etc.)
├── terminal_test_env.py ← Example: file creation tasks
├── hermes_swe_env.py    ← SWE environment
└── gsm8k_agent_env.py   ← GSM8k with Python REPL (TODO)
```
### Two-Phase Operation

- Phase 1 (OpenAI server): native `tool_calls` from VLLM/SGLang/OpenRouter
  - Good for: SFT data gen, testing, evaluation
  - Server handles tool call parsing via `/v1/chat/completions`
- Phase 2 (ManagedServer): client-side tool call parser + logprob tracking
  - Required for: RL training (exact token IDs + logprobs for GRPO/PPO)
  - Uses the `/generate` endpoint for raw token output
  - Parser registry selects a per-model parser (hermes, qwen, llama, etc.)
  - Verified working with RunPod SGLang endpoint (Feb 10, 2026)
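The parser-registry idea can be sketched as a decorator-based lookup table. Names and the toy Hermes parser below are hypothetical; the real parsers live in `environments/tool_call_parsers/`:

```python
import json

PARSERS = {}  # hypothetical registry mapping model family → parser class

def register(model_family):
    def decorator(cls):
        PARSERS[model_family] = cls
        return cls
    return decorator

@register("hermes")
class HermesToolCallParser:
    """Toy parser for Hermes-style <tool_call>{...}</tool_call> blocks."""
    def parse(self, text):
        calls, open_tag, close_tag = [], "<tool_call>", "</tool_call>"
        while open_tag in text:
            _, rest = text.split(open_tag, 1)
            body, text = rest.split(close_tag, 1)
            calls.append(json.loads(body))
        return calls

def get_parser(model_family):
    """Select the per-model parser, as the Phase 2 path does client-side."""
    return PARSERS[model_family]()
```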
### Phase 2 Call Chain (Verified)

```
collect_trajectory()
→ ServerManager.managed_server(tokenizer, tool_call_parser)
→ ManagedServer(server=VLLMServer)
→ ManagedServer.chat_completion(messages, tools, n, max_tokens, temp)
→ _convert_messages_to_prompt(messages, tools=tools)  [apply_chat_template]
→ _compute_input_ids(prompt, extending_node)
→ VLLMServer.tokens_and_logprobs_completion(**kwargs)  [public method]
→ _tokens_and_logprobs_comp(stat_dict, **kwargs)  [retry decorator, semaphore]
→ _tokens_and_logprobs_completion_wrapper(**kwargs)  [patched for SGLang]
→ aiohttp POST to /generate
→ Returns (prompt_tokens, [output_tokens], [output_logprobs], [finish_reasons])
→ _create_sequence_node(...)  [stores in current_nodes]
→ tool_call_parser.parse(completion_text)  [if parser configured]
→ Returns ChatCompletion with tool_calls
```
### SGLang Compatibility Patch (`environments/patches.py`)

VLLMServer's `_tokens_and_logprobs_completion_wrapper` is monkey-patched to handle SGLang's
different request/response format. Applied automatically at import time via `apply_patches()`.

```
SGLang request:  {"input_ids": [...], "sampling_params": {...}, "return_logprob": true}
SGLang response: {"meta_info": {"output_token_logprobs": [[logprob, token_id, text], ...]}}
VLLM request:    {"prompt": {"prompt_token_ids": [...]}, "logprobs": 0}
VLLM response:   {"logprobs": [[{token_id: logprob}]], "finish_reasons": [...]}
```
Also handles RunPod serverless double-JSON wrapping (response body wrapped in quotes).
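The response-side reshaping reduces to unzipping SGLang's `[logprob, token_id, text]` triples, after unwrapping the double-encoded body if present. A sketch only; the real patch in `environments/patches.py` does more:

```python
import json

def normalize_sglang_response(resp):
    """Reshape an SGLang /generate response into (token_ids, logprobs).
    Handles RunPod's double-JSON wrapping, where the body arrives as a string."""
    if isinstance(resp, str):                 # body wrapped in quotes → decode again
        resp = json.loads(resp)
    triples = resp["meta_info"]["output_token_logprobs"]
    token_ids = [tok for _lp, tok, _txt in triples]
    logprobs = [lp for lp, _tok, _txt in triples]
    return token_ids, logprobs
```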
### Key Design: Proper Tool Calling (NOT ICL)

```python
# CORRECT: pass tools= to chat_completion()
response = await server.chat_completion(
    messages=messages,
    tools=tool_schemas,  # ← tokenizer.apply_chat_template(tools=...) formats these
    temperature=1.0,
)
# Response has response.choices[0].message.tool_calls (structured objects)

# WRONG (old approach): embed tools in system prompt as XML
system_prompt = f"<tools>{json.dumps(tools)}</tools>"  # ← ICL, not proper training format
```
### Sandbox Backends (`atropos/backends/`)

Infrastructure for scaled sandbox execution, integrated into HermesAgentBaseEnv:

```
ToolBackend (Protocol)
├── NomadToolBackend → SlotPool → NomadClient + SandboxExecutor (HTTP)
│   ├── Docker driver (default)
│   └── Singularity driver (HPC)
└── ModalToolBackend → _ModalSandboxPool → modal.Sandbox.exec() (direct)
    └── _ModalMultiProfileManager (multi-profile support)
```
Two execution modes in HermesAgentBaseEnv (controlled by the `tool_pool_mode` config):

- `default` - local tool execution via `handle_function_call()` + `ToolContext`
- `modal`/`nomad` - sandbox routing: slot acquire → setup workspace → agent loop → verify → release
Sandbox routing architecture:

```
collect_trajectory()
├── tool_pool_mode="default" → _collect_trajectory_local()
│   └── _run_agent_loop(tool_handler=None) → compute_reward(ctx)
│
└── tool_pool_mode="modal"/"nomad" → _collect_trajectory_sandbox()
    ├── backend.acquire(task_id) → Slot
    ├── exec_tool = backend.execute_batch wrapper → ExecutionResult
    ├── setup_trajectory_workspace(item, exec_tool)  [subclass hook]
    ├── _run_agent_loop(tool_handler=sandbox_tool_handler)
    │   ├── terminal → backend.execute_batch → JSON string
    │   └── other tools → handle_function_call (local)
    ├── verify_and_score_trajectory(item, result, exec_tool)  [subclass hook]
    └── backend.release(slot, reset_workspace=True)  [finally]
```
Key interfaces:

- `exec_tool(tool_name, args, timeout)` → `ExecutionResult` (for env hooks)
- `tool_handler(tool_name, args, task_id)` → JSON string (for the agent loop)
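The `ToolBackend` protocol implied by the diagram can be sketched with `typing.Protocol`. The exact method signatures are assumptions; `LocalBackend` is a throwaway implementation only showing that the protocol is satisfiable:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass
class ExecutionResult:          # minimal stand-in for the real result type
    stdout: str
    exit_code: int

@runtime_checkable
class ToolBackend(Protocol):
    """Shape implied by the backend diagram; signatures are assumed."""
    def acquire(self, task_id: str): ...
    def execute_batch(self, slot, tool_name: str, args: dict,
                      timeout: float) -> ExecutionResult: ...
    def release(self, slot, reset_workspace: bool = True) -> None: ...

class LocalBackend:
    """Trivial in-process backend used only to illustrate the protocol."""
    def acquire(self, task_id):
        return {"task_id": task_id}
    def execute_batch(self, slot, tool_name, args, timeout):
        return ExecutionResult(stdout=f"ran {tool_name}", exit_code=0)
    def release(self, slot, reset_workspace=True):
        pass
```

Because `ToolBackend` is a `Protocol`, Nomad and Modal backends need no common base class; they only have to expose the same three methods.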
### Training Pipeline (Tinker + Atropos)

```
Terminal 1: run-api (port 8000)             ← Atropos Rollout API
Terminal 2: launch_training.py (port 8001)  ← Tinker Trainer + inference
Terminal 3: environment.py serve            ← Environment (rollouts)
```