hermes-agent/memory-bank/systemPatterns.md
Shannon Sands 975c849308 Add GSM8k agent env using proper HermesAgentBaseEnv (not ICL)
- environments/gsm8k_agent_env.py: Math reasoning with Python REPL tool
  - Subclasses HermesAgentBaseEnv (proper tools= parameter, not ICL)
  - Uses ATROPOS_SERVER_* env vars from .env
  - Hermes tool call parser, configurable per model
  - Math verification via math_verify with string fallback
  - Tested: process mode works, both trajectories scored 1.0

- Updated memory bank with consolidation plan:
  - environments/ is the canonical env system (proper tool calling)
  - atropos/backends/ kept as sandbox infrastructure
  - atropos/agent/ and atropos/envs/agent_env.py marked for removal
2026-02-10 01:45:07 +00:00

9.5 KiB

System Patterns: Hermes-Agent

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                           CLI (cli.py)                          │
│  - Rich welcome banner with caduceus                            │
│  - prompt_toolkit for input with history                        │
│  - Kawaii-style feedback and personalities                      │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                     AIAgent (run_agent.py)                      │
│  - Conversation loop with tool calling                          │
│  - KawaiiSpinner for animated feedback                          │
│  - Retry logic with exponential backoff                         │
│  - Session logging to logs/ directory                           │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Tool Routing (model_tools.py)                 │
│  - get_tool_definitions() - returns tools for API calls         │
│  - handle_function_call() - dispatches to tool handlers         │
│  - Toolset filtering (enabled/disabled)                         │
└────────────────────────────┬────────────────────────────────────┘
                             │
           ┌─────────────────┼─────────────────┐
           ▼                 ▼                 ▼
    ┌───────────┐     ┌───────────┐     ┌───────────┐
    │ Web Tools │     │ Terminal  │     │ Browser   │
    │ (Firecrawl)│    │ (mini-swe)│     │(agent-brw)│
    └───────────┘     └───────────┘     └───────────┘
           │                 │                 │
           └─────────────────┼─────────────────┘
                             ▼
                    ┌───────────────┐
                    │  Toolsets     │
                    │  (toolsets.py)│
                    │  Composition  │
                    └───────────────┘

Key Design Patterns

1. Toolset Composition Pattern

Toolsets can include other toolsets, allowing flexible composition:

TOOLSETS = {
    "web": {"tools": ["web_search", "web_extract"], "includes": []},
    "debugging": {"tools": ["terminal"], "includes": ["web"]},
    "full_stack": {"tools": [], "includes": ["web", "terminal", "vision", "browser"]}
}

Resolution is recursive with cycle detection.

2. Graceful Degradation Pattern

Each tool module has a check_*_requirements() function:

  • Tools are only loaded if requirements are met
  • Missing API keys disable tools, not crash the system
  • Import errors are caught and tools marked unavailable
try:
    from tools.web_tools import web_search_tool, check_firecrawl_api_key
except ModuleNotFoundError:
    web_search_tool = None
    def check_firecrawl_api_key(): return False

3. Session Isolation Pattern (task_id)

Stateful tools (terminal, browser) use task_id to isolate concurrent sessions:

  • Each batch worker gets unique task_id
  • VMs and browser sessions are tracked per task_id
  • Cleanup functions release resources: cleanup_vm(task_id), cleanup_browser(task_id)

4. Trajectory Format Pattern

Conversations are saved in ShareGPT format for training:

{"from": "system", "value": "System prompt with <tools>...</tools>"}
{"from": "human", "value": "User message"}
{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
{"from": "gpt", "value": "Final response"}

5. Ephemeral System Prompt Pattern

Guide model behavior during data collection without saving to trajectories:

  • ephemeral_system_prompt influences execution
  • Only standard tool-calling system prompt saved to trajectories
  • Keeps training data clean

6. Retry with Validation Pattern

The agent validates responses before accepting:

  • Check tool names against valid_tool_names set
  • Validate JSON arguments can be parsed
  • Check for content after <think> blocks
  • Roll back to last valid state on persistent failures

Component Relationships

AIAgent Class

  • Central orchestrator for conversations
  • Manages conversation history
  • Calls OpenAI-compatible API
  • Routes tool calls to handlers
  • Provides animated feedback (KawaiiSpinner)

Tool Modules (tools/*.py)

  • Self-contained tool implementations
  • Export: handler function + check function + schema
  • Return JSON strings (never raw dicts)
  • Accept optional task_id for stateful tools

Toolsets System (toolsets.py)

  • Defines logical groupings of tools
  • Supports composition via includes
  • resolve_toolset() recursively resolves all tools
  • validate_toolset() checks if name is valid

Model Tools (model_tools.py)

  • Aggregates all tool definitions
  • Routes function calls to correct handlers
  • Filters tools based on enabled/disabled toolsets
  • Bridge between agent and tool implementations

Critical Implementation Paths

Tool Execution Flow

  1. AIAgent receives tool_calls from API response
  2. Validates tool names against valid_tool_names
  3. Validates JSON arguments can be parsed
  4. Calls handle_function_call() with tool name, args, task_id
  5. handle_function_call() routes to appropriate handler
  6. Tool executes, returns JSON string
  7. Result added to conversation as tool message
  8. Loop continues until natural language response

Configuration Loading Flow

  1. cli.py calls load_cli_config()
  2. Loads cli-config.yaml, merges with defaults
  3. Sets environment variables for terminal config
  4. AIAgent reads env vars when initializing terminal tool
  5. Terminal tool creates appropriate backend based on TERMINAL_ENV

RL Training Architecture (Consolidated)

Environment System (environments/)

The canonical way to build agentic RL environments in Hermes-Agent:

environments/
├── agent_loop.py              ← HermesAgentLoop: OpenAI-spec tool calling
├── hermes_base_env.py         ← HermesAgentBaseEnv: base class for all envs
├── tool_context.py            ← ToolContext: reward function tool access
├── tool_call_parsers/         ← 11+ model parsers (hermes, qwen, deepseek, etc.)
├── terminal_test_env.py       ← Example: file creation tasks
├── hermes_swe_env.py          ← SWE environment
└── gsm8k_agent_env.py         ← GSM8k with Python REPL (TODO)

Two-Phase Operation

  • Phase 1 (OpenAI server): Native tool_calls from VLLM/SGLang/OpenRouter
    • Good for: SFT data gen, testing, evaluation
  • Phase 2 (ManagedServer): Client-side tool call parser + logprob tracking
    • Required for: RL training
    • Parser registry selects per-model parser (hermes, qwen, llama, etc.)

Key Design: Proper Tool Calling (NOT ICL)

# CORRECT: pass tools= to chat_completion()
response = await server.chat_completion(
    messages=messages,
    tools=tool_schemas,  # ← tokenizer.apply_chat_template(tools=...) formats these
    temperature=1.0,
)
# Response has response.choices[0].message.tool_calls (structured objects)

# WRONG (old approach): embed tools in system prompt as XML
system_prompt = f"<tools>{json.dumps(tools)}</tools>"  # ← ICL, not proper training format

Sandbox Backends (atropos/backends/)

Infrastructure for scaled sandbox execution (separate from the env system):

ToolBackend (Protocol)
    ├── NomadToolBackend → SlotPool → NomadClient + SandboxExecutor (HTTP)
    │   ├── Docker driver (default)
    │   └── Singularity driver (HPC)
    └── ModalToolBackend → _ModalSandboxPool → modal.Sandbox.exec() (direct)
        └── _ModalMultiProfileManager (multi-profile support)

Accessed via HermesAgentBaseEnv.terminal_backend config option:

  • local - Direct execution (default, development)
  • docker - Docker containers
  • modal - Modal cloud sandboxes (production RL)
  • singularity - HPC clusters
  • ssh - Remote server

Training Pipeline (Tinker + Atropos)

Terminal 1: run-api (port 8000)              ← Atropos Rollout API
Terminal 2: launch_training.py (port 8001)   ← Tinker Trainer + inference
Terminal 3: environment.py serve             ← Environment (rollouts)