mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-04 02:21:47 +00:00

Shannon Sands 975c849308 Add GSM8k agent env using proper HermesAgentBaseEnv (not ICL)

- environments/gsm8k_agent_env.py: Math reasoning with Python REPL tool
  - Subclasses HermesAgentBaseEnv (proper tools= parameter, not ICL)
  - Uses ATROPOS_SERVER_* env vars from .env
  - Hermes tool call parser, configurable per model
  - Math verification via math_verify with string fallback
  - Tested: process mode works, both trajectories scored 1.0

- Updated memory bank with consolidation plan:
  - environments/ is the canonical env system (proper tool calling)
  - atropos/backends/ kept as sandbox infrastructure
  - atropos/agent/ and atropos/envs/agent_env.py marked for removal

2026-02-10 01:45:07 +00:00

5 KiB

Raw Blame History

Active Context

Current Focus

Consolidating the two Atropos environment systems and fixing tool calling to use proper OpenAI-spec approach instead of ICL.

PR Feedback from Lead Dev (Feb 10, 2026)

The PR was rejected because our approach has three fundamental issues:

Issue 1: ManagedServer doesn't pass `tools={}` to `apply_chat_template()`

When using Phase 2 (VLLM/SGLang for RL training), ManagedServer needs to pass tools to tokenizer.apply_chat_template(tools=...)
This makes the system prompt include tool definitions the way models were trained to expect
Fix: Atropos PR #366 adds tool_call_parser support to ManagedServer (branch: tool_call_support)

Issue 2: ICL prompt vs proper tool calling

Our code embeds tools as XML in the system prompt (<tools>...</tools>)
Proper approach: pass tools= parameter in chat_completion() calls and let the tokenizer's chat template handle formatting
All Hermes datasets train on the proper format, not ICL

Issue 3: Only Hermes `<tool_call>` parser, no multi-model support

Our code only handles Hermes-style <tool_call> XML parsing
Proper approach: parser registry supporting 11+ model families (hermes, qwen, deepseek, llama, mistral, etc.)

Architecture: What Exists Now (Two Parallel Systems)

`environments/` (Teknium's proper approach) ✅ CORRECT

environments/
├── agent_loop.py              ← Uses tools= in chat_completion() (OpenAI spec)
├── hermes_base_env.py         ← Phase 1 (OpenAI) + Phase 2 (ManagedServer + parser)
├── tool_context.py            ← ToolContext for reward functions
├── tool_call_parsers/         ← 11 model parsers (hermes, qwen, deepseek, llama, etc.)
│   ├── __init__.py            ← Registry with get_parser(), register_parser()
│   ├── hermes_parser.py
│   ├── qwen_parser.py
│   ├── deepseek_v3_parser.py
│   ├── llama_parser.py
│   ├── mistral_parser.py
│   └── ... (11 total)
├── terminal_test_env.py       ← Working example: file creation tasks
├── hermes_swe_env.py          ← SWE environment
└── patches.py                 ← Async-safe monkey patches

How it works correctly:

HermesAgentLoop.run() passes tools=self.tool_schemas to chat_completion()
ManagedServer passes tools to tokenizer.apply_chat_template(tools=...)
Parser registry reconstructs tool_calls from raw model output
Tool execution uses hermes-agent's handle_function_call() from model_tools.py

`atropos/` (Our sandbox-optimized code) - PARTIALLY REDUNDANT

atropos/
├── agent/atropos_agent.py     ← ICL-based agent (REDUNDANT with agent_loop.py)
├── envs/agent_env.py          ← Environment with sandbox backends (PARTIALLY REDUNDANT)
├── envs/swe_smith_oracle_env.py ← SWE env using sandbox (KEEP - port to new base)
├── backends/                  ← Sandbox backends (KEEP - valuable infrastructure)
│   ├── modal_backend.py       ← Modal sandbox pool
│   ├── nomad_backend.py       ← Nomad/Docker/Singularity
│   └── base.py                ← ToolBackend protocol
├── slots/                     ← Slot multiplexing (KEEP)
├── nomad/                     ← Nomad client (KEEP)
├── tools/                     ← Sandbox tool registry (PARTIALLY REDUNDANT)
└── sandbox_server.py          ← HTTP server in containers (KEEP)

Plan: Consolidate into `environments/`

What to KEEP from `atropos/`:

backends/ - Modal, Nomad, Singularity backends (valuable infrastructure for scale)
slots/ - Slot multiplexing
nomad/ - Nomad client
sandbox_server.py - Container HTTP server
Dockerfile - Sandbox container image

What to REMOVE/REPLACE:

atropos/agent/atropos_agent.py → replaced by environments/agent_loop.py
atropos/envs/agent_env.py → functionality merged into environments/hermes_base_env.py
atropos/tools/ → replaced by model_tools.py + tools/ (hermes-agent's standard tools)

What to CREATE:

environments/gsm8k_agent_env.py → GSM8k with tool calling, subclasses HermesAgentBaseEnv
Update environments/hermes_base_env.py to optionally use sandbox backends (Nomad/Modal) for terminal isolation when needed for scale

Steps:

Install atropos tool_call_support branch (PR #366)
Create environments/gsm8k_agent_env.py using HermesAgentBaseEnv
Port swe_smith_oracle_env.py to use HermesAgentBaseEnv
Make sandbox backends accessible from HermesAgentBaseEnv (terminal_backend config)
Remove redundant atropos/agent/ and atropos/envs/agent_env.py
Clean up atropos/tools/ (keep only sandbox-specific tools)
Update tinker-atropos gsm8k env to use proper base class
Test everything end-to-end

Previous Completed Work

Modal backend integration (Feb 8) - KEEP backends, update integration point
Main branch merge (Feb 9) - completed
Singularity/Apptainer (Feb 6) - KEEP
Memory Bank initialized (Feb 5)

5 KiB Raw Blame History