mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-04 02:21:47 +00:00
- environments/gsm8k_agent_env.py: Math reasoning with Python REPL tool - Subclasses HermesAgentBaseEnv (proper tools= parameter, not ICL) - Uses ATROPOS_SERVER_* env vars from .env - Hermes tool call parser, configurable per model - Math verification via math_verify with string fallback - Tested: process mode works, both trajectories scored 1.0 - Updated memory bank with consolidation plan: - environments/ is the canonical env system (proper tool calling) - atropos/backends/ kept as sandbox infrastructure - atropos/agent/ and atropos/envs/agent_env.py marked for removal
5 KiB
5 KiB
Active Context
Current Focus
Consolidating the two Atropos environment systems and fixing tool calling to use proper OpenAI-spec approach instead of ICL.
PR Feedback from Lead Dev (Feb 10, 2026)
The PR was rejected because our approach has three fundamental issues:
Issue 1: ManagedServer doesn't pass tools={} to apply_chat_template()
- When using Phase 2 (VLLM/SGLang for RL training),
ManagedServerneeds to pass tools totokenizer.apply_chat_template(tools=...) - This makes the system prompt include tool definitions the way models were trained to expect
- Fix: Atropos PR #366 adds
tool_call_parsersupport to ManagedServer (branch:tool_call_support)
Issue 2: ICL prompt vs proper tool calling
- Our code embeds tools as XML in the system prompt (
<tools>...</tools>) - Proper approach: pass
tools=parameter inchat_completion()calls and let the tokenizer's chat template handle formatting - All Hermes datasets train on the proper format, not ICL
Issue 3: Only Hermes <tool_call> parser, no multi-model support
- Our code only handles Hermes-style
<tool_call>XML parsing - Proper approach: parser registry supporting 11+ model families (hermes, qwen, deepseek, llama, mistral, etc.)
Architecture: What Exists Now (Two Parallel Systems)
environments/ (Teknium's proper approach) ✅ CORRECT
environments/
├── agent_loop.py ← Uses tools= in chat_completion() (OpenAI spec)
├── hermes_base_env.py ← Phase 1 (OpenAI) + Phase 2 (ManagedServer + parser)
├── tool_context.py ← ToolContext for reward functions
├── tool_call_parsers/ ← 11 model parsers (hermes, qwen, deepseek, llama, etc.)
│ ├── __init__.py ← Registry with get_parser(), register_parser()
│ ├── hermes_parser.py
│ ├── qwen_parser.py
│ ├── deepseek_v3_parser.py
│ ├── llama_parser.py
│ ├── mistral_parser.py
│ └── ... (11 total)
├── terminal_test_env.py ← Working example: file creation tasks
├── hermes_swe_env.py ← SWE environment
└── patches.py ← Async-safe monkey patches
How it works correctly:
HermesAgentLoop.run()passestools=self.tool_schemastochat_completion()- ManagedServer passes tools to
tokenizer.apply_chat_template(tools=...) - Parser registry reconstructs
tool_callsfrom raw model output - Tool execution uses hermes-agent's
handle_function_call()frommodel_tools.py
atropos/ (Our sandbox-optimized code) - PARTIALLY REDUNDANT
atropos/
├── agent/atropos_agent.py ← ICL-based agent (REDUNDANT with agent_loop.py)
├── envs/agent_env.py ← Environment with sandbox backends (PARTIALLY REDUNDANT)
├── envs/swe_smith_oracle_env.py ← SWE env using sandbox (KEEP - port to new base)
├── backends/ ← Sandbox backends (KEEP - valuable infrastructure)
│ ├── modal_backend.py ← Modal sandbox pool
│ ├── nomad_backend.py ← Nomad/Docker/Singularity
│ └── base.py ← ToolBackend protocol
├── slots/ ← Slot multiplexing (KEEP)
├── nomad/ ← Nomad client (KEEP)
├── tools/ ← Sandbox tool registry (PARTIALLY REDUNDANT)
└── sandbox_server.py ← HTTP server in containers (KEEP)
Plan: Consolidate into environments/
What to KEEP from atropos/:
backends/- Modal, Nomad, Singularity backends (valuable infrastructure for scale)slots/- Slot multiplexingnomad/- Nomad clientsandbox_server.py- Container HTTP serverDockerfile- Sandbox container image
What to REMOVE/REPLACE:
atropos/agent/atropos_agent.py→ replaced byenvironments/agent_loop.pyatropos/envs/agent_env.py→ functionality merged intoenvironments/hermes_base_env.pyatropos/tools/→ replaced bymodel_tools.py+tools/(hermes-agent's standard tools)
What to CREATE:
environments/gsm8k_agent_env.py→ GSM8k with tool calling, subclassesHermesAgentBaseEnv- Update
environments/hermes_base_env.pyto optionally use sandbox backends (Nomad/Modal) for terminal isolation when needed for scale
Steps:
- Install atropos
tool_call_supportbranch (PR #366) - Create
environments/gsm8k_agent_env.pyusingHermesAgentBaseEnv - Port
swe_smith_oracle_env.pyto useHermesAgentBaseEnv - Make sandbox backends accessible from
HermesAgentBaseEnv(terminal_backend config) - Remove redundant
atropos/agent/andatropos/envs/agent_env.py - Clean up
atropos/tools/(keep only sandbox-specific tools) - Update tinker-atropos gsm8k env to use proper base class
- Test everything end-to-end
Previous Completed Work
- Modal backend integration (Feb 8) - KEEP backends, update integration point
- Main branch merge (Feb 9) - completed
- Singularity/Apptainer (Feb 6) - KEEP
- Memory Bank initialized (Feb 5)