mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-04 02:21:47 +00:00
- environments/gsm8k_agent_env.py: Math reasoning with Python REPL tool - Subclasses HermesAgentBaseEnv (proper tools= parameter, not ICL) - Uses ATROPOS_SERVER_* env vars from .env - Hermes tool call parser, configurable per model - Math verification via math_verify with string fallback - Tested: process mode works, both trajectories scored 1.0 - Updated memory bank with consolidation plan: - environments/ is the canonical env system (proper tool calling) - atropos/backends/ kept as sandbox infrastructure - atropos/agent/ and atropos/envs/agent_env.py marked for removal
99 lines
5 KiB
Markdown
99 lines
5 KiB
Markdown
# Active Context
|
|
|
|
## Current Focus
|
|
Consolidating the two Atropos environment systems and fixing tool calling to use proper OpenAI-spec approach instead of ICL.
|
|
|
|
## PR Feedback from Lead Dev (Feb 10, 2026)
|
|
|
|
The PR was rejected because our approach has three fundamental issues:
|
|
|
|
### Issue 1: ManagedServer doesn't pass `tools={}` to `apply_chat_template()`
|
|
- When using Phase 2 (VLLM/SGLang for RL training), `ManagedServer` needs to pass tools to `tokenizer.apply_chat_template(tools=...)`
|
|
- This makes the system prompt include tool definitions the way models were trained to expect
|
|
- **Fix**: Atropos PR #366 adds `tool_call_parser` support to ManagedServer (branch: `tool_call_support`)
|
|
|
|
### Issue 2: ICL prompt vs proper tool calling
|
|
- Our code embeds tools as XML in the system prompt (`<tools>...</tools>`)
|
|
- Proper approach: pass `tools=` parameter in `chat_completion()` calls and let the tokenizer's chat template handle formatting
|
|
- All Hermes datasets train on the proper format, not ICL
|
|
|
|
### Issue 3: Only Hermes `<tool_call>` parser, no multi-model support
|
|
- Our code only handles Hermes-style `<tool_call>` XML parsing
|
|
- Proper approach: parser registry supporting 11+ model families (hermes, qwen, deepseek, llama, mistral, etc.)
|
|
|
|
## Architecture: What Exists Now (Two Parallel Systems)
|
|
|
|
### `environments/` (Teknium's proper approach) ✅ CORRECT
|
|
```
|
|
environments/
|
|
├── agent_loop.py ← Uses tools= in chat_completion() (OpenAI spec)
|
|
├── hermes_base_env.py ← Phase 1 (OpenAI) + Phase 2 (ManagedServer + parser)
|
|
├── tool_context.py ← ToolContext for reward functions
|
|
├── tool_call_parsers/ ← 11 model parsers (hermes, qwen, deepseek, llama, etc.)
|
|
│ ├── __init__.py ← Registry with get_parser(), register_parser()
|
|
│ ├── hermes_parser.py
|
|
│ ├── qwen_parser.py
|
|
│ ├── deepseek_v3_parser.py
|
|
│ ├── llama_parser.py
|
|
│ ├── mistral_parser.py
|
|
│ └── ... (11 total)
|
|
├── terminal_test_env.py ← Working example: file creation tasks
|
|
├── hermes_swe_env.py ← SWE environment
|
|
└── patches.py ← Async-safe monkey patches
|
|
```
|
|
|
|
**How it works correctly:**
|
|
1. `HermesAgentLoop.run()` passes `tools=self.tool_schemas` to `chat_completion()`
|
|
2. ManagedServer passes tools to `tokenizer.apply_chat_template(tools=...)`
|
|
3. Parser registry reconstructs `tool_calls` from raw model output
|
|
4. Tool execution uses hermes-agent's `handle_function_call()` from `model_tools.py`
|
|
|
|
### `atropos/` (Our sandbox-optimized code) - PARTIALLY REDUNDANT
|
|
```
|
|
atropos/
|
|
├── agent/atropos_agent.py ← ICL-based agent (REDUNDANT with agent_loop.py)
|
|
├── envs/agent_env.py ← Environment with sandbox backends (PARTIALLY REDUNDANT)
|
|
├── envs/swe_smith_oracle_env.py ← SWE env using sandbox (KEEP - port to new base)
|
|
├── backends/ ← Sandbox backends (KEEP - valuable infrastructure)
|
|
│ ├── modal_backend.py ← Modal sandbox pool
|
|
│ ├── nomad_backend.py ← Nomad/Docker/Singularity
|
|
│ └── base.py ← ToolBackend protocol
|
|
├── slots/ ← Slot multiplexing (KEEP)
|
|
├── nomad/ ← Nomad client (KEEP)
|
|
├── tools/ ← Sandbox tool registry (PARTIALLY REDUNDANT)
|
|
└── sandbox_server.py ← HTTP server in containers (KEEP)
|
|
```
|
|
|
|
## Plan: Consolidate into `environments/`
|
|
|
|
### What to KEEP from `atropos/`:
|
|
- `backends/` - Modal, Nomad, Singularity backends (valuable infrastructure for scale)
|
|
- `slots/` - Slot multiplexing
|
|
- `nomad/` - Nomad client
|
|
- `sandbox_server.py` - Container HTTP server
|
|
- `Dockerfile` - Sandbox container image
|
|
|
|
### What to REMOVE/REPLACE:
|
|
- `atropos/agent/atropos_agent.py` → replaced by `environments/agent_loop.py`
|
|
- `atropos/envs/agent_env.py` → functionality merged into `environments/hermes_base_env.py`
|
|
- `atropos/tools/` → replaced by `model_tools.py` + `tools/` (hermes-agent's standard tools)
|
|
|
|
### What to CREATE:
|
|
- `environments/gsm8k_agent_env.py` → GSM8k with tool calling, subclasses `HermesAgentBaseEnv`
|
|
- Update `environments/hermes_base_env.py` to optionally use sandbox backends (Nomad/Modal) for terminal isolation when needed for scale
|
|
|
|
### Steps:
|
|
1. Install atropos `tool_call_support` branch (PR #366)
|
|
2. Create `environments/gsm8k_agent_env.py` using `HermesAgentBaseEnv`
|
|
3. Port `swe_smith_oracle_env.py` to use `HermesAgentBaseEnv`
|
|
4. Make sandbox backends accessible from `HermesAgentBaseEnv` (terminal_backend config)
|
|
5. Remove redundant `atropos/agent/` and `atropos/envs/agent_env.py`
|
|
6. Clean up `atropos/tools/` (keep only sandbox-specific tools)
|
|
7. Update tinker-atropos gsm8k env to use proper base class
|
|
8. Test everything end-to-end
|
|
|
|
## Previous Completed Work
|
|
- Modal backend integration (Feb 8) - KEEP backends, update integration point
|
|
- Main branch merge (Feb 9) - completed
|
|
- Singularity/Apptainer (Feb 6) - KEEP
|
|
- Memory Bank initialized (Feb 5)
|