hermes-agent/memory-bank/progress.md
Shannon Sands 975c849308 Add GSM8k agent env using proper HermesAgentBaseEnv (not ICL)
- environments/gsm8k_agent_env.py: Math reasoning with Python REPL tool
  - Subclasses HermesAgentBaseEnv (proper tools= parameter, not ICL)
  - Uses ATROPOS_SERVER_* env vars from .env
  - Hermes tool call parser, configurable per model
  - Math verification via math_verify with string fallback
  - Tested: process mode works, both trajectories scored 1.0

- Updated memory bank with consolidation plan:
  - environments/ is the canonical env system (proper tool calling)
  - atropos/backends/ kept as sandbox infrastructure
  - atropos/agent/ and atropos/envs/agent_env.py marked for removal
2026-02-10 01:45:07 +00:00

85 lines
3.8 KiB
Markdown

# Progress
## Current Sprint: Consolidate Environment Systems (Feb 10, 2026)
PR feedback from lead dev identified three fundamental issues with our approach:
1. Tool calling uses ICL (in-context learning) instead of proper `tools=` parameter
2. ManagedServer doesn't pass tools to `apply_chat_template()`
3. Only Hermes parser, no multi-model support
Teknium already built the correct approach in `environments/` directory. Our task is to consolidate.
### Status
- [ ] Install atropos `tool_call_support` branch (PR #366)
- [ ] Create `environments/gsm8k_agent_env.py` using `HermesAgentBaseEnv`
- [ ] Port SWE env to `HermesAgentBaseEnv`
- [ ] Make sandbox backends accessible from `HermesAgentBaseEnv`
- [ ] Remove redundant `atropos/agent/` and `atropos/envs/agent_env.py`
- [ ] Clean up redundant `atropos/tools/`
- [ ] Test end-to-end with Tinker
## Completed Features
### ✅ Modal Backend Integration (Feb 8, 2026)
- `ModalToolBackend` with slot-based multiplexing
- Multi-profile support (CPU, GPU, high-memory)
- Auto-scaling sandbox pool via Modal Sandboxes
- **Status: KEEP backends, but change integration point from atropos/envs/ to environments/**
### ✅ Main Branch Merge (Feb 9, 2026)
- Merged 22,560 lines, 79 files, 5 conflicts resolved
- New: hermes_cli/, file_operations, RL training tools, gateway, cron
### ✅ Tinker RL Training Setup (Feb 9, 2026)
- tinker 0.12.0 + tinker-atropos installed
- GSM8k agent env created (needs rewrite to use proper base class)
- Config for Qwen3-4B created
- Pipeline verified: Tinker API connection works, all imports pass
- **Blocked on billing** (Tinker 402 error - regional payment issue)
### ✅ Singularity/Apptainer Sandbox (Feb 6, 2026)
- Nomad raw_exec driver for HPC clusters
- All sandbox operations tested and working
### ✅ Memory Bank (Feb 5, 2026)
- Project documentation structure initialized
## What to KEEP vs REMOVE
### KEEP (valuable infrastructure):
| Component | Location | Purpose |
|-----------|----------|---------|
| Modal backend | `atropos/backends/modal_backend.py` | Cloud sandbox pool |
| Nomad backend | `atropos/backends/nomad_backend.py` | Docker/Singularity sandboxes |
| Slot pool | `atropos/slots/` | Container multiplexing |
| Nomad client | `atropos/nomad/` | Nomad API |
| Sandbox server | `atropos/sandbox_server.py` | HTTP server in containers |
| Dockerfile | `atropos/Dockerfile` | Container image |
| Agent loop | `environments/agent_loop.py` | Proper OpenAI-spec tool calling |
| Base env | `environments/hermes_base_env.py` | Phase 1/2 with parsers |
| Tool parsers | `environments/tool_call_parsers/` | 11+ model parsers |
### REMOVE (redundant with environments/):
| Component | Location | Replaced By |
|-----------|----------|-------------|
| ICL agent | `atropos/agent/atropos_agent.py` | `environments/agent_loop.py` |
| AgentEnv | `atropos/envs/agent_env.py` | `environments/hermes_base_env.py` |
| Tool registry | `atropos/tools/` | `model_tools.py` + `tools/` |
| GSM8k ICL env | `tinker-atropos/.../gsm8k_agent.py` | New proper version |
## Known Issues
- Tinker billing (402 error) - user's payment didn't process
- `bwrap_available: false` in Singularity containers
- atropos `tool_call_support` branch not yet installed (PR #366)
## Evolution of Decisions
### Agent Architecture
- **v1 (our branch)**: ICL-based agent with `<tool_call>` XML tags in system prompt
- **v2 (Teknium's)**: Proper OpenAI-spec tool calling with `tools=` parameter
- **Decision**: Adopt v2, consolidate into `environments/`, keep sandbox backends from v1
### Environment Organization
- **Before**: Two parallel systems (`atropos/envs/` and `environments/`)
- **After**: Single system in `environments/`, using `HermesAgentBaseEnv` as base class
- Sandbox backends remain in `atropos/backends/` but integrate via terminal backend config