hermes-agent/memory-bank/activeContext.md
2026-02-10 06:38:19 +00:00

# Active Context
## Current Task: SWE Smith Oracle Env with Modal Backend
### Goal
Run this command:
```bash
python environments/swe_smith_oracle_env.py process \
--env.use_wandb false \
--env.total_steps 2 \
--env.group_size 1 \
--env.max_items 2 \
--env.tool_pool_mode modal \
--env.modal_image python:3.11 \
--env.modal_slots_per_sandbox 10 \
--env.modal_min_sandboxes 1
```
### What's Done
1. **agent_loop.py** - Added `tool_handler` parameter
   - New param: `tool_handler=None` in `__init__`
   - When `self.tool_handler` is set, it's called INSTEAD of `handle_function_call()`
   - Signature: `async tool_handler(tool_name, args, task_id) -> str`
   - Shows `[sandbox]` instead of backend name in terminal preview
2. **Phase 2 ManagedServer + SGLang** - Fully working (previous session)
3. **hermes_base_env.py** - Sandbox routing in `collect_trajectory()` (THIS SESSION)
   - Refactored `collect_trajectory()` into:
     - `_use_sandbox_backend()` - checks if sandbox should be used
     - `_collect_trajectory_local()` - existing path (ToolContext + `handle_function_call`)
     - `_collect_trajectory_sandbox()` - NEW sandbox path with slot lifecycle
     - `_run_agent_loop()` - shared agent loop for Phase 1/2, accepts `tool_handler`
     - `_build_scored_item()` - shared scored item construction
   - Sandbox path:
     1. `backend.acquire(task_id)` → Slot
     2. `exec_tool` callable wrapping `backend.execute_batch([(slot, tool_name, args)])`
     3. `setup_trajectory_workspace(item, exec_tool=exec_tool)` → workspace_meta
     4. `sandbox_tool_handler` routes terminal→sandbox, other→local
     5. `_run_agent_loop(tool_handler=sandbox_tool_handler)`
     6. `verify_and_score_trajectory(item, result, exec_tool=exec_tool)`
     7. `backend.release(slot, reset_workspace=True)` in `finally`
   - Added `handle_function_call` import for non-terminal tool fallback
4. **swe_smith_oracle_env.py** - Sandbox hooks (THIS SESSION)
   - `setup_trajectory_workspace()` - bare repo cache + git worktree (ported from `atropos/envs/swe_smith_oracle_env.py`)
   - `verify_and_score_trajectory()` - install deps + run pytest in sandbox
   - `compute_reward()` retained for local (non-sandbox) path
   - Uses `exec_tool("bash", {"command": cmd}, timeout=600)` → `ExecutionResult`
5. **All tests pass**:
   - Syntax checks (`ast.parse`) on both files
   - Import checks (both modules import cleanly)
   - Method existence checks (all new methods present)
   - Signature checks (`exec_tool`, `trajectory_id`, `workspace_meta` params)
   - Backend integration (`ModalSandboxConfig.from_agent_env_config`, `create_tool_backend`)
   - `_use_sandbox_backend()` logic (True when modal+backend set, False otherwise)
6. **End-to-end test with Qwen 3 8B + Modal sandbox** (THIS SESSION)
   - RunPod endpoint: `0tx0ruuuo4f10c` (Qwen/Qwen3-8B via SGLang)
   - 5 terminal tool calls executed IN sandbox: `ls`, `git status`, `git log`, `cat parse.py`, `cat tests/`
   - In-sandbox verification: install deps + pytest → score=0.0 (the model inspected the repo but didn't apply a fix)
   - Full token tracking with logprobs via Phase 2 ManagedServer
   - Key finding: the Llama-3-8B chat template silently drops the `tools=` param, whereas Qwen 3 has full Hermes format support
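
The `tool_handler` hook and the terminal→sandbox routing from items 1 and 3 can be sketched roughly as below. This is an illustration under assumptions, not the code in `agent_loop.py`: `dispatch_tool_call`, `make_sandbox_tool_handler`, `fake_local_handler` and `fake_exec_tool` are hypothetical names, and the `"terminal"` tool name and the JSON payload shape are guesses.

```python
import asyncio
import json

# Hypothetical stand-in for hermes-agent's handle_function_call(); the real
# function's signature is not shown in these notes.
def fake_local_handler(tool_name, args, task_id):
    return json.dumps({"backend": "local", "tool": tool_name})

# Hypothetical stand-in for a sandbox exec_tool callable (timeout is ignored).
async def fake_exec_tool(tool_name, args, timeout=None):
    return {"success": True, "output": f"ran: {args['command']}"}

def make_sandbox_tool_handler(exec_tool, local_handler):
    """Build a tool_handler that sends terminal calls to the sandbox."""
    async def sandbox_tool_handler(tool_name, args, task_id):
        if tool_name == "terminal":  # assumed name of the terminal tool
            result = await exec_tool("bash", {"command": args["command"]})
            return json.dumps(result)
        # Every other tool falls back to the local path.
        return local_handler(tool_name, args, task_id)
    return sandbox_tool_handler

async def dispatch_tool_call(tool_name, args, task_id, tool_handler=None):
    """Call tool_handler INSTEAD of the local handler when one is set."""
    if tool_handler is not None:
        return await tool_handler(tool_name, args, task_id)
    return fake_local_handler(tool_name, args, task_id)
```

With no `tool_handler`, every call stays on the local path, which matches the `tool_pool_mode="default"` behaviour described in these notes.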
### What Still Needs to Be Done
#### 1. Replace hermes-agent tools backend with sandbox backend globally
Per Teknium's feedback: `tools/terminal_tool.py`, `tools/file_tools.py` etc. should be able to use
the Modal/Nomad sandbox backend not just in atropos envs but also in `batch_runner.py` for scaled
data generation. This unifies the tool execution path across CLI, batch, and RL environments.
#### 2. Clean up redundant code
- Remove `atropos/agent/` (replaced by `environments/agent_loop.py`)
- Remove `atropos/envs/agent_env.py` (replaced by `environments/hermes_base_env.py`)
- Remove `atropos/tools/` (use `model_tools.py` + `tools/` directly)
#### 3. Test with Tinker trainer (blocked on billing)
Full RL training loop: Tinker API → atropos rollout API → environment → trainer
#### 4. Add more environments
Teknium mentioned needing "endless-terminals" and "terminalbench 2" envs
### Architecture Summary
```
environments/hermes_base_env.py (HermesAgentBaseEnv)
├── tool_pool_mode="default" (existing path)
│   └── collect_trajectory() → HermesAgentLoop(tool_handler=None)
│       → handle_function_call() → hermes terminal tool (local)
└── tool_pool_mode="modal" or "nomad" (new path)
    └── collect_trajectory():
        1. slot = backend.acquire(task_id)
        2. exec_tool = lambda routing through backend.execute_batch
        3. setup_trajectory_workspace(item, exec_tool=exec_tool) [subclass hook]
        4. HermesAgentLoop(tool_handler=sandbox_tool_handler)
           → terminal calls → backend.execute_batch(slot, "bash", ...)
        5. verify_and_score_trajectory(item, result, exec_tool=exec_tool) [subclass hook]
        6. backend.release(slot, reset_workspace=True)
atropos/backends/modal_backend.py (ModalToolBackend)
    └── acquire(trajectory_id) → Slot
    └── execute_batch([(slot, "bash", {"command": "..."})]) → [ExecutionResult]
    └── release(slot, reset_workspace=True)
```
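
The slot lifecycle in the diagram (acquire, run hooks and agent, release in `finally`) might look like the sketch below. `FakeBackend`, `FakeSlot` and `collect_trajectory_sandbox` are hypothetical stand-ins written for this note; whether the real backend methods are coroutines, and what they return, are assumptions.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class FakeSlot:
    task_id: str

@dataclass
class FakeBackend:
    """In-memory stand-in for a sandbox backend; it only records calls."""
    events: list = field(default_factory=list)

    async def acquire(self, task_id):
        self.events.append("acquire")
        return FakeSlot(task_id)

    async def execute_batch(self, batch):
        self.events.append("execute_batch")
        return [f"ok: {args['command']}" for _slot, _tool, args in batch]

    async def release(self, slot, reset_workspace=True):
        self.events.append(f"release(reset_workspace={reset_workspace})")

async def collect_trajectory_sandbox(backend, task_id, setup, run_agent, verify):
    slot = await backend.acquire(task_id)
    try:
        async def exec_tool(tool_name, args, timeout=None):
            # Send a one-element batch and unwrap the single result.
            # The timeout argument is accepted but ignored by this fake.
            results = await backend.execute_batch([(slot, tool_name, args)])
            return results[0]

        workspace_meta = await setup(exec_tool)
        result = await run_agent(exec_tool, workspace_meta)
        return await verify(result, exec_tool)
    finally:
        # Runs on success and on failure, so the slot is always returned.
        await backend.release(slot, reset_workspace=True)
```

Binding `exec_tool` to the acquired slot keeps the subclass hooks unaware of slots and backends; they only ever see a callable.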
### Key Files to Modify
1. `environments/hermes_base_env.py` - Add sandbox path in `collect_trajectory()`
2. `environments/swe_smith_oracle_env.py` - Override `setup_trajectory_workspace()` and `verify_and_score_trajectory()` to use exec_tool
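
As a rough illustration of the bare-repo-cache plus worktree pattern that `setup_trajectory_workspace()` relies on, the helper below builds the shell commands an `exec_tool` call could run. `worktree_setup_commands`, the directory layout under `cache_root`, and the argument names are all hypothetical; the ported code may differ.

```python
import shlex

def worktree_setup_commands(repo_url, commit, cache_root, trajectory_id):
    """Return the worktree path and the shell commands that create it."""
    repo_name = repo_url.rstrip("/").split("/")[-1].removesuffix(".git")
    bare = f"{cache_root}/{repo_name}.git"            # assumed cache layout
    tree = f"{cache_root}/worktrees/{trajectory_id}"  # assumed worktree layout
    q = shlex.quote
    return {
        "worktree": tree,
        "commands": [
            # Clone once per sandbox; later trajectories reuse the cache.
            f"test -d {q(bare)} || git clone --bare {q(repo_url)} {q(bare)}",
            # Detached checkout of the task's commit in an isolated directory.
            f"git -C {q(bare)} worktree add --detach {q(tree)} {q(commit)}",
        ],
    }
```

Each command string would then be passed as `{"command": cmd}` to `exec_tool("bash", ...)`, so the clone cost is paid once and every trajectory gets its own checkout.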
### Important Notes
- `exec_tool` returns `ExecutionResult` (from `atropos/slots/executor.py`) with `.success`, `.output`, `.error`, `.metadata`
- `tool_handler` returns JSON string (for agent loop message format)
- These are DIFFERENT interfaces for different purposes:
  - `exec_tool`: used by env hooks (setup/verify) - returns structured result
  - `tool_handler`: used by agent loop - returns JSON string like hermes tools do
- `ModalToolBackend.execute_batch` calls `_ModalSandboxWithSlots.execute`, which runs `sandbox.exec("bash", "-c", command)` on Modal
- For the SWE env, the worktree setup pattern from `atropos/envs/swe_smith_oracle_env.py` should be reused (bare repo cache + worktree add)
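
To make the difference between the two interfaces concrete, here is a small sketch that converts a structured result into the JSON string a `tool_handler` returns. The `ExecutionResult` below is a local mirror of the four attributes listed above, not the class from `atropos/slots/executor.py`, and `to_tool_message` with its JSON keys is a hypothetical adapter.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExecutionResult:
    """Local mirror of the attributes named in the notes (not the real class)."""
    success: bool
    output: str = ""
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)

def to_tool_message(result: ExecutionResult) -> str:
    """Flatten a structured result into a JSON string for the agent loop."""
    payload = {
        "success": result.success,
        "output": result.output,
        "error": result.error,  # key names are assumptions
    }
    return json.dumps(payload)
```

Env hooks can branch on `result.success` directly, while the agent loop only ever sees the serialized string.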