Active Context

Goal

Run this command:

python environments/swe_smith_oracle_env.py process \
    --env.use_wandb false \
    --env.total_steps 2 \
    --env.group_size 1 \
    --env.max_items 2 \
    --env.tool_pool_mode modal \
    --env.modal_image python:3.11 \
    --env.modal_slots_per_sandbox 10 \
    --env.modal_min_sandboxes 1

What's Done

✅ agent_loop.py - Added tool_handler parameter
- New param: tool_handler=None in __init__
- When self.tool_handler is set, it's called INSTEAD of handle_function_call()
- Signature: async tool_handler(tool_name, args, task_id) -> str
- Shows [sandbox] instead of backend name in terminal preview
✅ Phase 2 ManagedServer + SGLang - Fully working (previous session)
✅ hermes_base_env.py - Sandbox routing in collect_trajectory() (THIS SESSION)
- Refactored collect_trajectory() into:
  - _use_sandbox_backend() - checks if sandbox should be used
  - _collect_trajectory_local() - existing path (ToolContext + handle_function_call)
  - _collect_trajectory_sandbox() - NEW sandbox path with slot lifecycle
  - _run_agent_loop() - shared agent loop for Phase 1/2, accepts tool_handler
  - _build_scored_item() - shared scored item construction
- Sandbox path:
  1. backend.acquire(task_id) → Slot
  2. exec_tool callable wrapping backend.execute_batch([(slot, tool_name, args)])
  3. setup_trajectory_workspace(item, exec_tool=exec_tool) → workspace_meta
  4. sandbox_tool_handler routes terminal→sandbox, other→local
  5. _run_agent_loop(tool_handler=sandbox_tool_handler)
  6. verify_and_score_trajectory(item, result, exec_tool=exec_tool)
  7. backend.release(slot, reset_workspace=True) in finally
- Added handle_function_call import for non-terminal tool fallback
✅ swe_smith_oracle_env.py - Sandbox hooks (THIS SESSION)
- setup_trajectory_workspace() - bare repo cache + git worktree (ported from atropos/envs/swe_smith_oracle_env.py)
- verify_and_score_trajectory() - install deps + run pytest in sandbox
- compute_reward() retained for local (non-sandbox) path
- Uses exec_tool("bash", {"command": cmd}, timeout=600) → ExecutionResult
✅ All tests pass:
- Syntax checks (ast.parse) on both files
- Import checks (both modules import cleanly)
- Method existence checks (all new methods present)
- Signature checks (exec_tool, trajectory_id, workspace_meta params)
- Backend integration (ModalSandboxConfig.from_agent_env_config, create_tool_backend)
- _use_sandbox_backend() logic (True when modal+backend set, False otherwise)
✅ End-to-end test with Qwen 3 8B + Modal sandbox (THIS SESSION)
- RunPod endpoint: 0tx0ruuuo4f10c (Qwen/Qwen3-8B via SGLang)
- 5 terminal tool calls executed IN sandbox: ls, git status, git log, cat parse.py, cat tests/
- In-sandbox verification: install deps + pytest → score=0.0 (model inspected but didn't fix)
- Full token tracking with logprobs via Phase 2 ManagedServer
- Key finding: Llama-3-8B template silently drops tools= param, Qwen 3 has full Hermes format support

Current Task: Integrate Slot Pool Backend into tools/terminal_tool.py

Step 1: Add `_SlotPoolEnvironment` to `tools/terminal_tool.py`

New class alongside existing _LocalEnvironment, _DockerEnvironment, etc.
Routes through atropos/backends/ (ModalToolBackend or NomadToolBackend)
N:M slot multiplexing: 5-10 sandboxes × 10 slots each = 50-100 concurrent
Singleton _SlotPoolManager (like _ModalPoolManager) manages backend lifecycle
execute() acquires slot → backend.execute_batch([(slot, "bash", ...)]) → returns {"output": ..., "returncode": ...}
cleanup() releases slot back to pool

Step 2: Wire into `_create_environment()`

TERMINAL_ENV=slot_pool → _SlotPoolEnvironment(...)
Sub-config: TERMINAL_SLOT_BACKEND=modal or TERMINAL_SLOT_BACKEND=nomad
Reuse existing TERMINAL_MODAL_* and Nomad env vars for configuration

Step 3: Remove redundant `atropos/tools/` files

DELETE: hermes_external_tools.py, build_registry.py, sandbox_stubs.py, toolset_resolver.py
KEEP: base.py (ToolCall/ToolResult types), tool_executor.py (batched queue), terminal_stateful_tool.py, tmux_tool.py

Step 4: Clean up `atropos/envs/` and `atropos/agent/` (defer)

Remove atropos/envs/agent_env.py → replaced by environments/hermes_base_env.py
Remove atropos/agent/atropos_agent.py → replaced by environments/agent_loop.py

Later

Test with Tinker trainer (blocked on billing)
Add more environments (endless-terminals, terminalbench 2)

Key Architecture Insight

Two separate sandbox integration points:

tools/terminal_tool.py with TERMINAL_ENV=slot_pool — for hermes CLI, batch_runner, any code using handle_function_call("terminal", ...). Uses _SlotPoolEnvironment which wraps atropos/backends/.
environments/hermes_base_env.py with tool_pool_mode=modal/nomad — for RL environments. Uses _collect_trajectory_sandbox() which directly acquires slots and creates sandbox_tool_handler.

Both use the same underlying atropos/backends/ (ModalToolBackend, NomadToolBackend) with the same slot pool.

Architecture Summary

environments/hermes_base_env.py  (HermesAgentBaseEnv)
    │
    ├── tool_pool_mode="default" (existing path)
    │   └── collect_trajectory() → HermesAgentLoop(tool_handler=None) 
    │       → handle_function_call() → hermes terminal tool (local)
    │
    └── tool_pool_mode="modal" or "nomad" (new path)
        └── collect_trajectory():
            1. slot = backend.acquire(task_id)
            2. exec_tool = lambda routing through backend.execute_batch
            3. setup_trajectory_workspace(item, exec_tool=exec_tool)  [subclass hook]
            4. HermesAgentLoop(tool_handler=sandbox_tool_handler)
               → terminal calls → backend.execute_batch(slot, "bash", ...)
            5. verify_and_score_trajectory(item, result, exec_tool=exec_tool) [subclass hook]
            6. backend.release(slot, reset_workspace=True)

atropos/backends/modal_backend.py  (ModalToolBackend)
    └── acquire(trajectory_id) → Slot
    └── execute_batch([(slot, "bash", {"command": "..."})])  → [ExecutionResult]
    └── release(slot, reset_workspace=True)

Key Files to Modify

environments/hermes_base_env.py - Add sandbox path in collect_trajectory()
environments/swe_smith_oracle_env.py - Override setup_trajectory_workspace() and verify_and_score_trajectory() to use exec_tool

Important Notes

exec_tool returns ExecutionResult (from atropos/slots/executor.py) with .success, .output, .error, .metadata
tool_handler returns JSON string (for agent loop message format)
These are DIFFERENT interfaces for different purposes:
- exec_tool: used by env hooks (setup/verify) - returns structured result
- tool_handler: used by agent loop - returns JSON string like hermes tools do
The ModalToolBackend.execute_batch calls _ModalSandboxWithSlots.execute which runs sandbox.exec("bash", "-c", command) on Modal
For the SWE env, the worktree setup pattern from atropos/envs/swe_smith_oracle_env.py should be reused (bare repo cache + worktree add)

7.3 KiB Raw Blame History Unescape Escape

Active Context

Current Task: SWE Smith Oracle Env with Modal Backend

Goal

What's Done

Current Task: Integrate Slot Pool Backend into tools/terminal_tool.py

Step 1: Add _SlotPoolEnvironment to tools/terminal_tool.py

Step 2: Wire into _create_environment()

Step 3: Remove redundant atropos/tools/ files

Step 4: Clean up atropos/envs/ and atropos/agent/ (defer)

Later

Key Architecture Insight

Architecture Summary

Key Files to Modify

Important Notes

7.3 KiB

Raw Blame History

Step 1: Add `_SlotPoolEnvironment` to `tools/terminal_tool.py`

Step 2: Wire into `_create_environment()`

Step 3: Remove redundant `atropos/tools/` files

Step 4: Clean up `atropos/envs/` and `atropos/agent/` (defer)