Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-03 02:11:48 +00:00)
Active Context
Current Task: SWE Smith Oracle Env with Modal Backend
Goal
Run this command:
python environments/swe_smith_oracle_env.py process \
--env.use_wandb false \
--env.total_steps 2 \
--env.group_size 1 \
--env.max_items 2 \
--env.tool_pool_mode modal \
--env.modal_image python:3.11 \
--env.modal_slots_per_sandbox 10 \
--env.modal_min_sandboxes 1
What's Done
- ✅ agent_loop.py - Added tool_handler parameter
  - New param: tool_handler=None in __init__
  - When self.tool_handler is set, it's called INSTEAD of handle_function_call()
  - Signature: async tool_handler(tool_name, args, task_id) -> str
  - Shows [sandbox] instead of backend name in terminal preview
- ✅ Phase 2 ManagedServer + SGLang - Fully working (previous session)
- ✅ hermes_base_env.py - Sandbox routing in collect_trajectory() (THIS SESSION)
  - Refactored collect_trajectory() into:
    - _use_sandbox_backend() - checks if sandbox should be used
    - _collect_trajectory_local() - existing path (ToolContext + handle_function_call)
    - _collect_trajectory_sandbox() - NEW sandbox path with slot lifecycle
    - _run_agent_loop() - shared agent loop for Phase 1/2, accepts tool_handler
    - _build_scored_item() - shared scored item construction
  - Sandbox path:
    1. backend.acquire(task_id) → Slot
    2. exec_tool callable wrapping backend.execute_batch([(slot, tool_name, args)])
    3. setup_trajectory_workspace(item, exec_tool=exec_tool) → workspace_meta
    4. sandbox_tool_handler routes terminal→sandbox, other→local
    5. _run_agent_loop(tool_handler=sandbox_tool_handler)
    6. verify_and_score_trajectory(item, result, exec_tool=exec_tool)
    7. backend.release(slot, reset_workspace=True) in finally
  - Added handle_function_call import for non-terminal tool fallback
- ✅ swe_smith_oracle_env.py - Sandbox hooks (THIS SESSION)
  - setup_trajectory_workspace() - bare repo cache + git worktree (ported from atropos/envs/swe_smith_oracle_env.py)
  - verify_and_score_trajectory() - install deps + run pytest in sandbox
  - compute_reward() retained for local (non-sandbox) path
  - Uses exec_tool("bash", {"command": cmd}, timeout=600) → ExecutionResult
- ✅ All tests pass:
  - Syntax checks (ast.parse) on both files
  - Import checks (both modules import cleanly)
  - Method existence checks (all new methods present)
  - Signature checks (exec_tool, trajectory_id, workspace_meta params)
  - Backend integration (ModalSandboxConfig.from_agent_env_config, create_tool_backend)
  - _use_sandbox_backend() logic (True when modal+backend set, False otherwise)
- ✅ End-to-end test with Qwen 3 8B + Modal sandbox (THIS SESSION)
  - RunPod endpoint: 0tx0ruuuo4f10c (Qwen/Qwen3-8B via SGLang)
  - 5 terminal tool calls executed IN sandbox: ls, git status, git log, cat parse.py, cat tests/
  - In-sandbox verification: install deps + pytest → score=0.0 (model inspected the repo but didn't fix the bug)
  - Full token tracking with logprobs via Phase 2 ManagedServer
  - Key finding: the Llama-3-8B template silently drops the tools= param, while Qwen 3 has full Hermes format support
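The workspace hook listed above (bare repo cache + git worktree) can be sketched as a small command builder. This is only an illustration: the helper name worktree_commands, the cache and worktree directories, and the repo_url / base_commit arguments are assumptions, not names taken from swe_smith_oracle_env.py. Only the git subcommands themselves are standard.

```python
import hashlib
import shlex


def worktree_commands(repo_url, base_commit, trajectory_id,
                      cache_root="/tmp/repo_cache",
                      work_root="/tmp/worktrees"):
    """Build shell commands for a bare-repo cache plus one worktree per trajectory.

    Hypothetical helper: paths and argument names are illustrative only.
    """
    # Stable cache key so every trajectory on the same repo shares one clone.
    repo_key = hashlib.sha256(repo_url.encode()).hexdigest()[:16]
    bare = f"{cache_root}/{repo_key}.git"
    workdir = f"{work_root}/{trajectory_id}"
    q = shlex.quote
    commands = [
        f"mkdir -p {q(cache_root)} {q(work_root)}",
        # Clone only when the cached bare repo is missing.
        f"test -d {q(bare)} || git clone --bare {q(repo_url)} {q(bare)}",
        # Detached worktree at the base commit, isolated from other trajectories.
        f"git -C {q(bare)} worktree add --detach {q(workdir)} {q(base_commit)}",
    ]
    return workdir, commands
```

Each command would then go through the hook's exec_tool("bash", {"command": cmd}, timeout=600) call, so the clone and checkout happen inside the sandbox rather than on the host.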
Current Task: Integrate Slot Pool Backend into tools/terminal_tool.py
Step 1: Add _SlotPoolEnvironment to tools/terminal_tool.py
- New class alongside existing _LocalEnvironment, _DockerEnvironment, etc.
- Routes through atropos/backends/ (ModalToolBackend or NomadToolBackend)
- N:M slot multiplexing: 5-10 sandboxes × 10 slots each = 50-100 concurrent slots
- Singleton _SlotPoolManager (like _ModalPoolManager) manages backend lifecycle
- execute() acquires slot → backend.execute_batch([(slot, "bash", ...)]) → returns {"output": ..., "returncode": ...}
- cleanup() releases slot back to pool
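The N:M multiplexing arithmetic above can be illustrated with a toy pool. This is a minimal sketch under assumptions: the Slot and SlotPool names are hypothetical, and the real pool in atropos/backends/ may track slots quite differently. It only shows how sandboxes × slots bounds concurrency.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    sandbox_id: int
    slot_id: int


class SlotPool:
    """Toy N:M pool: n_sandboxes x slots_per_sandbox concurrent slots."""

    def __init__(self, n_sandboxes, slots_per_sandbox):
        self.capacity = n_sandboxes * slots_per_sandbox
        self._free = queue.Queue()  # thread-safe FIFO of idle slots
        for sandbox_id in range(n_sandboxes):
            for slot_id in range(slots_per_sandbox):
                self._free.put(Slot(sandbox_id, slot_id))

    def acquire(self, timeout=None):
        # Blocks while every slot is busy; raises queue.Empty on timeout.
        return self._free.get(timeout=timeout)

    def release(self, slot):
        self._free.put(slot)

    @property
    def free_slots(self):
        return self._free.qsize()
```

With 5 sandboxes and 10 slots each the pool holds 50 slots, and with 10 sandboxes it holds 100, which matches the range quoted in the list.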
Step 2: Wire into _create_environment()
- TERMINAL_ENV=slot_pool → _SlotPoolEnvironment(...)
- Sub-config: TERMINAL_SLOT_BACKEND=modal or TERMINAL_SLOT_BACKEND=nomad
- Reuse existing TERMINAL_MODAL_* and Nomad env vars for configuration
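One way the environment-variable dispatch in this step might be resolved is sketched below. The helper resolve_terminal_env is hypothetical, and the "local" and "modal" defaults are assumptions. The actual signature of _create_environment() is not shown in these notes, so the sketch only covers reading and validating the two variables.

```python
import os

VALID_SLOT_BACKENDS = ("modal", "nomad")


def resolve_terminal_env(env=None):
    """Return the selected environment type and, for slot_pool, its backend.

    Hypothetical helper; default values are assumptions, not project behavior.
    """
    env = os.environ if env is None else env
    env_type = env.get("TERMINAL_ENV", "local")
    if env_type != "slot_pool":
        # Existing environment types do not use the slot backend setting.
        return {"env_type": env_type, "slot_backend": None}
    backend = env.get("TERMINAL_SLOT_BACKEND", "modal")
    if backend not in VALID_SLOT_BACKENDS:
        raise ValueError(
            f"TERMINAL_SLOT_BACKEND must be one of {VALID_SLOT_BACKENDS}, "
            f"got {backend!r}"
        )
    return {"env_type": env_type, "slot_backend": backend}
```

Failing fast on an unknown backend name keeps a typo in TERMINAL_SLOT_BACKEND from silently falling through to a different execution environment.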
Step 3: Remove redundant atropos/tools/ files
- DELETE: hermes_external_tools.py, build_registry.py, sandbox_stubs.py, toolset_resolver.py
- KEEP: base.py (ToolCall/ToolResult types), tool_executor.py (batched queue), terminal_stateful_tool.py, tmux_tool.py
Step 4: Clean up atropos/envs/ and atropos/agent/ (defer)
- Remove atropos/envs/agent_env.py → replaced by environments/hermes_base_env.py
- Remove atropos/agent/atropos_agent.py → replaced by environments/agent_loop.py
Later
- Test with Tinker trainer (blocked on billing)
- Add more environments (endless-terminals, terminalbench 2)
Key Architecture Insight
Two separate sandbox integration points:
1. tools/terminal_tool.py with TERMINAL_ENV=slot_pool: for hermes CLI, batch_runner, and any code using handle_function_call("terminal", ...). Uses _SlotPoolEnvironment, which wraps atropos/backends/.
2. environments/hermes_base_env.py with tool_pool_mode=modal/nomad: for RL environments. Uses _collect_trajectory_sandbox(), which directly acquires slots and creates sandbox_tool_handler.
Both use the same underlying atropos/backends/ (ModalToolBackend, NomadToolBackend) with the same slot pool.
Architecture Summary
environments/hermes_base_env.py (HermesAgentBaseEnv)
│
├── tool_pool_mode="default" (existing path)
│   └── collect_trajectory() → HermesAgentLoop(tool_handler=None)
│       → handle_function_call() → hermes terminal tool (local)
│
└── tool_pool_mode="modal" or "nomad" (new path)
    └── collect_trajectory():
        1. slot = backend.acquire(task_id)
        2. exec_tool = lambda routing through backend.execute_batch
        3. setup_trajectory_workspace(item, exec_tool=exec_tool) [subclass hook]
        4. HermesAgentLoop(tool_handler=sandbox_tool_handler)
           → terminal calls → backend.execute_batch([(slot, "bash", ...)])
        5. verify_and_score_trajectory(item, result, exec_tool=exec_tool) [subclass hook]
        6. backend.release(slot, reset_workspace=True)

atropos/backends/modal_backend.py (ModalToolBackend)
└── acquire(trajectory_id) → Slot
└── execute_batch([(slot, "bash", {"command": "..."})]) → [ExecutionResult]
└── release(slot, reset_workspace=True)
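The acquire / execute / release lifecycle in the diagram can be sketched with an in-memory stand-in. FakeBackend and with_slot are hypothetical names, and treating the backend methods as coroutines is an assumption. The point is only that the release happens in a finally block, so a failing agent loop or verification step does not leak a slot.

```python
import asyncio


class FakeBackend:
    """In-memory stand-in for a slot backend (hypothetical, not ModalToolBackend)."""

    def __init__(self):
        self.released = []

    async def acquire(self, task_id):
        return f"slot-{task_id}"

    async def execute_batch(self, batch):
        # One result per (slot, tool_name, args) tuple, in order.
        return [f"{tool}:{args['command']}@{slot}" for slot, tool, args in batch]

    async def release(self, slot, reset_workspace=True):
        self.released.append((slot, reset_workspace))


async def with_slot(backend, task_id, body):
    """Acquire a slot, run body(exec_tool), and always release the slot."""
    slot = await backend.acquire(task_id)

    async def exec_tool(tool_name, args):
        # Single-item batch: unpack the one result that comes back.
        (result,) = await backend.execute_batch([(slot, tool_name, args)])
        return result

    try:
        return await body(exec_tool)
    finally:
        # Runs on success and on error alike.
        await backend.release(slot, reset_workspace=True)
```

In this shape, steps 3 to 5 of the diagram (workspace setup, agent loop, verification) would all live inside body and share the same exec_tool closure over one slot.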
Key Files to Modify
- environments/hermes_base_env.py - Add sandbox path in collect_trajectory()
- environments/swe_smith_oracle_env.py - Override setup_trajectory_workspace() and verify_and_score_trajectory() to use exec_tool
Important Notes
- exec_tool returns ExecutionResult (from atropos/slots/executor.py) with .success, .output, .error, .metadata
- tool_handler returns JSON string (for agent loop message format)
- These are DIFFERENT interfaces for different purposes:
  - exec_tool: used by env hooks (setup/verify) - returns structured result
  - tool_handler: used by agent loop - returns JSON string like hermes tools do
- The ModalToolBackend.execute_batch calls _ModalSandboxWithSlots.execute which runs sandbox.exec("bash", "-c", command) on Modal
- For the SWE env, the worktree setup pattern from atropos/envs/swe_smith_oracle_env.py should be reused (bare repo cache + worktree add)
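The difference between the two interfaces can be made concrete with a small adapter that turns a structured exec_tool into a JSON-returning tool_handler. This is a sketch under assumptions: the ExecutionResult class below is a stand-in carrying only the four fields listed above, make_tool_handler is a hypothetical name, and the JSON key names are guesses at the message format rather than the project's actual schema.

```python
import asyncio
import json
from dataclasses import dataclass, field


@dataclass
class ExecutionResult:
    """Stand-in with the four fields listed above; not the real class."""
    success: bool
    output: str = ""
    error: str = ""
    metadata: dict = field(default_factory=dict)


def make_tool_handler(exec_tool):
    """Wrap a structured exec_tool into a JSON-string tool_handler."""

    async def tool_handler(tool_name, args, task_id) -> str:
        if tool_name != "terminal":
            # The notes route non-terminal tools to the local path; omitted here.
            return json.dumps({"error": f"{tool_name} is not handled in this sketch"})
        result = await exec_tool("bash", {"command": args["command"]})
        # Key names are an assumption about the agent loop's message format.
        return json.dumps({
            "output": result.output,
            "error": result.error,
            "success": result.success,
        })

    return tool_handler
```

Env hooks keep calling exec_tool directly and branch on result.success, while the agent loop only ever sees the serialized string.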