Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-03 02:11:48 +00:00)
Progress
Current Sprint: Phase 2 ManagedServer + SGLang Working (Feb 10, 2026)
✅ Phase 2 End-to-End Pipeline VERIFIED
Full pipeline working: GSM8k env → collect_trajectory → ManagedServer → VLLMServer (SGLang patched) → tokens + logprobs + masks.
Test results:
- 212 tokens with logprobs and masks from single trajectory
- Reward: 1.0 (correct answer)
- ScoredDataItem has all required fields: tokens, masks, scores, advantages, ref_logprobs, messages
- RunPod SGLang endpoint (b9zmuyn1carwya) with Llama-3-8B-Instruct
Consolidation Checklist
- Install atropos `tool_call_support` branch (PR #366)
- Create `environments/gsm8k_agent_env.py` using `HermesAgentBaseEnv`
- Create `environments/agent_loop.py` with proper OpenAI-spec tool calling
- Create `environments/tool_call_parsers/` with 13 parsers
- Create `environments/patches.py` for SGLang compatibility
- Add sandbox pool support to `HermesAgentBaseEnv`
- Test Phase 1 (OpenAI server type) with Nous API — WORKS
- Test Phase 2 (ManagedServer) with RunPod SGLang — WORKS
- Port SWE env to `HermesAgentBaseEnv` with multiplexed sandboxing
- End-to-end test: Qwen 3 8B + Modal sandbox + tool calls in sandbox + pytest verification
- Add `_SlotPoolEnvironment` to `tools/terminal_tool.py` (`TERMINAL_ENV=slot_pool`)
- Remove redundant `atropos/tools/` files (4 of 8)
- Remove redundant `atropos/agent/` and `atropos/envs/agent_env.py` (deferred)
- Test end-to-end with Tinker trainer (blocked on billing)
✅ End-to-End SWE + Modal Sandbox Verified (Feb 10, 2026)
- Qwen 3 8B on RunPod SGLang (endpoint `0tx0ruuuo4f10c`)
- Phase 2 ManagedServer with hermes tool call parser
- 5 terminal commands executed in Modal sandbox: ls, git status, git log, cat parse.py, cat tests/
- In-sandbox verification: install deps + pytest → score 0.0 (model inspected but didn't fix)
- Full token tracking with logprobs via /generate endpoint
- Key finding: Llama-3-8B template drops tools= silently; Qwen 3 has full Hermes tool format
Completed Features
✅ Phase 2 ManagedServer + SGLang (Feb 10, 2026)
- SGLang patch in `environments/patches.py` monkey-patches VLLMServer
- Handles SGLang's different request/response format vs VLLM
- Handles RunPod's double-JSON wrapping
- Full chain verified: ManagedServer → VLLMServer → _tokens_and_logprobs_comp (retry) → patched wrapper → /generate endpoint
- SequenceNode tracking: tokens, logprobs, masked_tokens all populated
- Key discovery: The AttributeError from earlier was NOT in our current code — likely from a prior code state
✅ Phase 1 OpenAI Server Mode (Feb 9-10, 2026)
- GSM8k env works with Nous API (OpenRouter-style endpoint)
- Terminal tool calls properly dispatched
- Tool call parsing handled natively by server (VLLM/SGLang /v1/chat/completions)
- Reward computation verified (math_verify for robust LaTeX comparison)
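The reward step above can be sketched as a simplified scorer. The real environment uses math_verify for robust LaTeX comparison; this hedged stand-in only handles plain numeric GSM8k-style answers:

```python
import re

def gsm8k_reward(completion: str, gold_answer: str) -> float:
    """Simplified reward: 1.0 if the last number in the completion equals the
    gold answer, else 0.0. (The real env uses math_verify for robust LaTeX
    comparison; this sketch covers only the plain numeric case.)"""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(gold_answer) else 0.0

assert gsm8k_reward("... so the answer is 42", "42") == 1.0
assert gsm8k_reward("I think it's 7", "42") == 0.0
```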
✅ Sandbox Pool Integration (Feb 10, 2026)
- Config fields added to `HermesAgentEnvConfig` for Nomad and Modal
- `_start_sandbox_backend()` / `_stop_sandbox_backend()` lifecycle methods
- Optional hooks: `setup_trajectory_workspace()`, `verify_and_score_trajectory()`
- Integrated into `env_manager()` and `process_manager()` cleanup
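The lifecycle above can be sketched as a skeleton. The method names come from these notes; the bodies, the mixin class, and the `run()` driver are illustrative stand-ins, not the real `HermesAgentBaseEnv` code:

```python
import asyncio

# Hypothetical skeleton of the sandbox lifecycle described above; only the
# method names are taken from the notes, everything else is a stand-in.
class SandboxLifecycleMixin:
    async def _start_sandbox_backend(self) -> None:
        # e.g. spin up the Nomad/Modal slot pool before rollouts begin
        self.backend = object()

    async def _stop_sandbox_backend(self) -> None:
        self.backend = None

    async def setup_trajectory_workspace(self, item) -> None:
        pass  # optional hook: clone repo / stage files into the sandbox

    async def verify_and_score_trajectory(self, item) -> float:
        return 0.0  # optional hook: run pytest in-sandbox, map results to a score

    async def run(self, items) -> list[float]:
        await self._start_sandbox_backend()
        try:
            scores = []
            for item in items:
                await self.setup_trajectory_workspace(item)
                scores.append(await self.verify_and_score_trajectory(item))
            return scores
        finally:
            await self._stop_sandbox_backend()  # mirrors the env_manager() cleanup path

env = SandboxLifecycleMixin()
print(asyncio.run(env.run(["task-1"])))  # → [0.0]
```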
✅ Tool Call Parsers (Feb 9-10, 2026)
- 13 parsers: hermes, llama3_json, llama4_json, qwen, qwen3_coder, deepseek_v3, deepseek_v31, glm45, glm47, mistral, kimi_k2, longcat
- Registry pattern: `get_parser("hermes")` returns a parser instance
- Each parser: `.parse(text) → (content, tool_calls)`
- Used by ManagedServer in Phase 2 to extract structured tool_calls from raw completions
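The registry pattern above can be sketched as follows. This is a minimal illustration, not the real `environments/tool_call_parsers/` code, and the hermes regex here is a simplification of the actual parser:

```python
import json
import re

# Minimal sketch of the parser registry described above; the real parsers
# handle many more formats and edge cases.
_PARSERS: dict[str, type] = {}

def register(name: str):
    def deco(cls):
        _PARSERS[name] = cls
        return cls
    return deco

def get_parser(name: str):
    return _PARSERS[name]()  # returns a parser instance

@register("hermes")
class HermesParser:
    """Extracts <tool_call>{...}</tool_call> blocks from a raw completion."""
    def parse(self, text: str) -> tuple[str, list[dict]]:
        calls = [json.loads(m) for m in
                 re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.S)]
        content = re.sub(r"<tool_call>.*?</tool_call>", "", text, flags=re.S).strip()
        return content, calls

content, tool_calls = get_parser("hermes").parse(
    'Let me check.\n<tool_call>{"name": "terminal", "arguments": {"cmd": "ls"}}</tool_call>'
)
assert tool_calls[0]["name"] == "terminal"
```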
✅ Modal Backend Integration (Feb 8, 2026)
- `ModalToolBackend` with slot-based multiplexing
- Multi-profile support (CPU, GPU, high-memory)
- Auto-scaling sandbox pool via Modal Sandboxes
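Slot-based multiplexing, as used above, can be sketched with an asyncio queue of slot ids: many concurrent trajectories share a fixed pool of sandboxes. The class and names are illustrative, not the real `ModalToolBackend` API:

```python
import asyncio

# Hedged sketch of slot-based multiplexing: N trajectories share a fixed pool
# of sandbox slots. Names are illustrative, not the real ModalToolBackend API.
class SlotPool:
    def __init__(self, num_slots: int):
        self._free: asyncio.Queue[int] = asyncio.Queue()
        for slot_id in range(num_slots):
            self._free.put_nowait(slot_id)

    async def run(self, cmd: str) -> str:
        slot = await self._free.get()      # wait for a free sandbox slot
        try:
            await asyncio.sleep(0)         # stand-in for executing cmd in sandbox `slot`
            return f"slot {slot}: ran {cmd!r}"
        finally:
            self._free.put_nowait(slot)    # release the slot for the next trajectory

async def main():
    pool = SlotPool(num_slots=2)
    return await asyncio.gather(*(pool.run(c) for c in ["ls", "git status", "pytest"]))

print(asyncio.run(main()))
```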
✅ Main Branch Merge (Feb 9, 2026)
- Merged 22,560 lines, 79 files, 5 conflicts resolved
- New: hermes_cli/, file_operations, RL training tools, gateway, cron
✅ Tinker RL Training Setup (Feb 9, 2026)
- tinker 0.12.0 + tinker-atropos installed
- GSM8k agent config created
- Pipeline verified: Tinker API connection works, all imports pass
- Blocked on billing (Tinker 402 error)
✅ Singularity/Apptainer Sandbox (Feb 6, 2026)
- Nomad raw_exec driver for HPC clusters
- All sandbox operations tested and working
✅ Memory Bank (Feb 5, 2026)
- Project documentation structure initialized
What to KEEP vs REMOVE
KEEP (valuable infrastructure):
| Component | Location | Purpose |
|---|---|---|
| Modal backend | atropos/backends/modal_backend.py | Cloud sandbox pool |
| Nomad backend | atropos/backends/nomad_backend.py | Docker/Singularity sandboxes |
| Slot pool | atropos/slots/ | Container multiplexing |
| Nomad client | atropos/nomad/ | Nomad API |
| Sandbox server | atropos/sandbox_server.py | HTTP server in containers |
| Dockerfile | atropos/Dockerfile | Container image |
| Agent loop | environments/agent_loop.py | Proper OpenAI-spec tool calling |
| Base env | environments/hermes_base_env.py | Phase 1/2 with parsers |
| Tool parsers | environments/tool_call_parsers/ | 13 model parsers |
| SGLang patch | environments/patches.py | SGLang compatibility |
REMOVE (redundant with environments/):
| Component | Location | Replaced By |
|---|---|---|
| ICL agent | atropos/agent/atropos_agent.py | environments/agent_loop.py |
| AgentEnv | atropos/envs/agent_env.py | environments/hermes_base_env.py |
| Tool registry | atropos/tools/ | model_tools.py + tools/ |
| GSM8k ICL env | tinker-atropos/.../gsm8k_agent.py | environments/gsm8k_agent_env.py |
Known Issues
- Tinker billing (402 error) - user's payment didn't process
- `bwrap_available: false` in Singularity containers
- Llama-3-8B-Instruct doesn't reliably produce tool calls via Phase 2 (needs a Hermes-format model)
- Model answered GSM8k correctly but didn't actually USE the terminal tool (computed mentally)
Evolution of Decisions
Agent Architecture
- v1 (our branch): ICL-based agent with `<tool_call>` XML tags in the system prompt
- v2 (Teknium's): proper OpenAI-spec tool calling with the `tools=` parameter
- Decision: adopt v2, consolidate into `environments/`, keep sandbox backends from v1
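The v2 message and tool shapes follow the OpenAI chat-completions spec. The sketch below builds them as plain dicts so no client or server is needed; the `terminal` tool schema and the final request line are illustrative, not the project's actual definitions:

```python
# Sketch of v2 (OpenAI-spec) tool calling: the agent loop sends `tools=` with
# each request instead of embedding <tool_call> tags in the system prompt.
# The `terminal` tool schema here is an illustrative assumption.
tools = [{
    "type": "function",
    "function": {
        "name": "terminal",
        "description": "Run a shell command in the sandbox",
        "parameters": {
            "type": "object",
            "properties": {"cmd": {"type": "string"}},
            "required": ["cmd"],
        },
    },
}]

messages = [
    {"role": "user", "content": "How many files are in the repo?"},
    # Assistant turn as returned by a tool-calling model:
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "terminal", "arguments": '{"cmd": "ls | wc -l"}'},
    }]},
    # Tool result echoed back, keyed by tool_call_id:
    {"role": "tool", "tool_call_id": "call_1", "content": "12"},
]

# A Phase-1 request would then be roughly:
# client.chat.completions.create(model=..., messages=messages, tools=tools)
```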
Environment Organization
- Before: two parallel systems (`atropos/envs/` and `environments/`)
- After: single system in `environments/`, using `HermesAgentBaseEnv` as the base class
- Sandbox backends remain in `atropos/backends/` but integrate via terminal backend config
Phase 2 SGLang Support
- Problem: VLLMServer is hardcoded for VLLM's /generate format; SGLang's differs
- Solution: monkey-patch `_tokens_and_logprobs_completion_wrapper` in `environments/patches.py`
- Applied: automatically at import time via `apply_patches()` in `hermes_base_env.py`
- Handles: SGLang format differences AND RunPod's double-JSON wrapping
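The monkey-patch pattern above can be sketched as follows. This is an illustration in the spirit of `environments/patches.py`, not the real code: the server class, response shapes, and `meta_info` handling are assumptions standing in for the actual SGLang/RunPod normalization:

```python
import json

# Illustrative monkey-patch in the spirit of environments/patches.py: wrap the
# server's completion method so SGLang/RunPod responses are normalized before
# the rest of the pipeline sees them. Class body and field names are stand-ins.
class VLLMServer:  # stand-in for the real server class
    def _tokens_and_logprobs_completion_wrapper(self, raw: str) -> dict:
        return json.loads(raw)  # assumes VLLM's /generate response shape

def apply_patches() -> None:
    original = VLLMServer._tokens_and_logprobs_completion_wrapper

    def patched(self, raw: str) -> dict:
        data = original(self, raw)
        if isinstance(data, str):   # RunPod double-JSON wrapping: decode again
            data = json.loads(data)
        if "meta_info" in data:     # hypothetical SGLang shape: logprobs under meta_info
            data.setdefault("logprobs", data["meta_info"].get("output_token_logprobs"))
        return data

    VLLMServer._tokens_and_logprobs_completion_wrapper = patched

apply_patches()  # the real patch runs at import time in hermes_base_env.py

double_wrapped = json.dumps(json.dumps(
    {"text": "4", "meta_info": {"output_token_logprobs": [-0.1]}}))
out = VLLMServer()._tokens_and_logprobs_completion_wrapper(double_wrapped)
assert out["logprobs"] == [-0.1]
```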