hermes-agent/memory-bank/progress.md
2026-02-10 07:23:08 +00:00

6.9 KiB

Progress

Current Sprint: Phase 2 ManagedServer + SGLang Working (Feb 10, 2026)

Phase 2 End-to-End Pipeline VERIFIED

Full pipeline working: GSM8k env → collect_trajectory → ManagedServer → VLLMServer (SGLang patched) → tokens + logprobs + masks.

Test results:

  • 212 tokens with logprobs and masks from single trajectory
  • Reward: 1.0 (correct answer)
  • ScoredDataItem has all required fields: tokens, masks, scores, advantages, ref_logprobs, messages
  • RunPod SGLang endpoint (b9zmuyn1carwya) with Llama-3-8B-Instruct

Consolidation Checklist

  • Install atropos tool_call_support branch (PR #366)
  • Create environments/gsm8k_agent_env.py using HermesAgentBaseEnv
  • Create environments/agent_loop.py with proper OpenAI-spec tool calling
  • Create environments/tool_call_parsers/ with 13 parsers
  • Create environments/patches.py for SGLang compatibility
  • Add sandbox pool support to HermesAgentBaseEnv
  • Test Phase 1 (OpenAI server type) with Nous API — WORKS
  • Test Phase 2 (ManagedServer) with RunPod SGLang — WORKS
  • Port SWE env to HermesAgentBaseEnv with multiplexed sandboxing
  • End-to-end test: Qwen 3 8B + Modal sandbox + tool calls in sandbox + pytest verification
  • Add _SlotPoolEnvironment to tools/terminal_tool.py (TERMINAL_ENV=slot_pool)
  • Remove redundant atropos/tools/ files (4 of 8)
  • Remove redundant atropos/agent/ and atropos/envs/agent_env.py (deferred)
  • Test end-to-end with Tinker trainer (blocked on billing)

End-to-End SWE + Modal Sandbox Verified (Feb 10, 2026)

  • Qwen 3 8B on RunPod SGLang (endpoint 0tx0ruuuo4f10c)
  • Phase 2 ManagedServer with hermes tool call parser
  • 5 terminal commands executed in Modal sandbox: ls, git status, git log, cat parse.py, cat tests/
  • In-sandbox verification: install deps + pytest → score 0.0 (model inspected but didn't fix)
  • Full token tracking with logprobs via /generate endpoint
  • Key finding: Llama-3-8B template drops tools= silently; Qwen 3 has full Hermes tool format

Completed Features

Phase 2 ManagedServer + SGLang (Feb 10, 2026)

  • SGLang patch in environments/patches.py monkey-patches VLLMServer
  • Handles SGLang's different request/response format vs VLLM
  • Handles RunPod's double-JSON wrapping
  • Full chain verified: ManagedServer → VLLMServer → _tokens_and_logprobs_comp (retry) → patched wrapper → /generate endpoint
  • SequenceNode tracking: tokens, logprobs, masked_tokens all populated
  • Key discovery: The AttributeError from earlier was NOT in our current code — likely from a prior code state

Phase 1 OpenAI Server Mode (Feb 9-10, 2026)

  • GSM8k env works with Nous API (OpenRouter-style endpoint)
  • Terminal tool calls properly dispatched
  • Tool call parsing handled natively by server (VLLM/SGLang /v1/chat/completions)
  • Reward computation verified (math_verify for robust LaTeX comparison)

Sandbox Pool Integration (Feb 10, 2026)

  • Config fields added to HermesAgentEnvConfig for Nomad and Modal
  • _start_sandbox_backend() / _stop_sandbox_backend() lifecycle methods
  • Optional hooks: setup_trajectory_workspace(), verify_and_score_trajectory()
  • Integrated into env_manager() and process_manager() cleanup

Tool Call Parsers (Feb 9-10, 2026)

  • 13 parsers: hermes, llama3_json, llama4_json, qwen, qwen3_coder, deepseek_v3, deepseek_v31, glm45, glm47, mistral, kimi_k2, longcat
  • Registry pattern: get_parser("hermes") returns parser instance
  • Each parser: .parse(text) → (content, tool_calls)
  • Used by ManagedServer in Phase 2 to extract structured tool_calls from raw completion

Modal Backend Integration (Feb 8, 2026)

  • ModalToolBackend with slot-based multiplexing
  • Multi-profile support (CPU, GPU, high-memory)
  • Auto-scaling sandbox pool via Modal Sandboxes

Main Branch Merge (Feb 9, 2026)

  • Merged 22,560 lines, 79 files, 5 conflicts resolved
  • New: hermes_cli/, file_operations, RL training tools, gateway, cron

Tinker RL Training Setup (Feb 9, 2026)

  • tinker 0.12.0 + tinker-atropos installed
  • GSM8k agent config created
  • Pipeline verified: Tinker API connection works, all imports pass
  • Blocked on billing (Tinker 402 error)

Singularity/Apptainer Sandbox (Feb 6, 2026)

  • Nomad raw_exec driver for HPC clusters
  • All sandbox operations tested and working

Memory Bank (Feb 5, 2026)

  • Project documentation structure initialized

What to KEEP vs REMOVE

KEEP (valuable infrastructure):

Component Location Purpose
Modal backend atropos/backends/modal_backend.py Cloud sandbox pool
Nomad backend atropos/backends/nomad_backend.py Docker/Singularity sandboxes
Slot pool atropos/slots/ Container multiplexing
Nomad client atropos/nomad/ Nomad API
Sandbox server atropos/sandbox_server.py HTTP server in containers
Dockerfile atropos/Dockerfile Container image
Agent loop environments/agent_loop.py Proper OpenAI-spec tool calling
Base env environments/hermes_base_env.py Phase 1/2 with parsers
Tool parsers environments/tool_call_parsers/ 13 model parsers
SGLang patch environments/patches.py SGLang compatibility

REMOVE (redundant with environments/):

Component Location Replaced By
ICL agent atropos/agent/atropos_agent.py environments/agent_loop.py
AgentEnv atropos/envs/agent_env.py environments/hermes_base_env.py
Tool registry atropos/tools/ model_tools.py + tools/
GSM8k ICL env tinker-atropos/.../gsm8k_agent.py environments/gsm8k_agent_env.py

Known Issues

  • Tinker billing (402 error) - user's payment didn't process
  • bwrap_available: false in Singularity containers
  • Llama-3-8B-Instruct doesn't reliably produce tool calls via Phase 2 (needs Hermes-format model)
  • Model answered GSM8k correctly but didn't actually USE the terminal tool (computed mentally)

Evolution of Decisions

Agent Architecture

  • v1 (our branch): ICL-based agent with <tool_call> XML tags in system prompt
  • v2 (Teknium's): Proper OpenAI-spec tool calling with tools= parameter
  • Decision: Adopt v2, consolidate into environments/, keep sandbox backends from v1

Environment Organization

  • Before: Two parallel systems (atropos/envs/ and environments/)
  • After: Single system in environments/, using HermesAgentBaseEnv as base class
  • Sandbox backends remain in atropos/backends/ but integrate via terminal backend config

Phase 2 SGLang Support

  • Problem: VLLMServer hardcoded for VLLM's /generate format, SGLang is different
  • Solution: Monkey-patch _tokens_and_logprobs_completion_wrapper in environments/patches.py
  • Applied: Automatically at import time via apply_patches() in hermes_base_env.py
  • Handles: SGLang format differences AND RunPod's double-JSON wrapping