Commit graph

8 commits

Author SHA1 Message Date
maxpaperclips
35b2250b36 Fix RL training pipeline: context truncation, double-encoding, shaped rewards
agent_loop.py:
- Add _truncate_context() with 2-phase strategy (truncate tool results,
  then drop oldest middle messages while keeping assistant+tool pairs)
- Add max_context_tokens parameter
- Guard against double-encoded JSON tool arguments (model outputs
  string instead of dict)
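
The two-phase strategy described for _truncate_context() might look roughly like the sketch below. It is not the code from agent_loop.py: the function name, the whitespace-based token count, and the tool_result_cap / keep_head / keep_tail parameters are all assumptions made for illustration.

```python
def count_tokens(messages):
    # Assumption: whitespace word count stands in for a real tokenizer.
    return sum(len((m.get("content") or "").split()) for m in messages)


def truncate_context(messages, max_tokens, tool_result_cap=50,
                     keep_head=2, keep_tail=2):
    """Hypothetical two-phase truncation of a chat message list."""
    if count_tokens(messages) <= max_tokens:
        return list(messages)

    # Phase 1: shorten long tool results (copies, so the input is not mutated).
    out = []
    for m in messages:
        if m["role"] == "tool":
            words = (m.get("content") or "").split()
            if len(words) > tool_result_cap:
                shortened = " ".join(words[:tool_result_cap]) + " [truncated]"
                m = {**m, "content": shortened}
        out.append(m)

    # Phase 2: drop the oldest messages after the head. An assistant message
    # is removed together with the tool results that directly follow it, so
    # no tool result is left without the call that produced it.
    while count_tokens(out) > max_tokens and len(out) > keep_head + keep_tail:
        i = keep_head
        j = i + 1
        if out[i]["role"] == "assistant":
            while j < len(out) and out[j]["role"] == "tool":
                j += 1
        del out[i:j]
    return out
```

Dropping an assistant turn and its tool results as one unit keeps the remaining history valid for chat templates that expect every tool message to follow a tool call.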

hermes_base_env.py:
- Wire max_context_tokens=max_token_length through all 3 HermesAgentLoop
  construction sites

hermes_parser.py:
- Prevent double-encoding: when arguments are already a string, use as-is
  instead of json.dumps() which would double-encode
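
A minimal sketch of the double-encoding problem and the two guards mentioned in this commit (parser side and agent-loop side). The helper names serialize_arguments and load_arguments are hypothetical, as is the max_depth limit; only json.dumps() and json.loads() are real library calls.

```python
import json


def serialize_arguments(arguments):
    """Parser side: return a JSON string without encoding it twice."""
    if isinstance(arguments, str):
        # Already a JSON string; json.dumps() here would wrap it in quotes
        # and escape the inner quotes, i.e. double-encode it.
        return arguments
    return json.dumps(arguments)


def load_arguments(raw, max_depth=3):
    """Agent-loop side: decode until a dict appears, tolerating double-encoding."""
    value = raw
    for _ in range(max_depth):
        if not isinstance(value, str):
            break
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return {}
    # Assumption: anything that is not a dict is treated as "no arguments".
    return value if isinstance(value, dict) else {}
```

For example, load_arguments(json.dumps(json.dumps({"path": "a.py"}))) unwraps both layers and returns the original dict.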

swe_smith_oracle_env.py:
- Shaped reward structure for cold-start training:
  0.0 (no tools) -> 0.05/call up to 0.3 -> 0.4 (install ok) -> 1.0 (tests pass)
- _build_scored_item() override: truncate tokens/masks from END to fit
  max_token_len instead of discarding entire groups
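
The reward ladder and the end-truncation above can be sketched as follows. This is an illustration under assumptions, not the code in swe_smith_oracle_env.py: the function names and the num_tool_calls / install_ok / tests_passed parameters are hypothetical, and "truncate from END" is read here as cutting off trailing tokens and keeping the start of the sequence.

```python
def shaped_reward(num_tool_calls, install_ok, tests_passed):
    """Hypothetical reward ladder: 0.0 -> 0.05/call (cap 0.3) -> 0.4 -> 1.0."""
    if tests_passed:
        return 1.0
    if install_ok:
        return 0.4
    # Partial credit for using tools at all, capped at 0.3.
    return min(0.05 * num_tool_calls, 0.3)


def truncate_from_end(tokens, masks, max_token_len):
    """Keep the first max_token_len positions of both parallel lists."""
    if len(tokens) != len(masks):
        raise ValueError("tokens and masks must have the same length")
    return tokens[:max_token_len], masks[:max_token_len]
```

Each rung sits strictly above the previous one, so a rollout that makes many tool calls but never installs successfully cannot outscore one that does.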

All changes are in environments/ only — no effect on TUI/CLI agent loop.
2026-02-13 22:21:32 +00:00
maxpaperclips
395392e5de testing training 2026-02-11 22:13:05 +00:00
Shannon Sands
3951eab399 Fixed bug in the terminal requirements check for the slot pool 2026-02-10 09:22:22 +00:00
Shannon Sands
62001e3bf5 refactor on SlotPoolEnvironment 2026-02-10 08:30:37 +00:00
Shannon Sands
a69924631c Updated hermes_base_env and moved in the sandbox logic from the old agent. Added a temporary patch (to be removed) so sglang on RunPod works with the /generate format. The run worked: the model didn't produce tool calls, but full logprobs worked. 2026-02-10 06:06:21 +00:00
Shannon Sands
98d945f6de Add sandbox pool support to HermesAgentBaseEnv
Added directly to HermesAgentBaseEnv (no subclass needed):

Config fields:
- tool_pool_mode: 'default' (terminal tool), 'nomad', or 'modal'
- Full Nomad settings: nomad_address, sandbox_job_id, slots_per_container, etc.
- Full Modal settings: modal_image, modal_gpu, modal_slots_per_sandbox, etc.
- Shared: allow_network, require_sandbox, purge_job_on_start/shutdown

Methods:
- _start_sandbox_backend() / _stop_sandbox_backend() - lifecycle
- setup_trajectory_workspace() - optional hook for workspace prep
- verify_and_score_trajectory() - optional hook for in-sandbox verification
- env_manager() / process_manager() - lifecycle cleanup

When tool_pool_mode='default': everything works as before (terminal tool)
When tool_pool_mode='nomad'/'modal': activates sandbox pool from atropos/backends/
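
As a rough illustration of the mode switch, the sketch below models a few of the fields listed above. The class name SandboxPoolConfig, the field types, and every default value are assumptions; the real fields live on the HermesAgentBaseEnv config and may differ.

```python
from dataclasses import dataclass

VALID_POOL_MODES = ("default", "nomad", "modal")


@dataclass
class SandboxPoolConfig:
    """Hypothetical stand-in for the sandbox-related config fields."""
    tool_pool_mode: str = "default"
    allow_network: bool = True     # default value is an assumption
    require_sandbox: bool = False  # default value is an assumption

    def __post_init__(self):
        if self.tool_pool_mode not in VALID_POOL_MODES:
            raise ValueError(
                f"tool_pool_mode must be one of {VALID_POOL_MODES}, "
                f"got {self.tool_pool_mode!r}"
            )

    @property
    def uses_sandbox_pool(self):
        # 'default' keeps the plain terminal tool; the other modes
        # would activate a sandbox pool backend.
        return self.tool_pool_mode != "default"
```

Validating the mode at construction time surfaces a typo such as 'nomand' before any backend is started.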
2026-02-10 02:26:31 +00:00
teknium
d999d9876d Enhance async tool execution and error handling in Hermes agent for Atropos integration
- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
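
The thread pool approach can be illustrated with a small, self-contained sketch. It is not the code from `agent_loop.py`: `sync_tool`, `call_tool`, `_fetch`, and the worker count are hypothetical. The idea is that `asyncio.run()` cannot be called from a thread whose event loop is already running, so the synchronous tool is handed to a worker thread where it can start its own loop.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Assumption: a small shared pool; the size is illustrative.
_executor = ThreadPoolExecutor(max_workers=4)


async def _fetch(x):
    # Stand-in for async work done inside a tool.
    await asyncio.sleep(0)
    return x * 2


def sync_tool(x):
    # A synchronous tool that internally uses asyncio.run(). Calling this
    # directly from a coroutine would raise RuntimeError, because the
    # calling thread already has a running event loop.
    return asyncio.run(_fetch(x))


async def call_tool(x):
    # Run the sync tool in a worker thread, which has no running loop,
    # and await its result without blocking the outer loop.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, sync_tool, x)


def demo():
    return asyncio.run(call_tool(21))
```

Here `demo()` returns 42, while the outer loop stays free to schedule other coroutines as the tool runs.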
2026-02-08 05:00:47 +00:00
teknium
07b615e96e Add support for Atropos Agentic RL environments (currently requires the tool_call_support branch of Atropos)
- Added new environments for reinforcement learning, including `HermesSweEnv` for software engineering tasks and `TerminalTestEnv` for inline testing.
- Introduced `ToolContext` for unrestricted access to tools during reward computation.
- Updated `.gitignore` to exclude `wandb/` directory.
- Enhanced `README.md` with detailed architecture and usage instructions for Atropos environments.
- Added configuration files for SWE and terminal test environments to streamline setup.
- Removed unnecessary compiled Python files from `__pycache__`.
2026-02-07 09:17:16 +00:00