Commit graph

8 commits

Author SHA1 Message Date
Shannon Sands
06e9422324 Keep full trajectory; truncate prompt on per-turn copy
Previously _truncate_context() mutated the shared messages list, which could drop older turns and break reward computation/debugging.

Now we keep messages as the full trajectory and apply truncation to a copy (prompt_messages) for each model call.
2026-02-14 09:33:36 +10:00
Shannon Sands
907616a692 Context truncation: guard protect_tail for short histories 2026-02-14 09:28:42 +10:00
Shannon Sands
33a00d9b8e Agent loop: be robust to non-JSON tool args strings
If tool_args_raw is not valid JSON at all (e.g. parser/provider passed
through a plain string like ls), normalize it into {command: ...} for
terminal or {input: ...} for other tools instead of dropping args.
2026-02-14 09:28:23 +10:00
Shannon Sands
499490d06a Track tool-call validity vs attempts; shape reward accordingly
- AgentResult now includes tool-call metrics: attempted, schema_valid,
  executed_ok, exec_error
- HermesAgentLoop normalizes args robustly without crashing, but
  distinguishes schema-valid args (dict) from coerced formats
  (stringified JSON, plain strings)
- SweSmithOracleEnv reward shaping now prefers schema-valid tool calls
  while still giving small credit for attempted tool use
2026-02-14 09:17:05 +10:00
maxpaperclips
35b2250b36 Fix RL training pipeline: context truncation, double-encoding, shaped rewards
agent_loop.py:
- Add _truncate_context() with 2-phase strategy (truncate tool results,
  then drop oldest middle messages while keeping assistant+tool pairs)
- Add max_context_tokens parameter
- Guard against double-encoded JSON tool arguments (model outputs
  string instead of dict)

hermes_base_env.py:
- Wire max_context_tokens=max_token_length through all 3 HermesAgentLoop
  construction sites

hermes_parser.py:
- Prevent double-encoding: when arguments are already a string, use as-is
  instead of json.dumps() which would double-encode

swe_smith_oracle_env.py:
- Shaped reward structure for cold-start training:
  0.0 (no tools) -> 0.05/call up to 0.3 -> 0.4 (install ok) -> 1.0 (tests pass)
- _build_scored_item() override: truncate tokens/masks from END to fit
  max_token_len instead of discarding entire groups

All changes are in environments/ only — no effect on TUI/CLI agent loop.
2026-02-13 22:21:32 +00:00
Shannon Sands
a69924631c updated hermes_base_env, moved in sandbox logic from old agent, added patch so sglang on runpod works with /generate format (will remove). worked, model didnt produce tool calls but full logprobs worked 2026-02-10 06:06:21 +00:00
teknium
d999d9876d Enhance async tool execution and error handling in Hermes agent for Atropos integration
- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
2026-02-08 05:00:47 +00:00
teknium
07b615e96e Add support for Atropos Agentic RL environments (requires branch tool_call_support in Atropos atm)
- Added new environments for reinforcement learning, including `HermesSweEnv` for software engineering tasks and `TerminalTestEnv` for inline testing.
- Introduced `ToolContext` for unrestricted access to tools during reward computation.
- Updated `.gitignore` to exclude `wandb/` directory.
- Enhanced `README.md` with detailed architecture and usage instructions for Atropos environments.
- Added configuration files for SWE and terminal test environments to streamline setup.
- Removed unnecessary compiled Python files from `__pycache__`.
2026-02-07 09:17:16 +00:00