Previously _truncate_context() mutated the shared messages list, which could drop older turns and break reward computation/debugging.
Now we keep messages as the full trajectory and apply truncation to a copy (prompt_messages) for each model call.
If tool_args_raw is not valid JSON at all (e.g. parser/provider passed
through a plain string like ls), normalize it into {command: ...} for
terminal or {input: ...} for other tools instead of dropping args.
- Revert compute_reward() tool-call shaping to simple count-based reward
(0.05 per tool call, capped at 0.3)
- Keep new agent-loop metrics available but only print them for debugging,
so environments/users can decide their own tool-call validity policy
- AgentResult now includes tool-call metrics: attempted, schema_valid,
executed_ok, exec_error
- HermesAgentLoop normalizes args robustly without crashing, but
distinguishes schema-valid args (dict) from coerced formats
(stringified JSON, plain strings)
- SweSmithOracleEnv reward shaping now prefers schema-valid tool calls
while still giving small credit for attempted tool use
agent_loop.py:
- Add _truncate_context() with 2-phase strategy (truncate tool results,
then drop oldest middle messages while keeping assistant+tool pairs)
- Add max_context_tokens parameter
- Guard against double-encoded JSON tool arguments (model outputs
string instead of dict)
hermes_base_env.py:
- Wire max_context_tokens=max_token_length through all 3 HermesAgentLoop
construction sites
hermes_parser.py:
- Prevent double-encoding: when arguments are already a string, use as-is
instead of json.dumps() which would double-encode
swe_smith_oracle_env.py:
- Shaped reward structure for cold-start training:
0.0 (no tools) -> 0.05/call up to 0.3 -> 0.4 (install ok) -> 1.0 (tests pass)
- _build_scored_item() override: truncate tokens/masks from END to fit
max_token_len instead of discarding entire groups
All changes are in environments/ only — no effect on TUI/CLI agent loop.
New: environments/swe_smith_oracle_env.py
- Subclasses HermesAgentBaseEnv (proper tools= parameter, multi-model parsers)
- Uses ToolContext.terminal() for pytest verification
- Supports tool_pool_mode flag for sandbox backends
- Reads ATROPOS_SERVER_* env vars from .env
- No dependency on atropos/agent/ or atropos/envs/agent_env.py
- environments/gsm8k_agent_env.py: Math reasoning with Python REPL tool
- Subclasses HermesAgentBaseEnv (proper tools= parameter, not ICL)
- Uses ATROPOS_SERVER_* env vars from .env
- Hermes tool call parser, configurable per model
- Math verification via math_verify with string fallback
- Tested: process mode works, both trajectories scored 1.0
- Updated memory bank with consolidation plan:
- environments/ is the canonical env system (proper tool calling)
- atropos/backends/ kept as sandbox infrastructure
- atropos/agent/ and atropos/envs/agent_env.py marked for removal
- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
- Added new environments for reinforcement learning, including `HermesSweEnv` for software engineering tasks and `TerminalTestEnv` for inline testing.
- Introduced `ToolContext` for unrestricted access to tools during reward computation.
- Updated `.gitignore` to exclude `wandb/` directory.
- Enhanced `README.md` with detailed architecture and usage instructions for Atropos environments.
- Added configuration files for SWE and terminal test environments to streamline setup.
- Removed unnecessary compiled Python files from `__pycache__`.