- Revert compute_reward() tool-call shaping to simple count-based reward
(0.05 per tool call, capped at 0.3)
- Keep new agent-loop metrics available but only print them for debugging,
so environments/users can decide their own tool-call validity policy
- AgentResult now includes tool-call metrics: attempted, schema_valid,
executed_ok, exec_error
- HermesAgentLoop normalizes args robustly without crashing, but
distinguishes schema-valid args (dict) from coerced formats
(stringified JSON, plain strings)
- SweSmithOracleEnv reward shaping now prefers schema-valid tool calls
while still giving small credit for attempted tool use
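
A minimal sketch of the normalization and metric bookkeeping described in the bullets above. The metric names come from this message. `ToolCallMetrics`, `normalize_args`, `record_call`, and the `"input"` fallback key are hypothetical names used for illustration only, not the actual HermesAgentLoop code.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class ToolCallMetrics:
    # Field names mirror the metrics listed above; the class is hypothetical.
    attempted: int = 0
    schema_valid: int = 0
    executed_ok: int = 0
    exec_error: int = 0


def normalize_args(raw: Any) -> Tuple[Dict[str, Any], bool]:
    """Return (args, schema_valid) without raising on malformed input."""
    if isinstance(raw, dict):
        return raw, True  # already the schema-valid shape
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return parsed, False  # coerced from stringified JSON
        # Plain string: wrap it. The "input" key is an assumption.
        return {"input": raw}, False
    return {}, False  # anything else: empty args, not schema-valid


def record_call(metrics: ToolCallMetrics, raw_args: Any,
                run_tool: Callable[[Dict[str, Any]], Any]) -> None:
    """Count one tool call and classify how it went."""
    metrics.attempted += 1
    args, valid = normalize_args(raw_args)
    if valid:
        metrics.schema_valid += 1
    try:
        run_tool(args)
        metrics.executed_ok += 1
    except Exception:
        metrics.exec_error += 1
```

Keeping `schema_valid` separate from `executed_ok` lets an environment reward well-formed calls differently from calls that only worked after coercion.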
agent_loop.py:
- Add _truncate_context() with a two-phase strategy: first truncate tool
results, then drop the oldest middle messages while keeping assistant+tool
pairs together
- Add max_context_tokens parameter
- Guard against double-encoded JSON tool arguments (the model emits a
JSON string instead of a dict)
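
The two-phase truncation could look roughly like the sketch below. It is not the actual _truncate_context() body: the 4-characters-per-token estimate, the `tool_result_chars` and `keep_head` parameters, and the helper names are assumptions made for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(messages: List[Message]) -> int:
    # Crude assumption: roughly 4 characters per token.
    return sum(len(m.get("content") or "") for m in messages) // 4


def truncate_context(messages: List[Message], max_context_tokens: int,
                     tool_result_chars: int = 200,
                     keep_head: int = 2) -> List[Message]:
    """Return a copy of messages that fits the budget where possible."""
    if estimate_tokens(messages) <= max_context_tokens:
        return list(messages)

    # Phase 1: shorten long tool results (copying, so inputs stay intact).
    out: List[Message] = []
    for m in messages:
        content = m.get("content") or ""
        if m.get("role") == "tool" and len(content) > tool_result_chars:
            m = {**m, "content": content[:tool_result_chars] + "\n[truncated]"}
        out.append(m)

    # Phase 2: drop the oldest message after the head, together with the
    # tool results that follow it, so assistant+tool pairs are not split.
    while estimate_tokens(out) > max_context_tokens and len(out) > keep_head:
        end = keep_head + 1
        while end < len(out) and out[end].get("role") == "tool":
            end += 1
        if end >= len(out):
            break  # only the most recent unit is left; keep it
        del out[keep_head:end]
    return out
```

Dropping from the middle keeps the system prompt and the task statement at the head and the most recent turns at the tail, which are usually the parts the model needs most.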
hermes_base_env.py:
- Wire max_context_tokens=max_token_length through all 3 HermesAgentLoop
construction sites
hermes_parser.py:
- Prevent double-encoding: when arguments are already a string, use them
as-is instead of calling json.dumps(), which would encode them a second time
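
To illustrate the double-encoding problem, here is a small sketch. `encode_arguments` is a hypothetical helper name, not the parser's actual function.

```python
import json
from typing import Any


def encode_arguments(arguments: Any) -> str:
    """Serialize tool-call arguments to a JSON string exactly once."""
    if isinstance(arguments, str):
        # Already serialized: json.dumps() would wrap it in another
        # layer of quotes and escapes.
        return arguments
    return json.dumps(arguments)


already_encoded = '{"path": "a.py"}'
# Double-encoded: decoding once yields a str rather than a dict.
double = json.loads(json.dumps(already_encoded))
# Encoded once: decoding yields the expected dict.
single = json.loads(encode_arguments(already_encoded))
```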
swe_smith_oracle_env.py:
- Shaped reward structure for cold-start training:
0.0 (no tools) -> 0.05/call up to 0.3 -> 0.4 (install ok) -> 1.0 (tests pass)
- _build_scored_item() override: truncate tokens/masks from END to fit
max_token_len instead of discarding entire groups
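
A sketch of the reward ladder and the end-truncation described above. The function names and signatures are hypothetical. The code assumes the tiers are checked from highest to lowest and reads "from END" as dropping the tail of the sequence.

```python
from typing import List, Tuple


def shaped_reward(num_tool_calls: int, install_ok: bool,
                  tests_pass: bool) -> float:
    """0.0 (no tools) -> 0.05/call up to 0.3 -> 0.4 -> 1.0."""
    if tests_pass:
        return 1.0
    if install_ok:
        return 0.4
    # Count-based credit for attempted tool use, capped at 0.3.
    return min(0.05 * num_tool_calls, 0.3)


def truncate_from_end(tokens: List[int], masks: List[int],
                      max_token_len: int) -> Tuple[List[int], List[int]]:
    """Cut the tail so the item fits, instead of discarding it."""
    return tokens[:max_token_len], masks[:max_token_len]
```

The intermediate tiers give a policy that never passes the tests some signal early in training, while the gap between 0.4 and 1.0 keeps passing tests the dominant objective.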
All changes are in environments/ only — no effect on TUI/CLI agent loop.
New: environments/swe_smith_oracle_env.py
- Subclasses HermesAgentBaseEnv (proper tools= parameter, multi-model parsers)
- Uses ToolContext.terminal() for pytest verification
- Supports tool_pool_mode flag for sandbox backends
- Reads ATROPOS_SERVER_* env vars from .env
- No dependency on atropos/agent/ or atropos/envs/agent_env.py