hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-03 02:11:48 +00:00

Author	SHA1	Message	Date
Shannon Sands	a2312076da	SWE env: keep reward shaping env-defined; log tool-call metrics only. Revert compute_reward() tool-call shaping to a simple count-based reward (0.05 per tool call, capped at 0.3); keep the new agent-loop metrics available but only print them for debugging, so environments/users can decide their own tool-call validity policy.	2026-02-14 09:19:22 +10:00
Shannon Sands	499490d06a	Track tool-call validity vs attempts; shape reward accordingly. AgentResult now includes tool-call metrics: attempted, schema_valid, executed_ok, exec_error; HermesAgentLoop normalizes args robustly without crashing, but distinguishes schema-valid args (dict) from coerced formats (stringified JSON, plain strings); SweSmithOracleEnv reward shaping now prefers schema-valid tool calls while still giving small credit for attempted tool use.	2026-02-14 09:17:05 +10:00
maxpaperclips	35b2250b36	Fix RL training pipeline: context truncation, double-encoding, shaped rewards. agent_loop.py: (1) add _truncate_context() with a 2-phase strategy (truncate tool results, then drop the oldest middle messages while keeping assistant+tool pairs); (2) add max_context_tokens parameter; (3) guard against double-encoded JSON tool arguments (model outputs a string instead of a dict). hermes_base_env.py: wire max_context_tokens=max_token_length through all 3 HermesAgentLoop construction sites. hermes_parser.py: prevent double-encoding; when arguments are already a string, use them as-is instead of calling json.dumps(), which would double-encode. swe_smith_oracle_env.py: (1) shaped reward structure for cold-start training: 0.0 (no tools) -> 0.05/call up to 0.3 -> 0.4 (install ok) -> 1.0 (tests pass); (2) _build_scored_item() override: truncate tokens/masks from the END to fit max_token_len instead of discarding entire groups. All changes are in environments/ only; no effect on the TUI/CLI agent loop.	2026-02-13 22:21:32 +00:00
maxpaperclips	395392e5de	testing training	2026-02-11 22:13:05 +00:00
Shannon Sands	62001e3bf5	refactor on SlotPoolEnvironment	2026-02-10 08:30:37 +00:00
Shannon Sands	a69924631c	Updated hermes_base_env, moved in sandbox logic from the old agent, and added a patch so sglang on runpod works with the /generate format (will remove). Run worked: the model didn't produce tool calls, but full logprobs worked.	2026-02-10 06:06:21 +00:00
Shannon Sands	4619d1c8ef	Port SWE-smith-oracle env to HermesAgentBaseEnv. New: environments/swe_smith_oracle_env.py, which subclasses HermesAgentBaseEnv (proper tools= parameter, multi-model parsers); uses ToolContext.terminal() for pytest verification; supports the tool_pool_mode flag for sandbox backends; reads ATROPOS_SERVER_* env vars from .env; has no dependency on atropos/agent/ or atropos/envs/agent_env.py.	2026-02-10 02:45:04 +00:00
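
Commits 35b2250b36 and 499490d06a describe guarding against double-encoded tool arguments and separating schema-valid arguments (dict) from coerced formats. The repository code is not shown in this listing, so the following is only a minimal sketch of that idea. The function names normalize_tool_args and serialize_tool_args, and the {"input": ...} fallback wrapper, are hypothetical and not taken from hermes-agent.

```python
import json


def normalize_tool_args(raw):
    """Return (args, schema_valid) without raising on malformed input.

    Hypothetical helper: only a dict counts as schema-valid; anything
    that had to be decoded or wrapped is reported as coerced.
    """
    if isinstance(raw, dict):
        return raw, True
    if isinstance(raw, str):
        try:
            decoded = json.loads(raw)
        except json.JSONDecodeError:
            # Plain string that is not JSON at all: wrap it (assumed policy).
            return {"input": raw}, False
        if isinstance(decoded, str):
            # Double-encoded case: a JSON string that itself contains JSON.
            try:
                decoded = json.loads(decoded)
            except json.JSONDecodeError:
                return {"input": decoded}, False
        if isinstance(decoded, dict):
            return decoded, False
        return {"input": decoded}, False
    return {"input": raw}, False


def serialize_tool_args(arguments):
    """Serialize arguments once; a string is assumed to be serialized already."""
    if isinstance(arguments, str):
        return arguments
    return json.dumps(arguments)
```

Returning a validity flag alongside the arguments, rather than rejecting coerced input, matches the stated intent that environments decide their own tool-call validity policy.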

7 commits
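
The shaped reward ladder described in commit 35b2250b36 (0.0 with no tools, 0.05 per tool call up to 0.3, 0.4 when the install succeeds, 1.0 when tests pass) can be sketched as below. This is an illustration under assumptions, not the repository's compute_reward(): the function name shaped_reward, its parameters, and the rule that a higher tier simply overrides a lower one are all hypothetical.

```python
def shaped_reward(tool_calls: int, install_ok: bool, tests_pass: bool) -> float:
    """Hypothetical sketch of the cold-start reward ladder.

    Tier values come from the commit message; how the tiers combine
    (highest tier wins) is an assumption made for this example.
    """
    if tests_pass:
        return 1.0
    if install_ok:
        return 0.4
    # Count-based credit: 0.05 per tool call, capped at 0.3.
    return min(0.05 * max(tool_calls, 0), 0.3)


if __name__ == "__main__":
    for calls in (0, 2, 10):
        print(calls, shaped_reward(calls, install_ok=False, tests_pass=False))
```

The cap keeps credit for tool use alone below the install tier, so a policy cannot earn more by issuing extra tool calls than by making real progress.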