hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-02 02:01:47 +00:00

Author	SHA1	Message	Date
Shannon Sands	24c13bc412	Hermes parser: clarify string arguments comment (JSON vs plain)	2026-02-14 09:35:55 +10:00
Shannon Sands	06e9422324	Keep full trajectory; truncate prompt on per-turn copy. Previously, _truncate_context() mutated the shared messages list, which could drop older turns and break reward computation/debugging. Now we keep messages as the full trajectory and apply truncation to a copy (prompt_messages) for each model call.	2026-02-14 09:33:36 +10:00
Shannon Sands	907616a692	Context truncation: guard protect_tail for short histories	2026-02-14 09:28:42 +10:00
Shannon Sands	33a00d9b8e	Agent loop: be robust to non-JSON tool args strings. If tool_args_raw is not valid JSON at all (e.g. parser/provider passed through a plain string like ls), normalize it into {command: ...} for terminal or {input: ...} for other tools instead of dropping args.	2026-02-14 09:28:23 +10:00
Shannon Sands	a2312076da	SWE env: keep reward shaping env-defined; log tool-call metrics only - Revert compute_reward() tool-call shaping to simple count-based reward (0.05 per tool call, capped at 0.3) - Keep new agent-loop metrics available but only print them for debugging, so environments/users can decide their own tool-call validity policy	2026-02-14 09:19:22 +10:00
Shannon Sands	499490d06a	Track tool-call validity vs attempts; shape reward accordingly - AgentResult now includes tool-call metrics: attempted, schema_valid, executed_ok, exec_error - HermesAgentLoop normalizes args robustly without crashing, but distinguishes schema-valid args (dict) from coerced formats (stringified JSON, plain strings) - SweSmithOracleEnv reward shaping now prefers schema-valid tool calls while still giving small credit for attempted tool use	2026-02-14 09:17:05 +10:00
maxpaperclips	35b2250b36	Fix RL training pipeline: context truncation, double-encoding, shaped rewards agent_loop.py: - Add _truncate_context() with 2-phase strategy (truncate tool results, then drop oldest middle messages while keeping assistant+tool pairs) - Add max_context_tokens parameter - Guard against double-encoded JSON tool arguments (model outputs string instead of dict) hermes_base_env.py: - Wire max_context_tokens=max_token_length through all 3 HermesAgentLoop construction sites hermes_parser.py: - Prevent double-encoding: when arguments are already a string, use as-is instead of json.dumps() which would double-encode swe_smith_oracle_env.py: - Shaped reward structure for cold-start training: 0.0 (no tools) -> 0.05/call up to 0.3 -> 0.4 (install ok) -> 1.0 (tests pass) - _build_scored_item() override: truncate tokens/masks from END to fit max_token_len instead of discarding entire groups All changes are in environments/ only — no effect on TUI/CLI agent loop.	2026-02-13 22:21:32 +00:00
maxpaperclips	395392e5de	testing training	2026-02-11 22:13:05 +00:00
Shannon Sands	3951eab399	fixed bug in check terminal requirements for slot pool	2026-02-10 09:22:22 +00:00
Shannon Sands	62001e3bf5	refactor on SlotPoolEnvironment	2026-02-10 08:30:37 +00:00
Shannon Sands	a69924631c	Updated hermes_base_env, moved in sandbox logic from old agent, and added a patch so sglang on runpod works with the /generate format (will remove). It worked: the model didn't produce tool calls, but full logprobs worked.	2026-02-10 06:06:21 +00:00
Shannon Sands	4619d1c8ef	Port SWE-smith-oracle env to HermesAgentBaseEnv New: environments/swe_smith_oracle_env.py - Subclasses HermesAgentBaseEnv (proper tools= parameter, multi-model parsers) - Uses ToolContext.terminal() for pytest verification - Supports tool_pool_mode flag for sandbox backends - Reads ATROPOS_SERVER_* env vars from .env - No dependency on atropos/agent/ or atropos/envs/agent_env.py	2026-02-10 02:45:04 +00:00
Shannon Sands	98d945f6de	Add sandbox pool support to HermesAgentBaseEnv Added directly to HermesAgentBaseEnv (no subclass needed): Config fields: - tool_pool_mode: 'default' (terminal tool), 'nomad', or 'modal' - Full Nomad settings: nomad_address, sandbox_job_id, slots_per_container, etc. - Full Modal settings: modal_image, modal_gpu, modal_slots_per_sandbox, etc. - Shared: allow_network, require_sandbox, purge_job_on_start/shutdown Methods: - _start_sandbox_backend() / _stop_sandbox_backend() - lifecycle - setup_trajectory_workspace() - optional hook for workspace prep - verify_and_score_trajectory() - optional hook for in-sandbox verification - env_manager() / process_manager() - lifecycle cleanup When tool_pool_mode='default': everything works as before (terminal tool) When tool_pool_mode='nomad'/'modal': activates sandbox pool from atropos/backends/	2026-02-10 02:26:31 +00:00
Shannon Sands	975c849308	Add GSM8k agent env using proper HermesAgentBaseEnv (not ICL) - environments/gsm8k_agent_env.py: Math reasoning with Python REPL tool - Subclasses HermesAgentBaseEnv (proper tools= parameter, not ICL) - Uses ATROPOS_SERVER_* env vars from .env - Hermes tool call parser, configurable per model - Math verification via math_verify with string fallback - Tested: process mode works, both trajectories scored 1.0 - Updated memory bank with consolidation plan: - environments/ is the canonical env system (proper tool calling) - atropos/backends/ kept as sandbox infrastructure - atropos/agent/ and atropos/envs/agent_env.py marked for removal	2026-02-10 01:45:07 +00:00
teknium	d999d9876d	Enhance async tool execution and error handling in Hermes agent for Atropos integration - Updated `.gitignore` to exclude `testlogs` directory. - Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos. - Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks. - Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops. - Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring. - Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop. - Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.	2026-02-08 05:00:47 +00:00
teknium	a478e44585	Increase max_token_length in TerminalTestEnv to 16000 for enhanced processing capacity	2026-02-07 21:11:07 +00:00
teknium	07b615e96e	Add support for Atropos Agentic RL environments (requires branch tool_call_support in Atropos atm) - Added new environments for reinforcement learning, including `HermesSweEnv` for software engineering tasks and `TerminalTestEnv` for inline testing. - Introduced `ToolContext` for unrestricted access to tools during reward computation. - Updated `.gitignore` to exclude `wandb/` directory. - Enhanced `README.md` with detailed architecture and usage instructions for Atropos environments. - Added configuration files for SWE and terminal test environments to streamline setup. - Removed unnecessary compiled Python files from `__pycache__`.	2026-02-07 09:17:16 +00:00
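
Commits 33a00d9b8e and 35b2250b36 describe normalizing tool arguments that arrive as plain strings or double-encoded JSON. The following is only a rough sketch of that idea, not the repository's code: the function name normalize_tool_args and the two-layer unwrapping limit are assumptions made for illustration; only the terminal -> command and other tools -> input mapping comes from the commit message.

```python
import json
from typing import Any


def normalize_tool_args(tool_name: str, tool_args_raw: Any) -> dict:
    """Hypothetical helper: coerce raw tool arguments into a dict."""
    if tool_args_raw is None:
        return {}
    value = tool_args_raw
    # Unwrap at most two layers of JSON string encoding (assumed limit),
    # which covers stringified and double-encoded JSON.
    for _ in range(2):
        if isinstance(value, dict):
            return value
        if not isinstance(value, str):
            break
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break
    if isinstance(value, dict):
        return value
    # Not a JSON object at all: keep the raw text instead of dropping it.
    key = "command" if tool_name == "terminal" else "input"
    return {key: str(tool_args_raw)}


plain = normalize_tool_args("terminal", "ls")
stringified = normalize_tool_args("terminal", '{"command": "ls"}')
double_encoded = normalize_tool_args(
    "terminal", json.dumps(json.dumps({"command": "ls"}))
)
other_tool = normalize_tool_args("read_file", "notes.txt")
```

In this sketch all three terminal inputs end up as the same dict, and the tool name read_file is a placeholder, not a tool confirmed by the log.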

17 commits
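
Commits 35b2250b36 and a2312076da describe a shaped reward for cold-start training: 0.0 (no tools) -> 0.05 per call up to 0.3 -> 0.4 (install ok) -> 1.0 (tests pass). A minimal sketch of that ladder, assuming the tiers are mutually exclusive and the highest reached tier wins; the function name shaped_reward and its parameters are hypothetical and do not reflect the actual compute_reward() signature.

```python
def shaped_reward(tool_calls: int, install_ok: bool, tests_pass: bool) -> float:
    """Hypothetical reward ladder based on the values quoted in the commit log."""
    if tests_pass:
        return 1.0  # full credit once the tests pass
    if install_ok:
        return 0.4  # partial credit for a successful install
    # Count-based credit: 0.05 per tool call, capped at 0.3.
    return min(0.05 * max(tool_calls, 0), 0.3)


no_tools = shaped_reward(0, install_ok=False, tests_pass=False)
two_calls = shaped_reward(2, install_ok=False, tests_pass=False)
many_calls = shaped_reward(20, install_ok=False, tests_pass=False)
installed = shaped_reward(3, install_ok=True, tests_pass=False)
solved = shaped_reward(3, install_ok=True, tests_pass=True)
```

Under these assumptions the cap is reached at six tool calls, so additional calls earn no further credit until the install or test tier is reached.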