Commit graph

199 commits

Author SHA1 Message Date
Shannon Sands
33a00d9b8e Agent loop: be robust to non-JSON tool args strings
If tool_args_raw is not valid JSON at all (e.g. parser/provider passed
through a plain string like ls), normalize it into {command: ...} for
terminal or {input: ...} for other tools instead of dropping args.
2026-02-14 09:28:23 +10:00
Shannon Sands
a2312076da SWE env: keep reward shaping env-defined; log tool-call metrics only
- Revert compute_reward() tool-call shaping to simple count-based reward
  (0.05 per tool call, capped at 0.3)
- Keep new agent-loop metrics available but only print them for debugging,
  so environments/users can decide their own tool-call validity policy
2026-02-14 09:19:22 +10:00
Shannon Sands
499490d06a Track tool-call validity vs attempts; shape reward accordingly
- AgentResult now includes tool-call metrics: attempted, schema_valid,
  executed_ok, exec_error
- HermesAgentLoop normalizes args robustly without crashing, but
  distinguishes schema-valid args (dict) from coerced formats
  (stringified JSON, plain strings)
- SweSmithOracleEnv reward shaping now prefers schema-valid tool calls
  while still giving small credit for attempted tool use
2026-02-14 09:17:05 +10:00
maxpaperclips
35b2250b36 Fix RL training pipeline: context truncation, double-encoding, shaped rewards
agent_loop.py:
- Add _truncate_context() with 2-phase strategy (truncate tool results,
  then drop oldest middle messages while keeping assistant+tool pairs)
- Add max_context_tokens parameter
- Guard against double-encoded JSON tool arguments (model outputs
  string instead of dict)

hermes_base_env.py:
- Wire max_context_tokens=max_token_length through all 3 HermesAgentLoop
  construction sites

hermes_parser.py:
- Prevent double-encoding: when arguments are already a string, use as-is
  instead of json.dumps() which would double-encode

swe_smith_oracle_env.py:
- Shaped reward structure for cold-start training:
  0.0 (no tools) -> 0.05/call up to 0.3 -> 0.4 (install ok) -> 1.0 (tests pass)
- _build_scored_item() override: truncate tokens/masks from END to fit
  max_token_len instead of discarding entire groups

All changes are in environments/ only — no effect on TUI/CLI agent loop.
2026-02-13 22:21:32 +00:00
maxpaperclips
395392e5de testing training 2026-02-11 22:13:05 +00:00
Shannon Sands
2041b354a9 threaded batch runner variant to share slot pool 2026-02-10 09:44:22 +00:00
Shannon Sands
3951eab399 fixed bug in check terminal requirements for slot pool 2026-02-10 09:22:22 +00:00
Shannon Sands
62001e3bf5 refactor on SlotPoolEnvironment 2026-02-10 08:30:37 +00:00
Shannon Sands
c8b30e9efa Updated terminal_tool with SlotPoolEnvironment 2026-02-10 07:23:08 +00:00
Shannon Sands
f82c3081f2 working with qwen 8b 2026-02-10 06:38:19 +00:00
Shannon Sands
a69924631c updated hermes_base_env, moved in sandbox logic from old agent, added patch so sglang on runpod works with /generate format (will remove). worked, model didnt produce tool calls but full logprobs worked 2026-02-10 06:06:21 +00:00
Shannon Sands
4619d1c8ef Port SWE-smith-oracle env to HermesAgentBaseEnv
New: environments/swe_smith_oracle_env.py
- Subclasses HermesAgentBaseEnv (proper tools= parameter, multi-model parsers)
- Uses ToolContext.terminal() for pytest verification
- Supports tool_pool_mode flag for sandbox backends
- Reads ATROPOS_SERVER_* env vars from .env
- No dependency on atropos/agent/ or atropos/envs/agent_env.py
2026-02-10 02:45:04 +00:00
Shannon Sands
98d945f6de Add sandbox pool support to HermesAgentBaseEnv
Added directly to HermesAgentBaseEnv (no subclass needed):

Config fields:
- tool_pool_mode: 'default' (terminal tool), 'nomad', or 'modal'
- Full Nomad settings: nomad_address, sandbox_job_id, slots_per_container, etc.
- Full Modal settings: modal_image, modal_gpu, modal_slots_per_sandbox, etc.
- Shared: allow_network, require_sandbox, purge_job_on_start/shutdown

Methods:
- _start_sandbox_backend() / _stop_sandbox_backend() - lifecycle
- setup_trajectory_workspace() - optional hook for workspace prep
- verify_and_score_trajectory() - optional hook for in-sandbox verification
- env_manager() / process_manager() - lifecycle cleanup

When tool_pool_mode='default': everything works as before (terminal tool)
When tool_pool_mode='nomad'/'modal': activates sandbox pool from atropos/backends/
2026-02-10 02:26:31 +00:00
Shannon Sands
507b77c4ac Point atropos dep at tool_call_support branch (PR #366)
ManagedServer in this branch passes tools= to apply_chat_template(),
enabling proper tool calling for Phase 2 (RL training with logprobs).
2026-02-10 01:54:03 +00:00
Shannon Sands
b99c2a2644 consolidating with HermesBaseEnv 2026-02-10 01:49:48 +00:00
Shannon Sands
975c849308 Add GSM8k agent env using proper HermesAgentBaseEnv (not ICL)
- environments/gsm8k_agent_env.py: Math reasoning with Python REPL tool
  - Subclasses HermesAgentBaseEnv (proper tools= parameter, not ICL)
  - Uses ATROPOS_SERVER_* env vars from .env
  - Hermes tool call parser, configurable per model
  - Math verification via math_verify with string fallback
  - Tested: process mode works, both trajectories scored 1.0

- Updated memory bank with consolidation plan:
  - environments/ is the canonical env system (proper tool calling)
  - atropos/backends/ kept as sandbox infrastructure
  - atropos/agent/ and atropos/envs/agent_env.py marked for removal
2026-02-10 01:45:07 +00:00
Shannon Sands
9dc27880cd adding tinker but need api key 2026-02-09 02:37:39 +00:00
Shannon Sands
3b9c53e6db Add Tinker RL training integration and documentation
- pyproject.toml: Added tinker SDK, torch, wandb, math-verify to [atropos] extras
- README.md: Added comprehensive RL Training with Tinker section including:
  - Architecture diagram (3-process pipeline)
  - Quick start guide for GSM8k agent training
  - Configuration documentation
  - RL CLI usage
  - Sandbox backend options (Nomad, Singularity, Modal)

New files in tinker-atropos submodule (committed there):
- tinker_atropos/environments/gsm8k_agent.py: Agent GSM8k env with Python REPL tool
- configs/gsm8k_agent.yaml: Config for Qwen3-4B training
2026-02-09 01:36:20 +00:00
Shannon Sands
05dd31131f merged main 2026-02-09 00:17:07 +00:00
Shannon Sands
36ea883d45 Merge origin/main into atropos-integrations
Merged main's latest changes including:
- New hermes_cli/ unified CLI commands
- File operations tools, fuzzy match, patch parser
- RL training tools and tinker-atropos submodule
- Enhanced batch_runner and run_agent
- Gateway improvements (Telegram, Discord)
- Cron job management
- Installation scripts

Preserved our branch-specific features:
- Modal backend (atropos/backends/modal_backend.py)
- Modal terminal tool integration (ModalProfile, _ModalSandboxPool, etc.)
- Singularity/Apptainer support
- Atropos AgentEnv Modal config fields
- Combined pyproject.toml extras (atropos + messaging + cron + cli)

Conflict resolution:
- cli.py, model_tools.py, README.md: accepted main (newer features)
- pyproject.toml: combined both extras and package lists
- tools/terminal_tool.py: accepted main's base + re-inserted Modal integration
2026-02-09 00:08:25 +00:00
Shannon Sands
6be8cdeeca modal backend working ok, merged in modal-integrations 2026-02-08 23:48:01 +00:00
teknium1
192ce958c3 Enhance CLI command handling and introduce resource cleanup features
- Added imports for resource cleanup during safe shutdown, including terminal and browser session cleanup.
- Refactored command handling to preserve original case for model names and prompt text, improving user experience.
- Introduced a dedicated interrupt queue to manage user input while the agent is running, preventing race conditions.
- Updated comments and documentation for clarity on command processing and input handling.
2026-02-08 13:31:45 -08:00
teknium1
c441681dc2 Update default model to 'anthropic/claude-opus-4.6' and refine terminal working directory settings
- Changed the default LLM model in the setup wizard and example environment file to 'anthropic/claude-opus-4.6'.
- Updated terminal working directory settings in CLI and related files to use the current directory ('.') instead of '/tmp'.
- Enhanced documentation comments for clarity on terminal configuration and working directory behavior.
2026-02-08 12:56:40 -08:00
teknium
dd70d57b9b Refactor BatchRunner and AIAgent for enhanced reasoning and tool management, improved tool definitions for fileops
- Updated `ALL_POSSIBLE_TOOLS` to auto-derive from `TOOL_TO_TOOLSET_MAP` for consistent schema.
- Introduced `_extract_reasoning_stats` function to track reasoning coverage in assistant turns.
- Enhanced `_process_batch_worker` to discard prompts with no reasoning and aggregate reasoning statistics.
- Updated documentation and comments for clarity on new features and changes.
2026-02-08 20:19:14 +00:00
teknium
f12ea1bc02 Enhance BatchRunner and AIAgent with new configuration options, default model now opus 4.6, default summarizer gemini flash 3
- Added `max_tokens`, `reasoning_config`, and `prefill_messages` parameters to `BatchRunner` and `AIAgent` for improved model response control.
- Updated CLI to support new options for reasoning effort and prefill messages from a JSON file.
- Modified example configuration files to reflect changes in default model and summary model.
- Improved error handling for loading prefill messages and reasoning configurations in the CLI.
- Updated documentation to include new parameters and usage examples.
2026-02-08 10:49:24 +00:00
Teknium
fa76a331b0
Merge pull request #19 from NousResearch/atropos-hermes-agent
Enhance async tool execution and error handling in Hermes agent for A…
2026-02-07 21:01:06 -08:00
teknium
d999d9876d Enhance async tool execution and error handling in Hermes agent for Atropos integration
- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
2026-02-08 05:00:47 +00:00
Teknium
578a5fb6a9
Merge pull request #18 from NousResearch/atropos-hermes-agent
Upgrade installers to use uv
2026-02-07 16:00:05 -08:00
teknium
a8809bbd3e Transition installation to uv for py version and speed to be easier to streamline
- Integrated `uv` as a fast Python package manager for automatic Python provisioning and dependency management.
- Updated installation scripts (`setup-hermes.sh`, `install.sh`, `install.ps1`) to utilize `uv` for installing Python and packages, streamlining the setup process.
- Revised `README.md` to reflect changes in installation steps, including symlinking `hermes` for global access and clarifying Python version requirements.
- Adjusted commands in `doctor.py` and other scripts to recommend `uv` for package installations, ensuring consistency across the project.
2026-02-07 23:54:53 +00:00
teknium
a478e44585 Increase max_token_length in TerminalTestEnv to 16000 for enhanced processing capacity 2026-02-07 21:11:07 +00:00
teknium
c0494b3558 Update pyproject.toml to refine dependency management
- Reorganized the 'all' dependencies to include specific optional groups for better modularity.
- Added support for 'hermes-agent' with distinct categories: modal, messaging, cron, cli, and dev.
2026-02-07 21:11:01 +00:00
Teknium
7f1cd014f2
Merge pull request #17 from NousResearch/atropos-hermes-agent
Add support for Atropos Agentic RL environments (requires branch tool…
2026-02-07 09:12:10 -08:00
teknium
07b615e96e Add support for Atropos Agentic RL environments (requires branch tool_call_support in Atropos atm)
- Added new environments for reinforcement learning, including `HermesSweEnv` for software engineering tasks and `TerminalTestEnv` for inline testing.
- Introduced `ToolContext` for unrestricted access to tools during reward computation.
- Updated `.gitignore` to exclude `wandb/` directory.
- Enhanced `README.md` with detailed architecture and usage instructions for Atropos environments.
- Added configuration files for SWE and terminal test environments to streamline setup.
- Removed unnecessary compiled Python files from `__pycache__`.
2026-02-07 09:17:16 +00:00
Teknium
ab387a6120
Merge pull request #16 from NousResearch/atropos-hermes-agent
Update dependencies and enhance installation scripts
2026-02-06 16:05:50 -08:00
teknium
ac79725923 Update dependencies and enhance installation scripts
- Added `prompt_toolkit` as a direct dependency for interactive CLI support.
- Updated `modal` optional dependency to require `swe-rex[modal]>=1.4.0` for improved cloud execution capabilities.
- Enhanced `messaging` optional dependencies to include `aiohttp>=3.9.0` for WhatsApp bridge communication.
- Refined installation scripts to check for Python version requirements, emphasizing the need for Python 3.11+ for RL training tools.
- Improved setup scripts to ensure proper installation of submodules and dependencies, enhancing user experience during setup.
2026-02-07 00:05:04 +00:00
Jai Suphavadeeprasit
0bc914b00c readme edit 2026-02-06 04:24:39 -05:00
Jai Suphavadeeprasit
411e7f8ff4 readme edit 2026-02-06 04:24:12 -05:00
Jai Suphavadeeprasit
eb2e6b73fe integration 2026-02-06 04:15:56 -05:00
Shannon Sands
664acf7426 fixed gitignore 2026-02-06 02:27:47 +00:00
Shannon Sands
fd1c3da305 singularity working 2026-02-06 01:03:59 +00:00
Teknium
8dd38318fc
Merge pull request #15 from NousResearch/rl-capabilities
Rl capabilities && File Operator Tools
2026-02-05 03:50:42 -08:00
teknium1
533c064269 Add file manipulation tools and enhance setup scripts
- Introduced file manipulation capabilities in `model_tools.py`, including functions for reading, writing, patching, and searching files.
- Added a new `file` toolset in `toolsets.py` and updated distributions to include file tools.
- Enhanced `setup-hermes.sh` and `install.sh` scripts to check for and optionally install `ripgrep` for faster file searching.
- Implemented a new `file_operations.py` module to encapsulate file operations using shell commands.
- Updated `doctor.py` and `install.ps1` to check for `ripgrep` and provide installation guidance if not found.
- Added fuzzy matching and patch parsing capabilities to improve file manipulation accuracy and flexibility.
2026-02-05 03:49:46 -08:00
Shannon Sands
4d619bcd21 moved nomand config 2026-02-05 15:45:46 +10:00
teknium1
5c3105b437 Enhance RL test inference with WandB integration and real-time output streaming
- Added unique run ID generation for WandB tracking during test inference.
- Enabled WandB usage for test tracking and updated command-line arguments accordingly.
- Implemented real-time output streaming for process execution, improving log visibility and debugging.
- Enhanced error handling to display last few lines of stderr for better troubleshooting.
2026-02-04 21:07:07 -08:00
Shannon Sands
beac2ee06a increasing per-chat timeout (re api issues ergh), and tweaked logging 2026-02-05 14:54:34 +10:00
Shannon Sands
487487406d adjusted prompt again to make things more reliable, having api issues 2026-02-05 14:42:10 +10:00
Shannon Sands
87464821d8 added metadata capture 2026-02-05 12:00:31 +10:00
Shannon Sands
661d8f4d6c logprobs 2026-02-05 11:42:58 +10:00
Shannon Sands
bf13a848ef endpoint issue (can reproduce with curl calls) 2026-02-05 11:27:18 +10:00
Shannon Sands
88286f6da3 slow completions over group_size 4, debugging added 2026-02-05 10:57:13 +10:00