diff --git a/.clinerules b/.clinerules new file mode 100644 index 0000000000..e6be63d5e2 --- /dev/null +++ b/.clinerules @@ -0,0 +1,115 @@ +# Cline's Memory Bank + +I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This isn't a limitation - it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my Memory Bank to understand the project and continue work effectively. I MUST read ALL memory bank files at the start of EVERY task - this is not optional. + +## Memory Bank Structure + +The Memory Bank consists of core files and optional context files, all in Markdown format. Files build upon each other in a clear hierarchy: + +flowchart TD + PB[projectbrief.md] --> PC[productContext.md] + PB --> SP[systemPatterns.md] + PB --> TC[techContext.md] + + PC --> AC[activeContext.md] + SP --> AC + TC --> AC + + AC --> P[progress.md] + +### Core Files (Required) +1. `projectbrief.md` + - Foundation document that shapes all other files + - Created at project start if it doesn't exist + - Defines core requirements and goals + - Source of truth for project scope + +2. `productContext.md` + - Why this project exists + - Problems it solves + - How it should work + - User experience goals + +3. `activeContext.md` + - Current work focus + - Recent changes + - Next steps + - Active decisions and considerations + - Important patterns and preferences + - Learnings and project insights + +4. `systemPatterns.md` + - System architecture + - Key technical decisions + - Design patterns in use + - Component relationships + - Critical implementation paths + +5. `techContext.md` + - Technologies used + - Development setup + - Technical constraints + - Dependencies + - Tool usage patterns + +6. 
`progress.md` + - What works + - What's left to build + - Current status + - Known issues + - Evolution of project decisions + +### Additional Context +Create additional files/folders within memory-bank/ when they help organize: +- Complex feature documentation +- Integration specifications +- API documentation +- Testing strategies +- Deployment procedures + +## Core Workflows + +### Plan Mode +flowchart TD + Start[Start] --> ReadFiles[Read Memory Bank] + ReadFiles --> CheckFiles{Files Complete?} + + CheckFiles -->|No| Plan[Create Plan] + Plan --> Document[Document in Chat] + + CheckFiles -->|Yes| Verify[Verify Context] + Verify --> Strategy[Develop Strategy] + Strategy --> Present[Present Approach] + +### Act Mode +flowchart TD + Start[Start] --> Context[Check Memory Bank] + Context --> Update[Update Documentation] + Update --> Execute[Execute Task] + Execute --> Document[Document Changes] + +## Documentation Updates + +Memory Bank updates occur when: +1. Discovering new project patterns +2. After implementing significant changes +3. When user requests with **update memory bank** (MUST review ALL files) +4. When context needs clarification + +flowchart TD + Start[Update Process] + + subgraph Process + P1[Review ALL Files] + P2[Document Current State] + P3[Clarify Next Steps] + P4[Document Insights & Patterns] + + P1 --> P2 --> P3 --> P4 + end + + Start --> Process + +Note: When triggered by **update memory bank**, I MUST review every memory bank file, even if some don't require updates. Focus particularly on activeContext.md and progress.md as they track current state. + +REMEMBER: After every memory reset, I begin completely fresh. The Memory Bank is my only link to previous work. It must be maintained with precision and clarity, as my effectiveness depends entirely on its accuracy. 
\ No newline at end of file diff --git a/.env.example b/.env.example index 5b70dd8d35..77d13e8fd9 100644 --- a/.env.example +++ b/.env.example @@ -9,6 +9,20 @@ # - atropos : Atroposlib ServerManager/ManagedServer-backed loop (training/env integration) HERMES_BACKEND=openai + +# ============================================================================= +# LOCAL / SELF-HOSTED OPENAI-COMPATIBLE ENDPOINTS (vLLM, SGLang, llama.cpp, etc.) +# ============================================================================= +# For local development (matches the Atropos test env defaults): +# ATROPOS_SERVER_BASE_URL=http://127.0.0.1:8080 +# ATROPOS_SERVER_MODEL=hermes-4-36b +# For hosted inference (Nous Research inference API): +ATROPOS_SERVER_BASE_URL= +ATROPOS_SERVER_MODEL= +ATROPOS_TOKENIZER_NAME= +# Set this to your Nous API key (Bearer token). +ATROPOS_SERVER_API_KEY= + # Debugging (prints to stdout; use with care) # HERMES_DEBUG_ATROPOS_REQUEST=1 # HERMES_DEBUG_ATROPOS_RESPONSE=1 diff --git a/.gitignore b/.gitignore index 062ef4da79..770e017bb9 100644 --- a/.gitignore +++ b/.gitignore @@ -40,7 +40,14 @@ agent-browser/ privvy* images/ -# CLI config (may contain sensitive SSH paths) -cli-config.yaml - -.DS_Store +# CLI config (may contain sensitive SSH paths) +cli-config.yaml + +.DS_Store + +# artifacts +*.jsonl +*.html +*.json +*.log +*.csv \ No newline at end of file diff --git a/README.md b/README.md index 001728a33e..9ac830130e 100644 --- a/README.md +++ b/README.md @@ -610,3 +610,248 @@ All environment variables can be configured in the `.env` file (copy from `.env. | `skills/` | On-demand knowledge documents | | `docs/` | Documentation | | `configs/` | Example batch run scripts | + +# Atropos Integrations & RL Training + +Atropos is an RL training framework that uses Hermes-Agent for agent-based environments. This section covers setting up the sandbox infrastructure with either Docker or Singularity backends. + +## Prerequisites + +### 1. 
Install Nomad +Nomad is a workload orchestrator that manages the sandbox containers: + +```bash +# Install Nomad (Linux) +curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg +echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list +sudo apt update && sudo apt install nomad + +# Verify installation +nomad --version +``` + +For other platforms, see: https://developer.hashicorp.com/nomad/docs/install + +### 2. Install Atropos Dependencies + +```bash +python3 -m venv .venv +source .venv/bin/activate +pip install -e '.[atropos]' +``` + +## Backend Options + +Atropos supports two container backends for the sandbox environment: + +| Backend | Use Case | Requirements | +|---------|----------|--------------| +| **Docker** | Development, servers with Docker | Docker installed, user in `docker` group | +| **Singularity** | HPC clusters, rootless environments | Apptainer/Singularity installed (no root needed) | + +--- + +## Docker Backend (Default) + +### 1. Build the Sandbox Image + +```bash +cd atropos +docker build -t atropos-sandbox:local . +``` + +### 2. Start Nomad (Development Mode) + +```bash +# Start Nomad with Docker driver +nomad agent -dev -config=nomad-dev.hcl +``` + +where `nomad-dev.hcl` contains: +```hcl +client { + enabled = true + options { + "driver.allowlist" = "docker" + } +} +``` + +### 3. Run with Docker Backend + +```bash +source .venv/bin/activate + +# Test the environment +python -m atropos.envs.swe_smith_oracle_env process \ + --env.use_wandb false \ + --env.total_steps 1 \ + --env.max_items 1 \ + --env.driver docker +``` + +--- + +## Singularity Backend (HPC/Rootless) + +Singularity/Apptainer is ideal for HPC clusters where Docker requires root privileges. + +### 1. 
Build the Singularity Image + +```bash +cd atropos + +# Option A: Convert from Docker image (if Docker is available) +docker build -t atropos-sandbox:local . +apptainer build atropos-sandbox.sif docker-daemon://atropos-sandbox:local + +# Option B: Pull a prebuilt image directly from a registry (no local Docker build required) +apptainer build atropos-sandbox.sif docker://ghcr.io/nousresearch/atropos-sandbox:latest +``` + +### 2. Start Nomad with raw_exec Driver + +Singularity uses Nomad's `raw_exec` driver. Create `nomad-singularity.hcl`: + +```hcl +client { + enabled = true + options { + "driver.allowlist" = "raw_exec,docker" + } +} + +plugin "raw_exec" { + config { + enabled = true + } +} +``` + +Start Nomad: +```bash +nomad agent -dev -config=nomad-singularity.hcl +``` + +### 3. Run with Singularity Backend + +```bash +source .venv/bin/activate + +# Basic test +python -m atropos.envs.swe_smith_oracle_env process \ + --env.use_wandb false \ + --env.total_steps 1 \ + --env.max_items 1 \ + --env.driver singularity \ + --env.singularity_image /path/to/atropos-sandbox.sif + +# Full example with all options +python -m atropos.envs.swe_smith_oracle_env process \ + --env.use_wandb false \ + --env.total_steps 10 \ + --env.group_size 4 \ + --env.max_items 100 \ + --env.driver singularity \ + --env.singularity_image /path/to/atropos-sandbox.sif \ + --env.slots_per_container 10 \ + --env.min_containers 1 \ + --env.max_containers 5 +``` + +--- + +## CLI Arguments Reference + +### Environment Configuration (`--env.*`) + +| Argument | Default | Description | +|----------|---------|-------------| +| `--env.driver` | `docker` | Container backend: `docker` or `singularity` | +| `--env.singularity_image` | - | Path to `.sif` file (required for singularity driver) | +| `--env.sandbox_image` | `atropos-sandbox:local` | Docker image name (for docker driver) | +| `--env.slots_per_container` | `10` | Number of parallel slots per container | +| `--env.min_containers` | `1` | Minimum number of containers to 
run | +| `--env.max_containers` | `10` | Maximum containers for auto-scaling | +| `--env.nomad_address` | `http://localhost:4646` | Nomad server address | +| `--env.privileged` | `false` | Run containers in privileged mode (Docker only) | + +### Processing Configuration + +| Argument | Default | Description | +|----------|---------|-------------| +| `--env.total_steps` | `1` | Number of processing steps | +| `--env.group_size` | `1` | Items per processing group | +| `--env.max_items` | `0` | Max dataset items (0 = all) | +| `--env.use_wandb` | `true` | Enable Weights & Biases logging | +| `--env.agent_max_steps` | `50` | Max agent steps per trajectory | + +--- + +## Troubleshooting + +### Port Already in Use +```bash +# Find and kill process on port 8080 +lsof -ti :8080 | xargs kill + +# Or use a different port +--env.port 8081 +``` + +### Singularity: Permission Denied +```bash +# Check Apptainer is installed +apptainer --version + +# Ensure the .sif file is readable +ls -la /path/to/atropos-sandbox.sif +``` + +### Nomad: Job Not Starting +```bash +# Check Nomad status +nomad status + +# View job logs +nomad alloc logs -job atropos-sandbox-agent-env + +# Check stderr for errors +nomad alloc logs -stderr -job atropos-sandbox-agent-env +``` + +### OpenAI API Token Error +If you see `NotImplementedError: OpenAI endpoints do not support token IDs`: +```bash +# For testing/evaluation only (not training) +export ATROPOS_ALLOW_DUMMY_MANAGED_SERVER=1 +``` + +--- + +## Example: Full HPC Workflow + +```bash +# 1. Setup environment +python3 -m venv .venv +source .venv/bin/activate +pip install -e '.[atropos]' + +# 2. Build Singularity image (on a machine with Docker) +cd atropos +docker build -t atropos-sandbox:local . +apptainer build atropos-sandbox.sif docker-daemon://atropos-sandbox:local + +# 3. Transfer .sif to HPC cluster +scp atropos-sandbox.sif user@hpc-cluster:/scratch/user/ + +# 4. 
On HPC cluster: Start Nomad +nomad agent -dev -config=nomad-singularity.hcl & + +# 5. Run training +python -m atropos.envs.swe_smith_oracle_env process \ + --env.driver singularity \ + --env.singularity_image /scratch/user/atropos-sandbox.sif \ + --env.total_steps 100 \ + --env.max_items 1000 +``` diff --git a/atropos/atropos-sandbox.sif b/atropos/atropos-sandbox.sif new file mode 100755 index 0000000000..adf433d994 Binary files /dev/null and b/atropos/atropos-sandbox.sif differ diff --git a/atropos/backends/nomad_backend.py b/atropos/backends/nomad_backend.py index 8bfc0df8ea..06e7465964 100644 --- a/atropos/backends/nomad_backend.py +++ b/atropos/backends/nomad_backend.py @@ -26,6 +26,10 @@ class NomadBackendConfig: privileged: bool acquire_timeout_s: float purge_job_on_start: bool + # Driver selection: "docker" or "singularity" + driver: str = "docker" + # Path to .sif file for singularity driver (required if driver="singularity") + singularity_image: Optional[str] = None @classmethod def from_agent_env_config(cls, cfg: Any) -> "NomadBackendConfig": @@ -39,6 +43,8 @@ class NomadBackendConfig: privileged=bool(getattr(cfg, "privileged")), acquire_timeout_s=float(getattr(cfg, "acquire_timeout_s")), purge_job_on_start=bool(getattr(cfg, "purge_job_on_start", False)), + driver=str(getattr(cfg, "driver", "docker")), + singularity_image=getattr(cfg, "singularity_image", None), ) @@ -56,6 +62,8 @@ class NomadToolBackend(ToolBackend): privileged=config.privileged, acquire_timeout=config.acquire_timeout_s, purge_job_on_start=bool(config.purge_job_on_start), + driver=config.driver, + singularity_image=config.singularity_image, ) ) diff --git a/atropos/envs/agent_env.py b/atropos/envs/agent_env.py index 7d7d14b916..02b32ac1bc 100644 --- a/atropos/envs/agent_env.py +++ b/atropos/envs/agent_env.py @@ -60,6 +60,16 @@ class AgentEnvConfig(BaseEnvConfig): ), ) purge_job_on_shutdown: bool = Field(default=True, description="Nomad mode: stop/purge job on shutdown") + + # Nomad 
driver selection (docker or singularity) + driver: str = Field( + default="docker", + description="Nomad task driver: 'docker' (default) or 'singularity' (for HPC without sudo Docker)", + ) + singularity_image: Optional[str] = Field( + default=None, + description="Path to .sif file for Singularity driver (required if driver='singularity')", + ) # modal mode settings (stub; implementation pending) modal_app_name: str = Field(default="atropos-sandbox", description="Modal app name (stub)") diff --git a/atropos/nomad/client.py b/atropos/nomad/client.py index 7ff81e5183..147905e01a 100644 --- a/atropos/nomad/client.py +++ b/atropos/nomad/client.py @@ -241,9 +241,10 @@ class NomadClient: if networks: network = networks[0] address = network.get("IP") - # Look for dynamic ports + # Look for dynamic ports OR reserved ports (Singularity/raw_exec uses reserved) dyn_ports = network.get("DynamicPorts") or [] - for dp in dyn_ports: + reserved_ports = network.get("ReservedPorts") or [] + for dp in dyn_ports + reserved_ports: if dp.get("Label") == "http": port = dp.get("Value") break @@ -353,16 +354,18 @@ def create_sandbox_job( memory: int = 512, port: int = 8080, datacenter: str = "dc1", + driver: str = "docker", # "docker" or "singularity" + singularity_image: str = None, # Path to .sif file for singularity driver ) -> Dict[str, Any]: """ Create a sandbox job specification. - This job runs the sandbox_server.py inside a Python container, + This job runs the sandbox_server.py inside a container, with the specified number of slots for agent workspaces. 
Args: job_id: Unique job identifier - image: Docker image to use + image: Docker image to use (for docker driver) count: Number of container instances slots_per_container: Number of slots per container privileged: Run container in privileged mode (recommended for bubblewrap) @@ -370,10 +373,81 @@ def create_sandbox_job( memory: Memory allocation in MB port: HTTP port for sandbox server datacenter: Nomad datacenter + driver: Container driver - "docker" or "singularity" + singularity_image: Path to .sif file (required if driver="singularity") Returns: Job specification dict """ + # Build task config based on driver + if driver == "singularity": + if not singularity_image: + raise ValueError("singularity_image path required when driver='singularity'") + + # Use raw_exec driver to run apptainer via shell for variable expansion + # The container binds the allocation directory for workspace persistence + # For raw_exec, we use static port since Nomad's dynamic port mapping doesn't + # work the same as Docker - the process runs directly on the host. 
+ shell_cmd = ( + f'apptainer run ' + f'--bind "$NOMAD_ALLOC_DIR/data:/data" ' + f'--pwd /app ' + f'--env PYTHONUNBUFFERED=1 ' + f'{singularity_image} ' + f'python sandbox_server.py ' + f'--port {port} ' + f'--slots {slots_per_container} ' + f'--data-dir /data' + ) + task_config = { + "command": "/bin/sh", + "args": ["-c", shell_cmd], + } + task_driver = "raw_exec" + else: + # Docker driver (default) + task_config = { + "image": image, + "force_pull": False, # Use local image, don't try to pull + "ports": ["http"], + "privileged": privileged, + "command": "python", + "args": [ + "sandbox_server.py", + "--port", str(port), + "--slots", str(slots_per_container), + "--data-dir", "/data", + ], + # Note: On Linux, you can mount persistent storage: + # "volumes": ["${NOMAD_ALLOC_DIR}/data:/data"], + # On macOS/Docker Desktop, skip volumes for PoC + # (container /data is ephemeral but works for testing) + } + task_driver = "docker" + + # For Singularity/raw_exec, use static ports since the process runs directly on host. + # For Docker, use dynamic ports with port mapping. 
+ if driver == "singularity": + network_config = { + "Mode": "host", + "ReservedPorts": [ + { + "Label": "http", + "Value": port, + } + ], + } + else: + network_config = { + "Mode": "host", + "DynamicPorts": [ + { + "Label": "http", + "To": port, + } + ], + } + return { "ID": job_id, "Name": job_id, @@ -390,38 +464,12 @@ def create_sandbox_job( "HealthCheck": "task_states", "MinHealthyTime": 0, }, - "Networks": [ - { - "Mode": "host", - "DynamicPorts": [ - { - "Label": "http", - "To": port, - } - ], - } - ], + "Networks": [network_config], "Tasks": [ { "Name": "sandbox-server", - "Driver": "docker", - "Config": { - "image": image, - "force_pull": False, # Use local image, don't try to pull - "ports": ["http"], - "privileged": privileged, - "command": "python", - "args": [ - "sandbox_server.py", - "--port", str(port), - "--slots", str(slots_per_container), - "--data-dir", "/data", - ], - # Note: On Linux, you can mount persistent storage: - # "volumes": ["${NOMAD_ALLOC_DIR}/data:/data"], - # On macOS/Docker Desktop, skip volumes for PoC - # (container /data is ephemeral but works for testing) - }, + "Driver": task_driver, + "Config": task_config, "Env": { "PYTHONUNBUFFERED": "1", "NOMAD_ALLOC_DIR": "${NOMAD_ALLOC_DIR}", diff --git a/atropos/slots/pool.py b/atropos/slots/pool.py index eee19116e1..03c8147fe2 100644 --- a/atropos/slots/pool.py +++ b/atropos/slots/pool.py @@ -44,6 +44,11 @@ class SlotPoolConfig: cpu: int = 500 # MHz memory: int = 512 # MB + # Driver selection: "docker" or "singularity" + driver: str = "docker" + # Path to .sif file for singularity driver (required if driver="singularity") + singularity_image: Optional[str] = None + # Scaling settings min_containers: int = 1 max_containers: int = 10 @@ -238,7 +243,7 @@ class SlotPool: if job is None: # Deploy new job - logger.info(f"Deploying sandbox job: {self.config.job_id}") + logger.info(f"Deploying sandbox job: {self.config.job_id} (driver={self.config.driver})") job_spec = create_sandbox_job( 
job_id=self.config.job_id, image=self.config.image, @@ -248,6 +253,8 @@ class SlotPool: cpu=self.config.cpu, memory=self.config.memory, datacenter=self.config.datacenter, + driver=self.config.driver, + singularity_image=self.config.singularity_image, ) result = await self.nomad.submit_job(job_spec) if "error" in result: diff --git a/hermes_agent.egg-info/PKG-INFO b/hermes_agent.egg-info/PKG-INFO index d17e541623..d532413be3 100644 --- a/hermes_agent.egg-info/PKG-INFO +++ b/hermes_agent.egg-info/PKG-INFO @@ -647,3 +647,13 @@ All environment variables can be configured in the `.env` file (copy from `.env. | `skills/` | On-demand knowledge documents | | `docs/` | Documentation | | `configs/` | Example batch run scripts | + +# Atropos Integrations & RL Training + +## Nomad Setup +Follow this: https://developer.hashicorp.com/nomad/docs/deploy + +## Atropos dependencies +python3 -m venv .venv +source .venv/bin/activate +pip install -e '.[atropos]' diff --git a/hermes_agent.egg-info/SOURCES.txt b/hermes_agent.egg-info/SOURCES.txt index 87e0cd7d93..7abce632ff 100644 --- a/hermes_agent.egg-info/SOURCES.txt +++ b/hermes_agent.egg-info/SOURCES.txt @@ -15,8 +15,13 @@ atropos/agent/atropos_agent.py atropos/api/__init__.py atropos/api/tool_executor_server.py atropos/api/tool_server.py +atropos/backends/__init__.py +atropos/backends/base.py +atropos/backends/modal_backend.py +atropos/backends/nomad_backend.py atropos/envs/__init__.py atropos/envs/agent_env.py +atropos/envs/hermes_compat_test_env.py atropos/envs/sandbox_terminal_smoke_env.py atropos/envs/swe_smith_oracle_env.py atropos/envs/test_env.py @@ -50,6 +55,7 @@ tests/test_modal_terminal.py tests/test_nous_api_limits.py tests/test_nous_api_pattern.py tests/test_temperature_fix.py +tests/test_tool_call_parsing.py tests/test_web_tools.py tools/__init__.py tools/browser_tool.py diff --git a/memory-bank/activeContext.md b/memory-bank/activeContext.md new file mode 100644 index 0000000000..b7c0621d4b --- /dev/null +++ 
b/memory-bank/activeContext.md @@ -0,0 +1,62 @@ +# Active Context + +## Current Focus +Singularity/Apptainer integration for HPC environments has been **COMPLETED AND TESTED**. + +## Recently Completed (Feb 6, 2026) + +### Singularity/Apptainer Sandbox Integration - FULLY WORKING +Successfully adapted the Atropos implementation from Docker to Singularity/Apptainer for HPC clusters where Docker cannot run without sudo permissions. + +**Files Modified:** +1. `atropos/nomad/client.py` - Added `driver` and `singularity_image` parameters to `create_sandbox_job()`; Fixed port detection to check both `DynamicPorts` and `ReservedPorts` in `get_job_allocations()` +2. `atropos/slots/pool.py` - Added `driver` and `singularity_image` to `SlotPoolConfig` +3. `atropos/backends/nomad_backend.py` - Added driver options to `NomadBackendConfig` +4. `atropos/envs/agent_env.py` - Added CLI arguments `--env.driver` and `--env.singularity_image` to `AgentEnvConfig` + +**Files Created:** +1. `nomad-singularity.hcl` - Nomad config with raw_exec driver enabled +2. `atropos/atropos-sandbox.sif` - Singularity image (80MB) built from Docker image +3. 
`test_singularity_job.py` - Test script for Singularity integration + +**Key Implementation Details:** +- Uses Nomad's `raw_exec` driver to run `apptainer` commands +- Shell wrapper (`/bin/sh -c`) ensures Nomad environment variables expand correctly +- Binds Nomad allocation directory to `/data` for workspace persistence +- Uses **static ports** (`ReservedPorts`) instead of dynamic ports since raw_exec runs directly on host +- `get_job_allocations()` now checks both `DynamicPorts` (Docker) and `ReservedPorts` (Singularity) + +**Test Results (All Passing):** +- Health check: ✅ Server responding with 5 slots +- Bash execution: ✅ Commands execute inside Singularity container +- Write file: ✅ File written to slot workspace +- Read file: ✅ File read back successfully + +## Usage + +### For Docker (default): +```python +config = SlotPoolConfig( + driver="docker", + image="atropos-sandbox:local", +) +``` + +### For Singularity/Apptainer: +```python +config = SlotPoolConfig( + driver="singularity", + singularity_image="/path/to/atropos-sandbox.sif", +) +``` + +### Nomad Configuration: +```bash +# Start Nomad with Singularity support +nomad agent -dev -config=nomad-singularity.hcl +``` + +## Next Steps +- Deploy to HPC cluster for production testing +- Consider adding bubblewrap (bwrap) support inside Singularity for additional sandboxing +- Document HPC-specific deployment procedures in skills/mlops/ diff --git a/memory-bank/productContext.md b/memory-bank/productContext.md new file mode 100644 index 0000000000..8b6981cfcf --- /dev/null +++ b/memory-bank/productContext.md @@ -0,0 +1,55 @@ +# Product Context: Hermes-Agent + +## Why This Project Exists + +Hermes-Agent addresses several key challenges in the AI agent space: + +1. **Unified Tool Interface** - Provides a clean, consistent interface for LLMs to use various tools (web, terminal, browser, vision, etc.) without requiring custom integration for each model provider. + +2. 
**Training Data Generation** - Enables efficient generation of high-quality tool-calling trajectories for fine-tuning LLMs, with features like batch processing, checkpointing, and trajectory compression. + +3. **Flexible Deployment** - Supports multiple execution environments (local, Docker, Singularity, Modal, SSH) to accommodate different security and isolation requirements. + +4. **Developer Experience** - Offers a beautiful, interactive CLI with kawaii-style feedback that makes working with AI agents enjoyable. + +## Problems It Solves + +### For AI Researchers +- **Data Generation at Scale**: Parallel batch processing with content-based checkpointing for fault tolerance +- **Clean Trajectories**: Trajectory compression to fit token budgets while preserving important information +- **Toolset Distributions**: Probability-based tool selection for varied training data + +### For Developers +- **Tool Orchestration**: Logical grouping of tools into toolsets (research, development, debugging, etc.) +- **Session Persistence**: Conversation history and session logging for debugging +- **Multi-Model Support**: Works with any OpenAI-compatible API (OpenRouter, local models, etc.) + +### For MLOps +- **Skills System**: On-demand knowledge documents for specific tools/frameworks (Axolotl, vLLM, TRL, etc.) +- **Sandboxed Execution**: Terminal commands can run in isolated environments (Docker, Singularity, Modal) +- **Configurable Backends**: Easy switching between local and cloud execution + +## How It Should Work + +### User Flow (CLI) +1. User launches `./hermes` +2. Beautiful welcome banner displays with caduceus logo, model info, and available tools +3. User types a natural language request +4. Agent processes request, potentially calling tools with animated feedback +5. Agent responds with results, conversation continues +6. Session is automatically logged for debugging + +### User Flow (Batch Processing) +1. User prepares JSONL file with prompts +2. 
Runs `batch_runner.py` with distribution and worker count +3. System processes prompts in parallel, saves checkpoints +4. Completed trajectories saved to `data//trajectories.jsonl` +5. Optional: compress trajectories with `trajectory_compressor.py` + +## User Experience Goals + +- **Delightful Interaction**: Kawaii ASCII faces, animated spinners, cute messages +- **Informative Feedback**: Clear progress indication during tool execution +- **Configurable Personalities**: From "helpful" to "pirate" to "Shakespeare" +- **Easy Configuration**: YAML config file + environment variables + CLI flags +- **Graceful Degradation**: Missing tools/APIs don't break the system, just disable features diff --git a/memory-bank/progress.md b/memory-bank/progress.md new file mode 100644 index 0000000000..17ff55dc92 --- /dev/null +++ b/memory-bank/progress.md @@ -0,0 +1,67 @@ +# Progress + +## Completed Features + +### ✅ Singularity/Apptainer Sandbox Integration (Feb 6, 2026 - FULLY TESTED) +Adapted the Atropos sandbox environment from Docker to Singularity/Apptainer for HPC clusters. 
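The dual port lookup this integration relies on (checking `DynamicPorts` for the Docker driver and `ReservedPorts` for Singularity via raw_exec) can be sketched as a standalone helper. This is a minimal illustration only; the real logic lives inline in `get_job_allocations()` in `atropos/nomad/client.py`, and `find_http_port` is a hypothetical name, not an actual function in the codebase:

```python
from typing import Optional


def find_http_port(network: dict) -> Optional[int]:
    """Return the allocation port labelled 'http'.

    Checks DynamicPorts first (Docker driver assigns these dynamically),
    then ReservedPorts (Singularity via raw_exec binds a static port).
    """
    dynamic = network.get("DynamicPorts") or []
    reserved = network.get("ReservedPorts") or []
    for entry in dynamic + reserved:
        if entry.get("Label") == "http":
            return entry.get("Value")
    return None
```

Because both port styles resolve through the same label, callers never need to know which driver produced the allocation.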
+ +**What Works:** +- `create_sandbox_job()` supports both `driver="docker"` and `driver="singularity"` +- SlotPoolConfig and NomadBackendConfig propagate driver settings +- Singularity container runs sandbox_server.py via Nomad's raw_exec driver +- All sandbox operations work: bash execution, file read/write +- Nomad environment variables properly expanded via shell wrapper +- **CLI arguments** `--env.driver` and `--env.singularity_image` for AgentEnvConfig +- **Static port binding** for Singularity (ReservedPorts vs DynamicPorts) +- **Port detection** works for both Docker and Singularity allocations + +**CLI Usage:** +```bash +python -m atropos.envs.swe_smith_oracle_env process \ + --env.driver singularity \ + --env.singularity_image /path/to/atropos-sandbox.sif +``` + +**Created Files:** +- `nomad-singularity.hcl` - Nomad config with raw_exec enabled +- `atropos/atropos-sandbox.sif` - 80MB Singularity image +- `test_singularity_job.py` - Integration test script + +**Modified Files:** +- `atropos/nomad/client.py` - driver support + ReservedPorts detection +- `atropos/slots/pool.py` - driver config fields +- `atropos/backends/nomad_backend.py` - driver config fields +- `atropos/envs/agent_env.py` - CLI arguments for driver selection + +### ✅ Memory Bank Initialized (Feb 5, 2026) +Set up project documentation structure for context persistence. + +## In Progress +None currently. + +## Known Issues +- `bwrap_available: false` in Singularity containers - bubblewrap sandboxing not available inside the container (kernel namespaces already in use) +- Health check timing - may need longer wait for container startup on slower systems + +## What's Left to Build + +### HPC Deployment +- [ ] Test on actual HPC cluster with Slurm/PBS integration +- [ ] Document cluster-specific deployment procedures +- [ ] Add support for shared filesystem workspace binding + +### Enhanced Sandboxing +- [ ] Investigate alternative sandboxing inside Singularity (seccomp, etc.) 
+- [ ] Add network isolation options for Singularity + +### Documentation +- [ ] Add Singularity deployment to README +- [ ] Create HPC deployment skill in skills/mlops/ + +## Evolution of Decisions + +### Container Runtime Selection +- **Initial**: Docker-only via Nomad docker driver +- **Problem**: HPC clusters don't allow Docker without sudo +- **Solution**: Added Singularity/Apptainer support via raw_exec driver +- **Result**: Both runtimes now supported with same API diff --git a/memory-bank/projectbrief.md b/memory-bank/projectbrief.md new file mode 100644 index 0000000000..9d14db92d1 --- /dev/null +++ b/memory-bank/projectbrief.md @@ -0,0 +1,44 @@ +# Project Brief: Hermes-Agent + +## Overview +Hermes-Agent is an AI agent harness for LLMs with advanced tool-calling capabilities, featuring a flexible toolsets system for organizing and managing tools. Named after Hermes, the Greek messenger god, it serves as a bridge between human intent and AI-powered task execution. + +## Core Requirements + +### Primary Goals +1. **Interactive CLI Experience** - Beautiful terminal interface with animated feedback, personalities, and session management +2. **Flexible Tool System** - Modular tools organized into logical toolsets for different use cases +3. **Batch Processing** - Process multiple prompts in parallel with checkpointing and statistics +4. **Multi-Backend Support** - Support for local, Docker, Singularity, Modal, and SSH terminal backends +5. 
**Training Data Generation** - Save conversation trajectories in formats suitable for LLM fine-tuning + +### Target Users +- AI researchers generating training data +- Developers needing an AI assistant with tool access +- MLOps practitioners automating workflows +- Anyone needing a powerful CLI-based AI agent + +## Scope + +### In Scope +- Interactive CLI with rich formatting and kawaii-style feedback +- Web tools (search, extract, crawl via Firecrawl) +- Terminal tools (command execution across multiple backends) +- Browser automation (via agent-browser + Browserbase) +- Vision tools (image analysis) +- Image generation (FLUX via FAL.ai) +- Mixture-of-Agents reasoning +- Skills system for on-demand knowledge +- Batch processing with parallel workers +- Trajectory compression for training + +### Out of Scope (Current) +- Proactive suggestions (agent only runs on request) +- Clipboard integration (no local system access) +- Real-time streaming of thinking/reasoning (deferred) + +## Success Metrics +- Clean, maintainable tool architecture +- Reliable tool execution with proper error handling +- Efficient context management for long conversations +- High-quality trajectory data for training diff --git a/memory-bank/systemPatterns.md b/memory-bank/systemPatterns.md new file mode 100644 index 0000000000..ba49c9435c --- /dev/null +++ b/memory-bank/systemPatterns.md @@ -0,0 +1,149 @@ +# System Patterns: Hermes-Agent + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ CLI (cli.py) │ +│ - Rich welcome banner with caduceus │ +│ - prompt_toolkit for input with history │ +│ - Kawaii-style feedback and personalities │ +└────────────────────────────┬────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ AIAgent (run_agent.py) │ +│ - Conversation loop with tool calling │ +│ - KawaiiSpinner for animated feedback │ +│ - Retry logic with exponential backoff │ +│ - 
Session logging to logs/ directory │ +└────────────────────────────┬────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Tool Routing (model_tools.py) │ +│ - get_tool_definitions() - returns tools for API calls │ +│ - handle_function_call() - dispatches to tool handlers │ +│ - Toolset filtering (enabled/disabled) │ +└────────────────────────────┬────────────────────────────────────┘ + │ + ┌─────────────────┼─────────────────┐ + ▼ ▼ ▼ + ┌───────────┐ ┌───────────┐ ┌───────────┐ + │ Web Tools │ │ Terminal │ │ Browser │ + │ (Firecrawl)│ │ (mini-swe)│ │(agent-brw)│ + └───────────┘ └───────────┘ └───────────┘ + │ │ │ + └─────────────────┼─────────────────┘ + ▼ + ┌───────────────┐ + │ Toolsets │ + │ (toolsets.py)│ + │ Composition │ + └───────────────┘ +``` + +## Key Design Patterns + +### 1. Toolset Composition Pattern +Toolsets can include other toolsets, allowing flexible composition: + +```python +TOOLSETS = { + "web": {"tools": ["web_search", "web_extract"], "includes": []}, + "debugging": {"tools": ["terminal"], "includes": ["web"]}, + "full_stack": {"tools": [], "includes": ["web", "terminal", "vision", "browser"]} +} +``` + +Resolution is recursive with cycle detection. + +### 2. Graceful Degradation Pattern +Each tool module has a `check_*_requirements()` function: +- Tools are only loaded if requirements are met +- Missing API keys disable tools, not crash the system +- Import errors are caught and tools marked unavailable + +```python +try: + from tools.web_tools import web_search_tool, check_firecrawl_api_key +except ModuleNotFoundError: + web_search_tool = None + def check_firecrawl_api_key(): return False +``` + +### 3. 
Session Isolation Pattern (task_id) +Stateful tools (terminal, browser) use `task_id` to isolate concurrent sessions: +- Each batch worker gets unique task_id +- VMs and browser sessions are tracked per task_id +- Cleanup functions release resources: `cleanup_vm(task_id)`, `cleanup_browser(task_id)` + +### 4. Trajectory Format Pattern +Conversations are saved in ShareGPT format for training: + +```json +{"from": "system", "value": "System prompt with ..."} +{"from": "human", "value": "User message"} +{"from": "gpt", "value": "reasoning\n{...}"} +{"from": "tool", "value": "{...}"} +{"from": "gpt", "value": "Final response"} +``` + +### 5. Ephemeral System Prompt Pattern +Guide model behavior during data collection without saving to trajectories: +- `ephemeral_system_prompt` influences execution +- Only standard tool-calling system prompt saved to trajectories +- Keeps training data clean + +### 6. Retry with Validation Pattern +The agent validates responses before accepting: +- Check tool names against `valid_tool_names` set +- Validate JSON arguments can be parsed +- Check for content after `` blocks +- Roll back to last valid state on persistent failures + +## Component Relationships + +### AIAgent Class +- Central orchestrator for conversations +- Manages conversation history +- Calls OpenAI-compatible API +- Routes tool calls to handlers +- Provides animated feedback (KawaiiSpinner) + +### Tool Modules (tools/*.py) +- Self-contained tool implementations +- Export: handler function + check function + schema +- Return JSON strings (never raw dicts) +- Accept optional `task_id` for stateful tools + +### Toolsets System (toolsets.py) +- Defines logical groupings of tools +- Supports composition via `includes` +- `resolve_toolset()` recursively resolves all tools +- `validate_toolset()` checks if name is valid + +### Model Tools (model_tools.py) +- Aggregates all tool definitions +- Routes function calls to correct handlers +- Filters tools based on enabled/disabled 
toolsets +- Bridge between agent and tool implementations + +## Critical Implementation Paths + +### Tool Execution Flow +1. AIAgent receives tool_calls from API response +2. Validates tool names against `valid_tool_names` +3. Validates JSON arguments can be parsed +4. Calls `handle_function_call()` with tool name, args, task_id +5. `handle_function_call()` routes to appropriate handler +6. Tool executes, returns JSON string +7. Result added to conversation as tool message +8. Loop continues until natural language response + +### Configuration Loading Flow +1. `cli.py` calls `load_cli_config()` +2. Loads `cli-config.yaml`, merges with defaults +3. Sets environment variables for terminal config +4. `AIAgent` reads env vars when initializing terminal tool +5. Terminal tool creates appropriate backend based on `TERMINAL_ENV` diff --git a/memory-bank/techContext.md b/memory-bank/techContext.md new file mode 100644 index 0000000000..e8b0022c02 --- /dev/null +++ b/memory-bank/techContext.md @@ -0,0 +1,113 @@ +# Technical Context: Hermes-Agent + +## Technologies Used + +### Core Stack +- **Python 3.11+** - Primary language +- **OpenAI SDK** - For LLM API interactions (OpenAI-compatible) +- **OpenRouter** - Default LLM provider (supports multiple models) +- **Rich** - Terminal formatting and panels +- **prompt_toolkit** - Interactive input with history +- **Fire** - CLI argument parsing +- **PyYAML** - Configuration files +- **python-dotenv** - Environment variable management + +### Tool Dependencies +- **Firecrawl** - Web search and extraction (`FIRECRAWL_API_KEY`) +- **mini-swe-agent** - Terminal tool backend (local/docker/singularity/modal/ssh) +- **agent-browser** - Browser automation (npm package) +- **Browserbase** - Cloud browser execution (`BROWSERBASE_API_KEY`) +- **FAL.ai** - Image generation with FLUX (`FAL_KEY`) +- **Nous API** - Vision and MoA tools (`NOUS_API_KEY`) + +### Optional Dependencies +- **Modal** - Cloud compute for sandboxed environments +- 
**Singularity/Apptainer** - Rootless containers (HPC environments) +- **Docker** - Container isolation + +## Development Setup + +### Quick Start +```bash +# Clone with submodules +git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git +cd Hermes-Agent + +# Create virtual environment +python3 -m venv venv +source venv/bin/activate + +# Install dependencies +pip install -r requirements.txt +pip install -e ./mini-swe-agent + +# Install browser tools (optional) +npm install + +# Configure environment +cp .env.example .env +# Edit .env with your API keys +``` + +### Key Configuration Files +- `.env` - API keys and secrets +- `cli-config.yaml` - CLI configuration (model, terminal, toolsets, personalities) +- `configs/` - Batch run scripts and configuration + +### Environment Variables + +**Required for Full Functionality:** +- `OPENROUTER_API_KEY` - Primary LLM access +- `FIRECRAWL_API_KEY` - Web tools +- `NOUS_API_KEY` - Vision and reasoning tools +- `FAL_KEY` - Image generation + +**Terminal Backend:** +- `TERMINAL_ENV` - Backend type: `local`, `docker`, `singularity`, `modal`, `ssh` +- `TERMINAL_CWD` - Working directory +- `TERMINAL_DOCKER_IMAGE` / `TERMINAL_SINGULARITY_IMAGE` - Container images +- `TERMINAL_SSH_HOST/USER/KEY` - SSH backend config +- `SUDO_PASSWORD` - Optional sudo support + +**Browser:** +- `BROWSERBASE_API_KEY` - Browser automation +- `BROWSERBASE_PROJECT_ID` - Browserbase project + +## Technical Constraints + +1. **Context Window Limits** - Long tool outputs can exhaust context; trajectory compression helps +2. **API Rate Limits** - OpenRouter and tool APIs have rate limits; exponential backoff implemented +3. **Tool Availability** - Tools gracefully degrade if dependencies/keys missing +4. 
**Async Compatibility** - Some tools are async, handled via `asyncio.run()` in sync context
+
+## Dependency Graph
+
+```
+tools/*.py → tools/__init__.py → model_tools.py → toolsets.py → toolset_distributions.py
+                                       ↑
+run_agent.py ──────────────────────────┘
+cli.py → run_agent.py (uses AIAgent with quiet_mode=True)
+batch_runner.py → run_agent.py + toolset_distributions.py
+```
+
+## Tool Usage Patterns
+
+### Adding a New Tool
+1. Create `tools/your_tool.py` with handler + requirements check
+2. Export in `tools/__init__.py`
+3. Register in `model_tools.py` (definitions + handler routing)
+4. Add to toolset in `toolsets.py`
+5. Optionally add to `toolset_distributions.py` for batch processing
+
+### Tool Handler Pattern
+```python
+import json
+
+def your_tool(param: str, task_id: str | None = None) -> str:
+    """Execute tool and return JSON string result."""
+    try:
+        result = {"success": True, "data": "..."}
+        return json.dumps(result, ensure_ascii=False)
+    except Exception as e:
+        return json.dumps({"error": str(e)}, ensure_ascii=False)
+```
+
+All tool handlers MUST return a JSON string, never raw dicts.
diff --git a/nomad-singularity.hcl b/nomad-singularity.hcl
new file mode 100644
index 0000000000..572969c999
--- /dev/null
+++ b/nomad-singularity.hcl
@@ -0,0 +1,31 @@
+# Nomad Configuration for Singularity/Apptainer Sandbox
+# Run with: nomad agent -dev -config=nomad-singularity.hcl
+#
+# This uses the raw_exec driver to run Apptainer containers.
+# Suitable for HPC environments where Docker cannot run without sudo.
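+#
+# Illustrative sketch only (not part of this config): with raw_exec enabled
+# below, a Nomad job can launch the sandbox image directly via Apptainer.
+# The task name and image path here are placeholders:
+#
+#   task "sandbox" {
+#     driver = "raw_exec"
+#     config {
+#       command = "apptainer"
+#       args    = ["run", "/path/to/atropos-sandbox.sif"]
+#     }
+#   }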
+ +client { + enabled = true + + options { + # Enable raw_exec driver for Singularity/Apptainer + "driver.raw_exec.enable" = "1" + } +} + +# raw_exec driver plugin configuration +plugin "raw_exec" { + config { + enabled = true + } +} + +# Optional: If you have the nomad-driver-singularity plugin installed, +# uncomment the following instead of using raw_exec: +# plugin "singularity" { +# config { +# enabled = true +# # Allow bind mounts +# bind_paths = ["/tmp", "/var/tmp"] +# } +# } diff --git a/test_singularity_job.py b/test_singularity_job.py new file mode 100644 index 0000000000..e7e36423c2 --- /dev/null +++ b/test_singularity_job.py @@ -0,0 +1,126 @@ +#!/usr/bin/env python3 +""" +Test script for Singularity sandbox job creation. + +This tests the create_sandbox_job function with driver="singularity". +""" + +import asyncio +import sys +import json +import importlib.util + +# Load atropos.nomad.client directly to bypass __init__.py +spec = importlib.util.spec_from_file_location( + "nomad_client", + "/root/Hermes-Agent/atropos/nomad/client.py" +) +nomad_client = importlib.util.module_from_spec(spec) +sys.modules["nomad_client"] = nomad_client +spec.loader.exec_module(nomad_client) + +NomadClient = nomad_client.NomadClient +create_sandbox_job = nomad_client.create_sandbox_job + + +async def test_singularity_job(): + """Test Singularity job creation and submission to Nomad.""" + + job_id = "test-singularity-sandbox" + sif_path = "/root/Hermes-Agent/atropos/atropos-sandbox.sif" + + print("=== Singularity Sandbox Job Test ===\n") + + # Create job spec for Singularity + print("Creating Singularity job spec...") + job_spec = create_sandbox_job( + job_id=job_id, + driver="singularity", + singularity_image=sif_path, + slots_per_container=5, + count=1, + cpu=500, + memory=512, + ) + + # Print task driver and config + task = job_spec["TaskGroups"][0]["Tasks"][0] + print(f" Driver: {task['Driver']}") + print(f" Config: {json.dumps(task['Config'], indent=4)}") + print() + + 
# Test submission to Nomad + print("Connecting to Nomad...") + client = NomadClient(address="http://localhost:4646") + + try: + # Check health + healthy = await client.is_healthy() + print(f" Nomad healthy: {healthy}") + + if not healthy: + print("❌ Nomad is not reachable!") + return False + + # Purge any existing job + print(f"\nPurging existing job '{job_id}'...") + await client.stop_job(job_id, purge=True) + + # Submit job + print(f"Submitting Singularity job '{job_id}'...") + result = await client.submit_job(job_spec) + print(f" Result: {result}") + + if "error" in result: + print(f"❌ Job submission failed: {result}") + return False + + # Wait for allocation + print("\nWaiting for allocation (10 seconds)...") + await asyncio.sleep(10) + + # Check allocations + allocs = await client.get_job_allocations(job_id) + print(f"Allocations: {len(allocs)}") + for alloc in allocs: + print(f" - {alloc.id[:8]} status={alloc.status.value} http={alloc.http_address}") + + # Get detailed info + detail = await client.get_allocation(alloc.id) + if detail: + task_states = detail.get("TaskStates", {}) + for task_name, state in task_states.items(): + events = state.get("Events", [])[-3:] + print(f" Task '{task_name}': {[e.get('Type') for e in events]}") + + # Check if any are running + running = [a for a in allocs if a.status.value == "running"] + if running: + print(f"\n✅ Job running! 
{len(running)} allocation(s)") + + # Try to reach the sandbox server + if running[0].http_address: + import aiohttp + try: + async with aiohttp.ClientSession() as session: + async with session.get(f"{running[0].http_address}/health", timeout=aiohttp.ClientTimeout(total=5)) as resp: + print(f" Health check: {resp.status} - {await resp.text()}") + except Exception as e: + print(f" Health check failed: {e}") + else: + print("\n⚠️ No running allocations yet (may still be starting)") + + return True + + finally: + # Don't cleanup - leave running for debugging + print(f"\n[Leaving job '{job_id}' running for debugging]") + print(f" View logs: nomad alloc logs -job {job_id}") + print(f" Cleanup: nomad job stop -purge {job_id}") + await client.close() + print("Done!") + + +if __name__ == "__main__": + success = asyncio.run(test_singularity_job()) + sys.exit(0 if success else 1) diff --git a/test_singularity_sandbox.py b/test_singularity_sandbox.py new file mode 100644 index 0000000000..f378b3459b --- /dev/null +++ b/test_singularity_sandbox.py @@ -0,0 +1,108 @@ +#!/usr/bin/env python3 +""" +Test script for Singularity/Apptainer sandbox integration. + +This tests the SlotPool with driver="singularity" using the raw_exec Nomad driver. 
+""" + +import asyncio +import sys +import os + +# Add parent to path for imports +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +from atropos.slots.pool import SlotPool, SlotPoolConfig + + +async def test_singularity_sandbox(): + """Test the Singularity sandbox deployment and basic execution.""" + + # Configure for Singularity + config = SlotPoolConfig( + nomad_address="http://localhost:4646", + job_id="atropos-sandbox-singularity", + driver="singularity", + singularity_image="/root/Hermes-Agent/atropos/atropos-sandbox.sif", + slots_per_container=5, + min_containers=1, + max_containers=2, + cpu=500, + memory=512, + purge_job_on_start=True, # Clean start for testing + ) + + print(f"Testing Singularity sandbox with config:") + print(f" driver: {config.driver}") + print(f" singularity_image: {config.singularity_image}") + print(f" job_id: {config.job_id}") + print() + + pool = SlotPool(config) + + try: + print("Starting SlotPool...") + await pool.start() + + stats = pool.get_stats() + print(f"Pool started! Stats: {stats}") + print() + + # Acquire a slot + print("Acquiring slot...") + slot = await pool.acquire("test-trajectory-001") + print(f"Acquired slot: {slot.slot_id} (alloc={slot.alloc_id[:8]})") + print() + + # Execute a simple command + print("Executing 'echo hello from singularity'...") + result = await pool.execute( + slot, + "bash", + {"command": "echo 'Hello from Singularity sandbox!' 
&& uname -a"} + ) + print(f"Result: {result}") + print() + + # Test file write + print("Testing file write...") + write_result = await pool.execute( + slot, + "write_file", + {"path": "test.txt", "content": "Test file from Singularity!"} + ) + print(f"Write result: {write_result}") + + # Test file read + print("Testing file read...") + read_result = await pool.execute( + slot, + "read_file", + {"path": "test.txt"} + ) + print(f"Read result: {read_result}") + print() + + # Release slot + print("Releasing slot...") + await pool.release(slot) + + print("✅ All tests passed!") + + except Exception as e: + print(f"❌ Error: {e}") + import traceback + traceback.print_exc() + return False + + finally: + print("\nStopping pool...") + await pool.stop(purge_job=True) + print("Pool stopped.") + + return True + + +if __name__ == "__main__": + success = asyncio.run(test_singularity_sandbox()) + sys.exit(0 if success else 1) diff --git a/wandb/latest-run b/wandb/latest-run new file mode 120000 index 0000000000..d4df01c1d7 --- /dev/null +++ b/wandb/latest-run @@ -0,0 +1 @@ +run-20260206_003827-82b0oahi \ No newline at end of file diff --git a/wandb/run-20260206_003827-82b0oahi/files/config.yaml b/wandb/run-20260206_003827-82b0oahi/files/config.yaml new file mode 100644 index 0000000000..b7f3a441c3 --- /dev/null +++ b/wandb/run-20260206_003827-82b0oahi/files/config.yaml @@ -0,0 +1,180 @@ +_wandb: + value: + cli_version: 0.24.2 + e: + 2gw7xuffca69jbm2b60l3w5ymo5pb5lf: + args: + - process + - --env.driver + - singularity + - --env.singularity_image + - /root/Hermes-Agent/atropos/atropos-sandbox.sif + email: shannon@nousresearch.com + executable: /root/Hermes-Agent/.venv/bin/python + git: + commit: 4d619bcd21feedc9eed36c53c038585d97e7295e + remote: https://github.com/NousResearch/Hermes-Agent.git + host: vultr + os: Linux-6.8.0-90-generic-x86_64-with-glibc2.39 + program: -m atropos.envs.swe_smith_oracle_env + python: CPython 3.12.3 + root: /root/Hermes-Agent + startedAt: 
"2026-02-06T00:38:27.351013Z" + writerId: 2gw7xuffca69jbm2b60l3w5ymo5pb5lf + m: [] + python_version: 3.12.3 + t: + "1": + - 11 + - 49 + - 51 + - 95 + "3": + - 13 + - 16 + "4": 3.12.3 + "5": 0.24.2 + "6": 5.0.0 + "12": 0.24.2 + "13": linux-x86_64 +acquire_timeout_s: + value: 30 +agent_max_steps: + value: 50 +agent_max_tokens: + value: null +agent_temperature: + value: 0.7 +agent_tool_delay_s: + value: 0 +allow_network: + value: true +batch_size: + value: 1 +custom_thinking_prompt: + value: null +data_dir_to_save_evals: + value: null +data_path_to_save_groups: + value: data/swe_smith_oracle_env_2.jsonl +dataset_name: + value: NousResearch/SWE-smith-oracle +dataset_split: + value: train +disabled_toolsets: + value: [] +driver: + value: singularity +enabled_toolsets: + value: + - terminal +ensure_scores_are_not_same: + value: false +eval_handling: + value: STOP_TRAIN +eval_limit_ratio: + value: 0.5 +group_size: + value: 1 +include_messages: + value: true +inference_weight: + value: 1 +install_timeout_s: + value: 600 +max_batches_offpolicy: + value: 3 +max_containers: + value: 10 +max_eval_workers: + value: 16 +max_items: + value: 0 +max_num_workers: + value: -1 +max_num_workers_per_node: + value: 8 +max_reasoning_tokens: + value: null +max_token_length: + value: 8192 +min_batch_allocation: + value: null +min_containers: + value: 1 +min_items_sent_before_logging: + value: 2 +modal_app_name: + value: atropos-sandbox +modal_function_name: + value: sandbox_server +modal_volume_mount_path: + value: /data +modal_volume_name: + value: null +nomad_address: + value: http://localhost:4646 +num_rollouts_per_group_for_logging: + value: 1 +num_rollouts_to_keep: + value: 32 +privileged: + value: false +prompt_mode: + value: problem_statement +purge_job_on_shutdown: + value: true +purge_job_on_start: + value: true +python_only: + value: true +reasoning_effort: + value: null +repo_base_url: + value: https://github.com +require_sandbox: + value: false +require_stateful_sandbox: + 
value: false +rollout_server_url: + value: http://localhost:8000 +sandbox_image: + value: atropos-sandbox:local +sandbox_job_id: + value: atropos-sandbox-agent-env +score_include_fail_to_pass: + value: true +seed: + value: 0 +shuffle: + value: true +singularity_image: + value: /root/Hermes-Agent/atropos/atropos-sandbox.sif +slots_per_container: + value: 10 +steps_per_eval: + value: 1 +test_timeout_s: + value: 600 +thinking_mode: + value: false +tokenizer_name: + value: NousResearch/Hermes-4.3-36B +tool_batch_window_ms: + value: 20 +tool_max_batch_size: + value: 200 +tool_pool_mode: + value: nomad +tool_server_token: + value: null +tool_server_url: + value: null +total_steps: + value: 1 +use_wandb: + value: true +wandb_name: + value: swe_smith_oracle +worker_timeout: + value: 600 diff --git a/wandb/run-20260206_003827-82b0oahi/run-82b0oahi.wandb b/wandb/run-20260206_003827-82b0oahi/run-82b0oahi.wandb new file mode 100644 index 0000000000..4c3108a598 Binary files /dev/null and b/wandb/run-20260206_003827-82b0oahi/run-82b0oahi.wandb differ