mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
adding tinker but need api key
This commit is contained in:
parent
3b9c53e6db
commit
9dc27880cd
1 changed files with 46 additions and 68 deletions
|
|
@ -1,83 +1,61 @@
|
||||||
# Active Context
|
# Active Context
|
||||||
|
|
||||||
## Current Focus
|
## Current Focus
|
||||||
Modal backend integration has been **MERGED AND UPDATED** from the `modal-integration` branch.
|
Tinker RL training integration - pipeline fully wired up, waiting on Tinker billing to test.
|
||||||
|
|
||||||
## Recently Completed (Feb 8, 2026)
|
## Recently Completed (Feb 9, 2026)
|
||||||
|
|
||||||
### Modal Backend Integration - MERGED & WORKING
|
### Tinker RL Training Integration
|
||||||
Merged the `modal-integration` branch into `atropos-integrations` and fixed integration issues.
|
Created a complete agent training pipeline using Tinker (Thinking Machines) + Atropos:
|
||||||
|
|
||||||
**What was merged (from another dev's branch):**
|
**New Files Created:**
|
||||||
1. `atropos/backends/modal_backend.py` - Complete Modal backend with:
|
1. `tinker-atropos/tinker_atropos/environments/gsm8k_agent.py` - Agent GSM8k environment with:
|
||||||
- `ModalSandboxConfig` - Unified config with YAML profiles, env vars, and AgentEnv config loading
|
- Python REPL tool calling (Hermes-style `<tool_call>` format)
|
||||||
- `_ModalSandboxWithSlots` - Modal Sandbox wrapper with slot-based multiplexing
|
- Multi-step agent loop within `collect_trajectories()`
|
||||||
- `_ModalSandboxPool` - Auto-scaling pool of Modal sandboxes
|
- Math answer verification via `math_verify`
|
||||||
- `_ModalMultiProfileManager` - Multi-profile support (CPU, GPU, high-memory)
|
- Subprocess-based Python execution
|
||||||
- `ModalToolBackend` - Full ToolBackend implementation
|
- WandB metrics (percent_correct, tool_use_rate)
|
||||||
2. `atropos/backends/__init__.py` - Updated `create_tool_backend()` to support `modal` mode
|
2. `tinker-atropos/configs/gsm8k_agent.yaml` - Config for Qwen3-4B-Instruct training
|
||||||
3. `tools/terminal_tool.py` - Native Modal Sandbox integration with:
|
|
||||||
- `ModalProfile` config + YAML loading
|
|
||||||
- `_ModalSandboxPool` (sync, thread-based for CLI use)
|
|
||||||
- `_ModalPoolManager` (singleton, multi-profile)
|
|
||||||
- `_ModalSandboxEnvironment` replacing old `_ModalEnvironment`
|
|
||||||
4. `docs/MODAL_BACKEND.md` - Comprehensive documentation
|
|
||||||
5. `modal_profiles.yaml.example` - Example profiles config
|
|
||||||
6. `tests/test_modal_integration.py` - Integration tests
|
|
||||||
7. `tests/test_modal_stress.py` - Stress tests
|
|
||||||
8. `tests/test_modal_terminal.py` - Terminal tool tests
|
|
||||||
|
|
||||||
**What I fixed after merge:**
|
**Dependencies Updated:**
|
||||||
1. `atropos/envs/agent_env.py` - Replaced old stub Modal fields with proper config fields matching `ModalSandboxConfig.from_agent_env_config()`:
|
- `pyproject.toml` `[atropos]` extra now includes: tinker SDK, torch, wandb, math-verify
|
||||||
- `modal_image`, `modal_gpu`, `modal_cpu`, `modal_memory`
|
- Installed: tinker 0.12.0, tinker-atropos 0.1.0, torch (CPU)
|
||||||
- `modal_slots_per_sandbox`, `modal_min_sandboxes`, `modal_max_sandboxes`
|
|
||||||
- `modal_idle_timeout`, `modal_max_lifetime`
|
|
||||||
- `modal_acquire_timeout`, `modal_execution_timeout`
|
|
||||||
- `modal_secrets`, `modal_env_vars`, `modal_workspace_base`
|
|
||||||
2. `atropos/backends/modal_backend.py` - Guarded `yaml` import with try/except
|
|
||||||
|
|
||||||
**Key Architecture Decisions:**
|
**README Updated:**
|
||||||
- Uses **Modal Sandboxes** (not Functions) - long-lived containers that stay hot
|
- Added comprehensive "RL Training with Tinker" section with architecture diagram, quick start, config docs
|
||||||
- Uses `sandbox.exec()` directly instead of HTTP/sandbox_server.py - simpler approach
|
- Added TINKER_API_KEY and WANDB_API_KEY to optional keys table
|
||||||
- Slot-based multiplexing matching Nomad's pattern
|
|
||||||
- Multi-profile support for heterogeneous workloads (CPU vs GPU)
|
|
||||||
- Named sandbox recovery for resilience
|
|
||||||
- Modal SDK v1.3.2 compatible
|
|
||||||
|
|
||||||
## Previous Work (Feb 6, 2026)
|
**Verified Working:**
|
||||||
### Singularity/Apptainer Sandbox Integration - FULLY WORKING
|
- Tinker SDK connection ✅
|
||||||
See progress.md for details.
|
- All imports (tinker, tinker_atropos, trainer, environment) ✅
|
||||||
|
- Python REPL execution + tool call parsing ✅
|
||||||
|
- Math verification ✅
|
||||||
|
- Atropos run-api (port 8000) ✅
|
||||||
|
- Tinker trainer starts, loads config, creates inference server (port 8001) ✅
|
||||||
|
|
||||||
## Usage
|
**Blocked:** Tinker billing (402 error) - user's payment didn't process (possibly regional card issue)
|
||||||
|
|
||||||
### Modal Backend (Atropos):
|
### Main Branch Merge (Feb 9, 2026)
|
||||||
```bash
|
Merged `origin/main` into `atropos-integrations` - 22,560 lines, 79 files, 5 conflicts resolved.
|
||||||
python -m atropos.envs.swe_smith_oracle_env process \
|
|
||||||
--env.tool_pool_mode modal \
|
### Modal Backend (Feb 8, 2026)
|
||||||
--env.modal_image python:3.11 \
|
Merged modal-integration branch, working with Modal Sandboxes.
|
||||||
--env.modal_slots_per_sandbox 10 \
|
|
||||||
--env.modal_max_sandboxes 5
|
### Singularity/Apptainer (Feb 6, 2026)
|
||||||
|
Completed and tested.
|
||||||
|
|
||||||
|
## Architecture: Training Pipeline
|
||||||
|
|
||||||
|
```
|
||||||
|
Terminal 1: run-api (port 8000) - Atropos Rollout API
|
||||||
|
Terminal 2: launch_training.py (port 8001) - Tinker Trainer + FastAPI inference
|
||||||
|
Terminal 3: gsm8k_agent.py serve - Environment (generates trajectories)
|
||||||
```
|
```
|
||||||
|
|
||||||
### Modal Terminal Tool (CLI):
|
The agent env gets math problems → model calls Python REPL tool → scores answer → sends to Atropos → Tinker does LoRA training → updates sampling weights → repeat.
|
||||||
```bash
|
|
||||||
export TERMINAL_ENV=modal
|
|
||||||
export TERMINAL_MODAL_IMAGE=python:3.11
|
|
||||||
./hermes
|
|
||||||
```
|
|
||||||
|
|
||||||
### With GPU Profile:
|
|
||||||
```bash
|
|
||||||
# In modal_profiles.yaml
|
|
||||||
profiles:
|
|
||||||
pytorch-gpu:
|
|
||||||
image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
|
|
||||||
gpu: T4
|
|
||||||
memory: 16384
|
|
||||||
```
|
|
||||||
|
|
||||||
## Next Steps
|
## Next Steps
|
||||||
- Live test Modal backend with actual Modal credentials
|
- [ ] Resolve Tinker billing to test full training loop
|
||||||
- Test multi-profile GPU workflows
|
- [ ] Run GSM8k agent training for ~20 steps (proof of concept)
|
||||||
- Test sandbox recovery after restart
|
- [ ] Monitor WandB for reward improvement
|
||||||
- Integrate with SWE-smith-oracle env for full GRPO training loop
|
- [ ] Graduate to more complex agent envs (SWE tasks with Modal backend)
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue