From 9dc27880cd76dda31e62bddc3f4284e012263ce9 Mon Sep 17 00:00:00 2001 From: Shannon Sands Date: Mon, 9 Feb 2026 02:37:39 +0000 Subject: [PATCH] adding tinker but need api key --- memory-bank/activeContext.md | 114 ++++++++++++++--------------------- 1 file changed, 46 insertions(+), 68 deletions(-) diff --git a/memory-bank/activeContext.md b/memory-bank/activeContext.md index 15fff012c21..b7858c2b131 100644 --- a/memory-bank/activeContext.md +++ b/memory-bank/activeContext.md @@ -1,83 +1,61 @@ # Active Context ## Current Focus -Modal backend integration has been **MERGED AND UPDATED** from the `modal-integration` branch. +Tinker RL training integration - pipeline fully wired up, waiting on Tinker billing to test. -## Recently Completed (Feb 8, 2026) +## Recently Completed (Feb 9, 2026) -### Modal Backend Integration - MERGED & WORKING -Merged the `modal-integration` branch into `atropos-integrations` and fixed integration issues. +### Tinker RL Training Integration +Created a complete agent training pipeline using Tinker (Thinking Machines) + Atropos: -**What was merged (from another dev's branch):** -1. `atropos/backends/modal_backend.py` - Complete Modal backend with: - - `ModalSandboxConfig` - Unified config with YAML profiles, env vars, and AgentEnv config loading - - `_ModalSandboxWithSlots` - Modal Sandbox wrapper with slot-based multiplexing - - `_ModalSandboxPool` - Auto-scaling pool of Modal sandboxes - - `_ModalMultiProfileManager` - Multi-profile support (CPU, GPU, high-memory) - - `ModalToolBackend` - Full ToolBackend implementation -2. `atropos/backends/__init__.py` - Updated `create_tool_backend()` to support `modal` mode -3. `tools/terminal_tool.py` - Native Modal Sandbox integration with: - - `ModalProfile` config + YAML loading - - `_ModalSandboxPool` (sync, thread-based for CLI use) - - `_ModalPoolManager` (singleton, multi-profile) - - `_ModalSandboxEnvironment` replacing old `_ModalEnvironment` -4. `docs/MODAL_BACKEND.md` - Comprehensive documentation -5. `modal_profiles.yaml.example` - Example profiles config -6. `tests/test_modal_integration.py` - Integration tests -7. `tests/test_modal_stress.py` - Stress tests -8. `tests/test_modal_terminal.py` - Terminal tool tests +**New Files Created:** +1. `tinker-atropos/tinker_atropos/environments/gsm8k_agent.py` - Agent GSM8k environment with: + - Python REPL tool calling (Hermes-style `` format) + - Multi-step agent loop within `collect_trajectories()` + - Math answer verification via `math_verify` + - Subprocess-based Python execution + - WandB metrics (percent_correct, tool_use_rate) +2. `tinker-atropos/configs/gsm8k_agent.yaml` - Config for Qwen3-4B-Instruct training -**What I fixed after merge:** -1. `atropos/envs/agent_env.py` - Replaced old stub Modal fields with proper config fields matching `ModalSandboxConfig.from_agent_env_config()`: - - `modal_image`, `modal_gpu`, `modal_cpu`, `modal_memory` - - `modal_slots_per_sandbox`, `modal_min_sandboxes`, `modal_max_sandboxes` - - `modal_idle_timeout`, `modal_max_lifetime` - - `modal_acquire_timeout`, `modal_execution_timeout` - - `modal_secrets`, `modal_env_vars`, `modal_workspace_base` -2. `atropos/backends/modal_backend.py` - Guarded `yaml` import with try/except +**Dependencies Updated:** +- `pyproject.toml` `[atropos]` extra now includes: tinker SDK, torch, wandb, math-verify +- Installed: tinker 0.12.0, tinker-atropos 0.1.0, torch (CPU) -**Key Architecture Decisions:** -- Uses **Modal Sandboxes** (not Functions) - long-lived containers that stay hot -- Uses `sandbox.exec()` directly instead of HTTP/sandbox_server.py - simpler approach -- Slot-based multiplexing matching Nomad's pattern -- Multi-profile support for heterogeneous workloads (CPU vs GPU) -- Named sandbox recovery for resilience -- Modal SDK v1.3.2 compatible +**README Updated:** +- Added comprehensive "RL Training with Tinker" section with architecture diagram, quick start, config docs +- Added TINKER_API_KEY and WANDB_API_KEY to optional keys table -## Previous Work (Feb 6, 2026) -### Singularity/Apptainer Sandbox Integration - FULLY WORKING -See progress.md for details. +**Verified Working:** +- Tinker SDK connection ✅ +- All imports (tinker, tinker_atropos, trainer, environment) ✅ +- Python REPL execution + tool call parsing ✅ +- Math verification ✅ +- Atropos run-api (port 8000) ✅ +- Tinker trainer starts, loads config, creates inference server (port 8001) ✅ -## Usage +**Blocked:** Tinker billing (402 error) - user's payment didn't process (possibly regional card issue) -### Modal Backend (Atropos): -```bash -python -m atropos.envs.swe_smith_oracle_env process \ - --env.tool_pool_mode modal \ - --env.modal_image python:3.11 \ - --env.modal_slots_per_sandbox 10 \ - --env.modal_max_sandboxes 5 +### Main Branch Merge (Feb 9, 2026) +Merged `origin/main` into `atropos-integrations` - 22,560 lines, 79 files, 5 conflicts resolved. + +### Modal Backend (Feb 8, 2026) +Merged modal-integration branch, working with Modal Sandboxes. + +### Singularity/Apptainer (Feb 6, 2026) +Completed and tested. + +## Architecture: Training Pipeline + +``` +Terminal 1: run-api (port 8000) - Atropos Rollout API +Terminal 2: launch_training.py (port 8001) - Tinker Trainer + FastAPI inference +Terminal 3: gsm8k_agent.py serve - Environment (generates trajectories) ``` -### Modal Terminal Tool (CLI): -```bash -export TERMINAL_ENV=modal -export TERMINAL_MODAL_IMAGE=python:3.11 -./hermes -``` - -### With GPU Profile: -```bash -# In modal_profiles.yaml -profiles: - pytorch-gpu: - image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime - gpu: T4 - memory: 16384 -``` +The agent env gets math problems → model calls Python REPL tool → scores answer → sends to Atropos → Tinker does LoRA training → updates sampling weights → repeat. ## Next Steps -- Live test Modal backend with actual Modal credentials -- Test multi-profile GPU workflows -- Test sandbox recovery after restart -- Integrate with SWE-smith-oracle env for full GRPO training loop +- [ ] Resolve Tinker billing to test full training loop +- [ ] Run GSM8k agent training for ~20 steps (proof of concept) +- [ ] Monitor WandB for reward improvement +- [ ] Graduate to more complex agent envs (SWE tasks with Modal backend)