docs: add Environments, Benchmarks & Data Generation guide

Comprehensive developer guide covering:
- Architecture (BaseEnv → HermesAgentBaseEnv → concrete envs)
- All three benchmarks (TerminalBench2, TBLite, YC-Bench)
- Training environments (TerminalTestEnv, HermesSweEnv)
- Core components (AgentLoop, ToolContext, Tool Call Parsers)
- Two-phase operation (Phase 1 OpenAI, Phase 2 VLLM)
- Running environments (evaluate, process, serve modes)
- Creating new environments (training + eval-only)
- Configuration reference and prerequisites

Also updates environments/README.md directory tree to include
TBLite and YC-Bench benchmarks.
This commit is contained in:
teknium1 2026-03-06 23:31:45 -08:00
parent f55f625277
commit 55a21fe37b
3 changed files with 509 additions and 2 deletions

View file

@ -195,8 +195,12 @@ environments/
│ └── hermes_swe_env.py
└── benchmarks/ # Evaluation benchmarks
└── terminalbench_2/
└── terminalbench2_env.py
├── terminalbench_2/ # 89 terminal tasks, Modal sandboxes
│ └── terminalbench2_env.py
├── tblite/ # 100 calibrated tasks (fast TB2 proxy)
│ └── tblite_env.py
└── yc_bench/ # Long-horizon strategic benchmark
└── yc_bench_env.py
```
## Concrete Environments