hermes-agent/environments/benchmarks/terminalbench_2
dmahan93 136a64942d feat: add eval_concurrency limit + Docker local config for TBLite
- Add eval_concurrency config field with asyncio.Semaphore
- Add local.yaml config using Docker backend (sandboxed, no cloud costs)
- Register docker_image alongside modal_image for backend flexibility
- Default: 8 parallel tasks for local runs
2026-03-09 20:28:28 -05:00
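
The commit message above mentions an eval_concurrency field enforced with asyncio.Semaphore. A minimal sketch of that pattern follows, assuming nothing about the real code in terminalbench2_env.py: run_task, evaluate_all, and the stats dict are hypothetical names for illustration, and the sleep stands in for actual task execution. Only the eval_concurrency name and the default of 8 come from the commit message.

```python
import asyncio


async def run_task(task_id: int, limit: asyncio.Semaphore, stats: dict) -> int:
    """Hypothetical stand-in for one benchmark task."""
    # Only `eval_concurrency` coroutines can be inside this block at once;
    # the rest wait here until a slot is released.
    async with limit:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for real sandboxed work
        stats["active"] -= 1
    return task_id


async def evaluate_all(num_tasks: int, eval_concurrency: int = 8):
    """Schedule every task up front, but cap how many run in parallel."""
    limit = asyncio.Semaphore(eval_concurrency)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_task(i, limit, stats) for i in range(num_tasks))
    )
    return list(results), stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(evaluate_all(20, eval_concurrency=8))
    print(f"finished {len(results)} tasks, peak parallelism {peak}")
```

Scheduling all tasks with asyncio.gather and gating them with a semaphore keeps the call site simple while bounding how many sandboxes (for example, Docker containers) exist at any moment.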
__init__.py            Add new environments and enhance tool context functionality        2026-02-10 19:39:05 +00:00
default.yaml           Enhance TerminalBench 2 configuration and evaluation handling      2026-02-10 22:53:24 +00:00
run_eval.sh            feat: add OpenThoughts-TBLite evaluation script                    2026-03-04 12:55:56 +00:00
terminalbench2_env.py  feat: add eval_concurrency limit + Docker local config for TBLite  2026-03-09 20:28:28 -05:00