hermes-agent/environments/benchmarks/terminalbench_2
Blake Johnson c6df39955c fix: limit concurrent Modal sandbox creations to avoid deadlocks
- Add max_concurrent_tasks config (default 8) with semaphore in TB2 eval
- Pass cwd: /app via register_task_env_overrides for TB2 tasks
- Add /home/ to host path prefixes as safety net for container backends

When all 86 TerminalBench2 tasks fire simultaneously, each creates a Modal sandbox
via asyncio.run() inside a thread pool worker. Modal's blocking calls deadlock
when too many sandboxes are created at once. The semaphore caps concurrent
creations at max_concurrent_tasks (8 by default).
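
The throttling described above can be sketched roughly as follows. This is a minimal illustration, not the code in terminalbench2_env.py: create_sandbox, run_task, and run_all are hypothetical names, the sandbox creation is simulated with a short sleep instead of a real Modal call, and the limit of 8 and the task count of 86 are taken from the commit message.

```python
import asyncio

MAX_CONCURRENT_TASKS = 8  # mirrors the max_concurrent_tasks default named in the commit
NUM_TASKS = 86            # number of TerminalBench2 tasks named in the commit


async def create_sandbox(task_id: int) -> str:
    """Hypothetical stand-in for sandbox creation; it only sleeps briefly."""
    await asyncio.sleep(0.01)
    return f"sandbox-{task_id}"


async def run_all(num_tasks: int = NUM_TASKS, limit: int = MAX_CONCURRENT_TASKS):
    """Start every task at once, but let at most `limit` create a sandbox concurrently."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0  # creations currently inside the semaphore
    peak = 0       # highest value in_flight ever reached

    async def run_task(task_id: int) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while `limit` creations are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await create_sandbox(task_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(run_task(i) for i in range(num_tasks)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_all())
    print(f"{len(results)} sandboxes created, peak concurrency {peak}")
```

All tasks are still scheduled immediately; only the creation step is gated, so the observed peak should never exceed the configured limit.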

Co-Authored-By: hermes-agent[bot] <hermes-agent[bot]@users.noreply.github.com>
2026-03-07 14:02:34 -08:00
__init__.py Add new environments and enhance tool context functionality 2026-02-10 19:39:05 +00:00
default.yaml fix: limit concurrent Modal sandbox creations to avoid deadlocks 2026-03-07 14:02:34 -08:00
run_eval.sh feat: add OpenThoughts-TBLite evaluation script 2026-03-04 12:55:56 +00:00
terminalbench2_env.py fix: limit concurrent Modal sandbox creations to avoid deadlocks 2026-03-07 14:02:34 -08:00