fix: limit concurrent Modal sandbox creations to avoid deadlocks

- Add max_concurrent_tasks config (default 8) with semaphore in TB2 eval
- Pass cwd: /app via register_task_env_overrides for TB2 tasks
- Add /home/ to host path prefixes as safety net for container backends

When all 86 TerminalBench2 tasks fire simultaneously, each creates a Modal sandbox
via asyncio.run() inside a thread pool worker. Modal's blocking calls deadlock
when too many are created at once. The semaphore ensures max 8 concurrent creations.

Co-Authored-By: hermes-agent[bot] <hermes-agent[bot]@users.noreply.github.com>
This commit is contained in:
Blake Johnson 2026-03-07 21:34:06 +00:00
parent 306d92a9d7
commit c6df39955c
4 changed files with 32 additions and 3 deletions

View file

@ -424,7 +424,8 @@ def _get_env_config() -> Dict[str, Any]:
# SSH is excluded since /home/ paths are valid on remote machines.
cwd = os.getenv("TERMINAL_CWD", default_cwd)
if env_type in ("modal", "docker", "singularity", "daytona") and cwd:
host_prefixes = ("/Users/", "C:\\", "C:/")
# Host paths that won't exist inside containers
host_prefixes = ("/Users/", "/home/", "C:\\", "C:/")
if any(cwd.startswith(p) for p in host_prefixes) and cwd != default_cwd:
logger.info("Ignoring TERMINAL_CWD=%r for %s backend "
"(host path won't exist in sandbox). Using %r instead.",