feat(terminal): collapse subagent task_ids to shared container (#16177)

Before: delegate_task children each allocated their own terminal
sandbox keyed by child task_id. Starting extra containers (or Modal
sandboxes / Daytona workspaces) is expensive, and the subagent's work
is invisible to the parent — files written by the child in its
container don't exist in the parent's when the subagent returns.

After: a single `_resolve_container_task_id` helper maps any
tool-call task_id to "default" UNLESS an env override is registered
for it. The parent agent and all delegate_task children therefore
share one long-lived sandbox — installed packages, cwd, /workspace
files, and /tmp scratch carry over freely between them.

RL and benchmark environments (TerminalBench2, HermesSweEnv, ...)
opt in to isolation via `register_task_env_overrides(task_id, {...})`;
those task_ids survive the collapse and get their own sandbox,
preserving the per-task Docker image behavior these benchmarks rely on.

file_state / active-subagents registry / TUI events still key off the
original child task_id, so the 'subagent wrote a file the parent read'
warning and UI per-subagent panels keep working.

Tradeoff: parallel delegate_task children (tasks=[...]) now share one
bash/container. Concurrent cd, env-var mutations, and writes to the
same path will collide. If that bites a specific workflow, the
subagent can opt back into isolation via register_task_env_overrides.

Applied at four lookup sites:
- tools/terminal_tool.py terminal_tool() and get_active_env()
- tools/file_tools.py _get_file_ops() and _get_live_tracking_cwd()
- tools/code_execution_tool.py _get_or_create_environment()

Docs: website/docs/user-guide/configuration.md updated to reflect the
shared-container reality and document the RL/benchmark carve-out.
Tests: tests/tools/test_shared_container_task_id.py (9 cases).
This commit is contained in:
Teknium 2026-04-26 11:55:02 -07:00 committed by GitHub
parent 087e74d4d7
commit 5b2c59559a
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
5 changed files with 159 additions and 8 deletions

View file

@ -88,8 +88,14 @@ def _resolve_path(filepath: str, task_id: str = "default") -> Path:
def _get_live_tracking_cwd(task_id: str = "default") -> str | None:
"""Return the task's live terminal cwd for bookkeeping when available."""
try:
from tools.terminal_tool import _resolve_container_task_id
container_key = _resolve_container_task_id(task_id)
except Exception:
container_key = task_id
with _file_ops_lock:
cached = _file_ops_cache.get(task_id)
cached = _file_ops_cache.get(container_key) or _file_ops_cache.get(task_id)
if cached is not None:
live_cwd = getattr(getattr(cached, "env", None), "cwd", None) or getattr(
cached, "cwd", None
@ -101,7 +107,7 @@ def _get_live_tracking_cwd(task_id: str = "default") -> str | None:
from tools.terminal_tool import _active_environments, _env_lock
with _env_lock:
env = _active_environments.get(task_id)
env = _active_environments.get(container_key) or _active_environments.get(task_id)
live_cwd = getattr(env, "cwd", None) if env is not None else None
if live_cwd:
return live_cwd
@ -261,15 +267,23 @@ def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
Thread-safe: uses the same per-task creation locks as terminal_tool to
prevent duplicate sandbox creation from concurrent tool calls.
Note: subagent task_ids are collapsed to "default" via
``_resolve_container_task_id`` so delegate_task children share the
parent's container and its cached file_ops. RL/benchmark task_ids with
a registered env override keep their isolation.
"""
from tools.terminal_tool import (
_active_environments, _env_lock, _create_environment,
_get_env_config, _last_activity, _start_cleanup_thread,
_creation_locks,
_creation_locks_lock,
_resolve_container_task_id,
)
import time
task_id = _resolve_container_task_id(task_id)
# Fast path: check cache -- but also verify the underlying environment
# is still alive (it may have been killed by the cleanup thread).
with _file_ops_lock: