fix(docker): --init for zombie reaping + sleep infinity for idle-based lifetime

Two issues with sandbox container spawning:

1. PID 1 was `sleep 2h`, which never calls wait(). Background
   processes orphaned by their parent shell are reparented to PID 1,
   so every one of them that exited stayed behind as a zombie
   (<defunct>). The process tool then reported them as "running"
   because zombie PIDs remain in the process table until reaped.
   Fix: add --init to docker run, which runs tini (Docker) or
   catatonit (Podman) as PID 1 to reap children automatically.
   Both runtimes support --init natively.

2. The fixed 2-hour lifetime was arbitrary and sometimes too short
   for long agent sessions. Fix: replace `sleep 2h` with
   `sleep infinity`. The idle reaper (_cleanup_inactive_envs, gated
   by terminal.lifetime_seconds, default 300s) already cleans up
   environments based on their last-activity timestamp, so the
   container itself does not need a fixed death timer.
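To see why the process tool was fooled in item 1, here is a minimal, Linux-only Python sketch that is not part of this commit. It forks children that exit at once without being waited on, shows that they sit in state `Z` and still pass an `os.kill(pid, 0)` existence check, and then reaps them with a non-blocking `os.waitpid(-1, os.WNOHANG)` loop, which is roughly what tini or catatonit does as PID 1 when SIGCHLD arrives. All helper names are hypothetical, and `naive_is_running` is only an assumed stand-in for however the process tool actually checks liveness.

```python
from __future__ import annotations

import os
import time


def spawn_exited_children(n: int) -> list[int]:
    """Fork n children that exit immediately, like finished background jobs."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: terminate at once, skipping Python cleanup
        pids.append(pid)
    return pids


def proc_state(pid: int) -> str | None:
    """Return the state letter from /proc/<pid>/stat (Linux only), or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except FileNotFoundError:
        return None
    # The state is the first field after the parenthesised command name.
    return stat.rsplit(")", 1)[1].split()[0]


def naive_is_running(pid: int) -> bool:
    """Hypothetical liveness check via signal 0; it also succeeds for zombies."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    return True


def reap_children() -> list[int]:
    """Collect every exited child without blocking, the way an init such as tini does."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children remain
        if pid == 0:
            break  # children remain, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    pids = spawn_exited_children(3)
    deadline = time.monotonic() + 5
    while time.monotonic() < deadline and any(proc_state(p) != "Z" for p in pids):
        time.sleep(0.01)  # give the children time to finish exiting
    print("before reaping:", [proc_state(p) for p in pids], [naive_is_running(p) for p in pids])
    reap_children()
    print("after reaping: ", [proc_state(p) for p in pids], [naive_is_running(p) for p in pids])
```

Without a reaper, nothing ever calls waitpid() for these PIDs, so they stay in state `Z` and keep answering signal 0 indefinitely; that is the situation `sleep 2h` as PID 1 produced.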
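The idle-based lifetime from item 2 can be pictured with the following sketch. It is hypothetical: `_cleanup_inactive_envs` is not shown in this commit, so the `IdleReaper` class, its `touch`/`cleanup_inactive` methods, and the strict `>` comparison are assumptions; only the 300 s default mirrors `terminal.lifetime_seconds`. The clock is injectable so the expiry logic can be checked without waiting.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IdleReaper:
    """Hypothetical idle reaper: stops environments idle longer than lifetime_seconds."""

    lifetime_seconds: float = 300.0  # mirrors the terminal.lifetime_seconds default
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    last_activity: dict[str, float] = field(default_factory=dict)

    def touch(self, env_id: str) -> None:
        """Record activity (e.g. a command run in the container) for env_id."""
        self.last_activity[env_id] = self.clock()

    def cleanup_inactive(self, stop: Callable[[str], None]) -> list[str]:
        """Call stop(env_id) for each environment idle too long, then forget it."""
        now = self.clock()
        expired = [
            env_id
            for env_id, last in self.last_activity.items()
            if now - last > self.lifetime_seconds
        ]
        for env_id in expired:
            stop(env_id)
            del self.last_activity[env_id]
        return expired
```

In use, `stop` could be something like `lambda name: subprocess.run(["docker", "rm", "-f", name])`. Because every command refreshes the timestamp, an active session is never expired, while an abandoned one is stopped the next time the reaper runs after it has been idle longer than `lifetime_seconds`; a container running `sleep infinity` simply waits for that.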

Fixes #6908.
angelos 2026-04-10 03:17:40 +00:00 committed by Teknium
parent 2b0912ab18
commit 8254b820ec

@@ -409,11 +409,12 @@ class DockerEnvironment(BaseEnvironment):
         container_name = f"hermes-{uuid.uuid4().hex[:8]}"
         run_cmd = [
             self._docker_exe, "run", "-d",
+            "--init", # tini/catatonit as PID 1 — reaps zombie children
             "--name", container_name,
             "-w", cwd,
             *all_run_args,
             image,
-            "sleep", "2h",
+            "sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
         ]
         logger.debug(f"Starting container: {' '.join(run_cmd)}")
         result = subprocess.run(
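For reference, the new argument list can be reproduced in isolation. This is a sketch, not the method in the repository: `build_run_cmd` and its parameters are hypothetical, and it only builds the argv instead of calling `subprocess.run`.

```python
from __future__ import annotations

import uuid


def build_run_cmd(docker_exe: str, cwd: str, image: str,
                  extra_run_args: list[str] | None = None) -> tuple[str, list[str]]:
    """Assemble a detached `docker run` argv in the shape used after this change.

    --init puts tini (Docker) or catatonit (Podman) at PID 1 so exited
    children are reaped; `sleep infinity` keeps the container alive until
    something external (the idle reaper) removes it.
    """
    container_name = f"hermes-{uuid.uuid4().hex[:8]}"
    run_cmd = [
        docker_exe, "run", "-d",
        "--init",
        "--name", container_name,
        "-w", cwd,
        *(extra_run_args or []),
        image,
        "sleep", "infinity",
    ]
    return container_name, run_cmd
```

Passing the list to `subprocess.run(run_cmd, capture_output=True, text=True, check=True)` would start the container. For logging, `shlex.join(run_cmd)` quotes arguments that contain spaces, unlike `' '.join(...)`, so the logged line can be pasted back into a shell.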