hermes-agent/tests/docker/test_zombie_reaping.py
Ben 244d62ded3 test(docker): lock baseline behavior for Phase 0 harness
Tasks 0.2-0.6 of the s6-overlay supervision plan. Locks the
user-visible behavior we must preserve through the Phase 2 init-
system swap:

- test_main_invocation.py (Task 0.2): docker run <image> with no
  args, chat subcommand passthrough, bare executable passthrough,
  bash pattern, exit-code propagation
- test_tui_passthrough.py (Task 0.3): TTY allocation via docker -t
  using the host's script(1) for a PTY
- test_dashboard.py (Task 0.4): HERMES_DASHBOARD=1 opt-in,
  HERMES_DASHBOARD_PORT override
- test_profile_gateway.py (Task 0.5): per-profile gateway
  start/stop and profile-delete-stops-gateway. Both marked
  xfail(strict=True) because the current tini image refuses
  gateway lifecycle commands inside the container; Phase 4
  Task 4.3 flips them to passing.
- test_zombie_reaping.py (Task 0.6): PID 1 reaps orphaned
  zombies. tini does this today; s6-overlay's /init must
  continue to.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-22 11:46:52 +10:00

44 lines
1.5 KiB
Python

"""Harness: PID 1 must reap orphaned zombie processes.
tini (current PID 1) reaps zombies via its built-in subreaper behavior.
s6-overlay's ``/init`` (Phase 2 PID 1) does the same. This invariant is
required for long-running containers spawning subprocesses (subagents,
dashboard, dynamic gateways) — otherwise the process table fills with
defunct entries and eventually exhausts the kernel PID space.
"""
from __future__ import annotations
import subprocess
import time
def test_orphan_zombies_reaped(
built_image: str, container_name: str,
) -> None:
"""Spawn an orphan child that exits immediately. PID 1 must reap it."""
subprocess.run(
["docker", "run", "-d", "--name", container_name, built_image,
"sleep", "60"],
check=True, capture_output=True, timeout=30,
)
time.sleep(2)
# `( ( sleep 0.1 & ) & ); sleep 1` creates a grandchild detached from
# the original docker exec session — it becomes an orphan reparented
# to PID 1 in the container. When it exits, PID 1 must reap it.
subprocess.run(
["docker", "exec", container_name, "sh", "-c",
"( ( sleep 0.1 & ) & ); sleep 1"],
capture_output=True, text=True, timeout=10,
)
time.sleep(1)
r = subprocess.run(
["docker", "exec", container_name, "ps", "axo", "stat,pid,comm"],
capture_output=True, text=True, timeout=10,
)
zombies = [
line for line in r.stdout.split("\n")
if line.strip().startswith("Z")
]
assert not zombies, f"Zombies not reaped by PID 1: {zombies}"