mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
fix(service_manager): s6 detection works for unprivileged hermes user
PR #30136 review surfaced two issues, both rooted in the same audit gap: docker integration tests were running as root, not the unprivileged `hermes` user (UID 10000) that the runtime actually uses via `s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to the s6 control surface worked as root in the tests but was inert in production. Fixes: 1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`, which is root-only readable. For UID 10000 the symlink yields PermissionError, `resolve()` silently returns the unresolved path, and `exe.name == "exe"` — so detection always returned False, the service-manager runtime-registration path was inert, and every `hermes profile create` / `hermes -p X gateway start` silently skipped the s6 hook. Replace with `/proc/1/comm` (world-readable) + `/run/s6/basedir` (s6-overlay-specific) — both required, fail closed. 2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/` {control,lock} to hermes so `s6-svscanctl -a/-an` works without root. Previously the directory chown stopped at `/run/service` and the FIFO inside stayed root-owned, so `register_profile_gateway` from hermes failed at the rescan-trigger step with EACCES — the wrapper in profiles.py caught the exception and printed a swallowed warning, so profile creation appeared to succeed while the slot was rolled back. Audit changes to flush this class of bug next time: - Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py` that default to `-u hermes`. The module docstring explains why and flags `user="root"` as opt-in only for tests that explicitly need root (none currently do). - Refactor every `docker exec` call in tests/docker/ through the new helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py, test_container_restart.py, test_s6_profile_gateway_integration.py). - Add 5 unit tests covering `_s6_running` under various probe states (both signals present; comm wrong; basedir missing; PermissionError on /proc/1/comm; missing /proc — non-Linux). The PermissionError test is the explicit regression guard for the original bug. Known follow-up: the per-service `supervise/control` FIFO inside each `/run/service/gateway-<profile>/supervise/` is created root-owned by s6-supervise (which runs as root because s6-svscan is PID 1). `s6-svc -u/-d/-t` from the hermes user will get EACCES on those. The audit under `-u hermes` will reveal this in lifecycle tests — surfacing the issue cleanly so it can be fixed in a focused follow-up (likely via a small SUID helper or a polling chown loop in cont-init.d). The detection + svscanctl fixes here are independent and complete on their own.
This commit is contained in:
parent
a6f7171a5e
commit
2f8ceeab9a
9 changed files with 241 additions and 53 deletions
|
|
@ -16,15 +16,31 @@
|
||||||
#
|
#
|
||||||
# Phase 4 also needs hermes-user writes to /run/service/ (so the
|
# Phase 4 also needs hermes-user writes to /run/service/ (so the
|
||||||
# profile create/delete hooks can register/unregister at runtime),
|
# profile create/delete hooks can register/unregister at runtime),
|
||||||
# so we chown the scandir before invoking the reconciler. The
|
# so we chown the scandir before invoking the reconciler. We
|
||||||
# .s6-svscan/ subdir stays root-owned; only sibling directories
|
# additionally chown the s6-svscan control FIFO so the hermes user
|
||||||
# (gateway-<profile>/) need to be hermes-writable.
|
# can send rescan signals via ``s6-svscanctl -a``; without this the
|
||||||
|
# entire runtime-registration path is inert under UID 10000 (the
|
||||||
|
# Python wrapper catches the resulting EACCES, prints a warning,
|
||||||
|
# and swallows the failure).
|
||||||
set -e
|
set -e
|
||||||
|
|
||||||
# Make the dynamic scandir hermes-writable. The directory itself
|
# Make the dynamic scandir hermes-writable. The directory itself
|
||||||
# starts root-owned by s6-overlay; we leave .s6-svscan/ alone since
|
# starts root-owned by s6-overlay.
|
||||||
# only s6 itself writes there.
|
|
||||||
chown hermes:hermes /run/service 2>/dev/null || true
|
chown hermes:hermes /run/service 2>/dev/null || true
|
||||||
|
|
||||||
|
# Make the svscan control FIFO hermes-writable so s6-svscanctl -a
|
||||||
|
# / -an work for the hermes user. The FIFO is created by s6-svscan
|
||||||
|
# at PID-1 startup, so by the time this cont-init.d script runs it
|
||||||
|
# already exists. Both ``control`` and ``lock`` need to be writable
|
||||||
|
# for the various svscanctl operations; the directory itself stays
|
||||||
|
# root-owned (we only need to touch the two FIFOs/locks inside).
|
||||||
|
if [ -d /run/service/.s6-svscan ]; then
|
||||||
|
for entry in control lock; do
|
||||||
|
if [ -e "/run/service/.s6-svscan/$entry" ]; then
|
||||||
|
chown hermes:hermes "/run/service/.s6-svscan/$entry" 2>/dev/null || true
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
fi
|
||||||
|
|
||||||
exec s6-setuidgid hermes /opt/hermes/.venv/bin/python -m hermes_cli.container_boot
|
exec s6-setuidgid hermes /opt/hermes/.venv/bin/python -m hermes_cli.container_boot
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -122,16 +122,34 @@ def detect_service_manager() -> ServiceManagerKind:
|
||||||
def _s6_running() -> bool:
|
def _s6_running() -> bool:
|
||||||
"""True when s6-svscan is running as PID 1 in this container.
|
"""True when s6-svscan is running as PID 1 in this container.
|
||||||
|
|
||||||
s6-overlay's /init exec's s6-svscan, so ``/proc/1/exe`` resolves
|
Detection has to work for **both** root and the unprivileged hermes
|
||||||
to it (or to ``init`` on some kernel configurations that hide the
|
user (UID 10000). The obvious probe — ``Path('/proc/1/exe').resolve()``
|
||||||
exe link). The ``/run/s6/`` directory is created by stage1, so its
|
— only works as root: for any other UID, the symlink at
|
||||||
presence is a second necessary signal.
|
``/proc/1/exe`` is unreadable and ``resolve()`` silently returns the
|
||||||
|
path unchanged, so the resolved name is the literal ``"exe"`` and
|
||||||
|
detection always fails. Since every Hermes runtime call inside the
|
||||||
|
container drops to hermes via ``s6-setuidgid``, that silent failure
|
||||||
|
made the entire service-manager runtime-registration path inert in
|
||||||
|
production (PR #30136 review).
|
||||||
|
|
||||||
|
Probe instead via:
|
||||||
|
* ``/proc/1/comm`` — world-readable, contains the process comm
|
||||||
|
(``s6-svscan`` when s6-overlay is PID 1).
|
||||||
|
* ``/run/s6/basedir`` — s6-overlay-specific directory created by
|
||||||
|
stage1. World-readable. More specific than ``/run/s6`` (which
|
||||||
|
other tools occasionally create).
|
||||||
|
|
||||||
|
Both signals are required; either alone could false-positive
|
||||||
|
(e.g. a container with the s6 binaries installed but a different
|
||||||
|
init, or an unrelated process named ``s6-svscan``).
|
||||||
"""
|
"""
|
||||||
try:
|
try:
|
||||||
exe = Path("/proc/1/exe").resolve()
|
comm = Path("/proc/1/comm").read_text().strip()
|
||||||
return exe.name in ("s6-svscan", "init") and Path("/run/s6").exists()
|
except OSError:
|
||||||
except (OSError, RuntimeError):
|
|
||||||
return False
|
return False
|
||||||
|
if comm != "s6-svscan":
|
||||||
|
return False
|
||||||
|
return Path("/run/s6/basedir").is_dir()
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
|
||||||
|
|
@ -84,3 +84,56 @@ def container_name(request) -> Iterator[str]:
|
||||||
["docker", "rm", "-f", name],
|
["docker", "rm", "-f", name],
|
||||||
capture_output=True, timeout=10,
|
capture_output=True, timeout=10,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# docker_exec — default to the unprivileged hermes user
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
#
|
||||||
|
# Background: every Hermes runtime path inside the container drops to UID
|
||||||
|
# 10000 (the ``hermes`` user) via ``s6-setuidgid hermes``. ``docker exec``
|
||||||
|
# without ``-u`` runs as root, which is **not** representative of how
|
||||||
|
# production code executes. PR #30136 review caught a real regression
|
||||||
|
# this way — ``Path('/proc/1/exe').resolve()`` works as root and silently
|
||||||
|
# fails (PermissionError swallowed) for hermes, so a test that ran as root
|
||||||
|
# couldn't catch a feature that was inert for the actual runtime user.
|
||||||
|
#
|
||||||
|
# Tests in this directory MUST exercise the realistic user context. The
|
||||||
|
# helpers below run every probe under ``-u hermes`` unless a specific
|
||||||
|
# test explicitly opts into ``user="root"`` (rare — e.g. inspecting
|
||||||
|
# /proc/1/exe itself, chowning a volume).
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def docker_exec(
|
||||||
|
container: str,
|
||||||
|
*args: str,
|
||||||
|
user: str = "hermes",
|
||||||
|
timeout: int = 30,
|
||||||
|
extra_docker_args: tuple[str, ...] = (),
|
||||||
|
) -> subprocess.CompletedProcess[str]:
|
||||||
|
"""Run a command inside ``container`` as ``user`` (default: hermes).
|
||||||
|
|
||||||
|
Returns the CompletedProcess with text=True, capture_output=True.
|
||||||
|
|
||||||
|
Pass ``user="root"`` only when the test specifically needs root
|
||||||
|
capabilities (e.g. reading /proc/1/exe, manipulating ownership).
|
||||||
|
Most tests should use the default.
|
||||||
|
"""
|
||||||
|
cmd = ["docker", "exec", "-u", user, *extra_docker_args, container, *args]
|
||||||
|
return subprocess.run(
|
||||||
|
cmd, capture_output=True, text=True, timeout=timeout,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def docker_exec_sh(
|
||||||
|
container: str,
|
||||||
|
command: str,
|
||||||
|
*,
|
||||||
|
user: str = "hermes",
|
||||||
|
timeout: int = 30,
|
||||||
|
) -> subprocess.CompletedProcess[str]:
|
||||||
|
"""Run ``sh -c <command>`` inside the container as ``user``."""
|
||||||
|
return docker_exec(
|
||||||
|
container, "sh", "-c", command, user=user, timeout=timeout,
|
||||||
|
)
|
||||||
|
|
|
||||||
|
|
@ -9,6 +9,10 @@ auto-start only those whose last state was `running`.
|
||||||
These tests stand up a container with a named volume, create profiles
|
These tests stand up a container with a named volume, create profiles
|
||||||
inside it in various gateway states, restart the container, and
|
inside it in various gateway states, restart the container, and
|
||||||
assert the reconciler did the right thing.
|
assert the reconciler did the right thing.
|
||||||
|
|
||||||
|
Every ``docker exec`` here runs as the unprivileged ``hermes`` user
|
||||||
|
(via :func:`docker_exec` / :func:`docker_exec_sh` in conftest); see
|
||||||
|
the conftest module docstring.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
|
@ -17,6 +21,8 @@ import time
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
|
from tests.docker.conftest import docker_exec, docker_exec_sh
|
||||||
|
|
||||||
|
|
||||||
def _docker(*args: str, **kw) -> subprocess.CompletedProcess[str]:
|
def _docker(*args: str, **kw) -> subprocess.CompletedProcess[str]:
|
||||||
return subprocess.run(
|
return subprocess.run(
|
||||||
|
|
@ -27,11 +33,11 @@ def _docker(*args: str, **kw) -> subprocess.CompletedProcess[str]:
|
||||||
|
|
||||||
|
|
||||||
def _exec(container: str, *args: str, timeout: int = 30) -> subprocess.CompletedProcess[str]:
|
def _exec(container: str, *args: str, timeout: int = 30) -> subprocess.CompletedProcess[str]:
|
||||||
return _docker("exec", container, *args, timeout=timeout)
|
return docker_exec(container, *args, timeout=timeout)
|
||||||
|
|
||||||
|
|
||||||
def _sh(container: str, cmd: str, timeout: int = 30) -> subprocess.CompletedProcess[str]:
|
def _sh(container: str, cmd: str, timeout: int = 30) -> subprocess.CompletedProcess[str]:
|
||||||
return _docker("exec", container, "sh", "-c", cmd, timeout=timeout)
|
return docker_exec_sh(container, cmd, timeout=timeout)
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture
|
@pytest.fixture
|
||||||
|
|
|
||||||
|
|
@ -5,12 +5,18 @@ it stays dead. After Phase 2 (s6): dashboard starts once; if it crashes
|
||||||
it is restarted under supervision. The restart-after-crash test lives in
|
it is restarted under supervision. The restart-after-crash test lives in
|
||||||
Phase 2 Task 2.5; this file only locks the opt-in surface (which must
|
Phase 2 Task 2.5; this file only locks the opt-in surface (which must
|
||||||
not change between tini and s6).
|
not change between tini and s6).
|
||||||
|
|
||||||
|
Every ``docker exec`` here runs as the unprivileged ``hermes`` user
|
||||||
|
(via :func:`docker_exec`/:func:`docker_exec_sh` in conftest), matching
|
||||||
|
the realistic runtime context. See the conftest module docstring.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import subprocess
|
import subprocess
|
||||||
import time
|
import time
|
||||||
|
|
||||||
|
from tests.docker.conftest import docker_exec, docker_exec_sh
|
||||||
|
|
||||||
|
|
||||||
def _poll(container: str, probe: str, *, deadline_s: float = 30.0,
|
def _poll(container: str, probe: str, *, deadline_s: float = 30.0,
|
||||||
interval_s: float = 0.5) -> tuple[bool, str]:
|
interval_s: float = 0.5) -> tuple[bool, str]:
|
||||||
|
|
@ -19,10 +25,7 @@ def _poll(container: str, probe: str, *, deadline_s: float = 30.0,
|
||||||
end = time.monotonic() + deadline_s
|
end = time.monotonic() + deadline_s
|
||||||
last = ""
|
last = ""
|
||||||
while time.monotonic() < end:
|
while time.monotonic() < end:
|
||||||
r = subprocess.run(
|
r = docker_exec_sh(container, probe, timeout=10)
|
||||||
["docker", "exec", container, "sh", "-c", probe],
|
|
||||||
capture_output=True, text=True, timeout=10,
|
|
||||||
)
|
|
||||||
last = r.stdout
|
last = r.stdout
|
||||||
if r.returncode == 0:
|
if r.returncode == 0:
|
||||||
return True, last
|
return True, last
|
||||||
|
|
@ -42,11 +45,7 @@ def test_dashboard_not_running_by_default(
|
||||||
# Give the entrypoint enough time to finish bootstrap; if a dashboard
|
# Give the entrypoint enough time to finish bootstrap; if a dashboard
|
||||||
# were going to start it'd be visible by now.
|
# were going to start it'd be visible by now.
|
||||||
time.sleep(5)
|
time.sleep(5)
|
||||||
r = subprocess.run(
|
r = docker_exec(container_name, "pgrep", "-f", "hermes dashboard")
|
||||||
["docker", "exec", container_name,
|
|
||||||
"pgrep", "-f", "hermes dashboard"],
|
|
||||||
capture_output=True, text=True, timeout=10,
|
|
||||||
)
|
|
||||||
# pgrep exits non-zero when no match found
|
# pgrep exits non-zero when no match found
|
||||||
assert r.returncode != 0, (
|
assert r.returncode != 0, (
|
||||||
"Dashboard should not be running without HERMES_DASHBOARD"
|
"Dashboard should not be running without HERMES_DASHBOARD"
|
||||||
|
|
@ -121,10 +120,8 @@ def test_dashboard_restarts_after_crash(
|
||||||
# a couple of times before giving up.
|
# a couple of times before giving up.
|
||||||
first_pid: str | None = None
|
first_pid: str | None = None
|
||||||
for _attempt in range(10):
|
for _attempt in range(10):
|
||||||
first_pid_result = subprocess.run(
|
first_pid_result = docker_exec(
|
||||||
["docker", "exec", container_name,
|
container_name, "pgrep", "-f", "hermes dashboard",
|
||||||
"pgrep", "-f", "hermes dashboard"],
|
|
||||||
capture_output=True, text=True, timeout=10,
|
|
||||||
)
|
)
|
||||||
first_pids = first_pid_result.stdout.strip().split()
|
first_pids = first_pid_result.stdout.strip().split()
|
||||||
if first_pids:
|
if first_pids:
|
||||||
|
|
@ -133,21 +130,15 @@ def test_dashboard_restarts_after_crash(
|
||||||
time.sleep(0.5)
|
time.sleep(0.5)
|
||||||
assert first_pid is not None, "Could not capture initial dashboard PID"
|
assert first_pid is not None, "Could not capture initial dashboard PID"
|
||||||
|
|
||||||
# Kill the dashboard.
|
# Kill the dashboard. The dashboard process runs as hermes, so the
|
||||||
subprocess.run(
|
# hermes user can kill it (same UID).
|
||||||
["docker", "exec", container_name, "kill", "-9", first_pid],
|
docker_exec(container_name, "kill", "-9", first_pid)
|
||||||
capture_output=True, timeout=10,
|
|
||||||
)
|
|
||||||
|
|
||||||
# s6 backs off ~1s before restart; allow up to 15s for the new
|
# s6 backs off ~1s before restart; allow up to 15s for the new
|
||||||
# process to appear with a different PID.
|
# process to appear with a different PID.
|
||||||
deadline = time.monotonic() + 15.0
|
deadline = time.monotonic() + 15.0
|
||||||
while time.monotonic() < deadline:
|
while time.monotonic() < deadline:
|
||||||
r = subprocess.run(
|
r = docker_exec(container_name, "pgrep", "-f", "hermes dashboard")
|
||||||
["docker", "exec", container_name,
|
|
||||||
"pgrep", "-f", "hermes dashboard"],
|
|
||||||
capture_output=True, text=True, timeout=10,
|
|
||||||
)
|
|
||||||
pids = r.stdout.strip().split() if r.returncode == 0 else []
|
pids = r.stdout.strip().split() if r.returncode == 0 else []
|
||||||
if pids and pids[0] != first_pid:
|
if pids and pids[0] != first_pid:
|
||||||
return # success
|
return # success
|
||||||
|
|
|
||||||
|
|
@ -13,22 +13,25 @@ so the gateway process itself will exit with code 1 on every start
|
||||||
attempt (s6 will keep restarting it). We assert against s6's
|
attempt (s6 will keep restarting it). We assert against s6's
|
||||||
``want up`` / ``want down`` state — which reflects the lifecycle
|
``want up`` / ``want down`` state — which reflects the lifecycle
|
||||||
command's intent, not the supervised process's health.
|
command's intent, not the supervised process's health.
|
||||||
|
|
||||||
|
Every ``docker exec`` here runs as the unprivileged ``hermes`` user
|
||||||
|
(via :func:`docker_exec_sh` in conftest); see the conftest module
|
||||||
|
docstring.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import subprocess
|
import subprocess
|
||||||
import time
|
import time
|
||||||
|
|
||||||
|
from tests.docker.conftest import docker_exec_sh
|
||||||
|
|
||||||
PROFILE = "test-harness-profile"
|
PROFILE = "test-harness-profile"
|
||||||
|
|
||||||
|
|
||||||
def _sh(
|
def _sh(
|
||||||
container: str, command: str, timeout: int = 30,
|
container: str, command: str, timeout: int = 30,
|
||||||
) -> subprocess.CompletedProcess[str]:
|
) -> subprocess.CompletedProcess[str]:
|
||||||
return subprocess.run(
|
return docker_exec_sh(container, command, timeout=timeout)
|
||||||
["docker", "exec", container, "sh", "-c", command],
|
|
||||||
capture_output=True, text=True, timeout=timeout,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def _svstat(container: str) -> str:
|
def _svstat(container: str) -> str:
|
||||||
|
|
|
||||||
|
|
@ -10,12 +10,20 @@ gateway actually starting (the binary will refuse to start without a
|
||||||
valid profile config). The full register → start → supervised-restart
|
valid profile config). The full register → start → supervised-restart
|
||||||
→ unregister cycle is covered by Phase 4 once profile create/delete
|
→ unregister cycle is covered by Phase 4 once profile create/delete
|
||||||
hooks land.
|
hooks land.
|
||||||
|
|
||||||
|
Every ``docker exec`` here runs as the unprivileged ``hermes`` user
|
||||||
|
(via :func:`docker_exec` in conftest); see the conftest module
|
||||||
|
docstring. ``/run/service`` is chowned hermes-writable by the
|
||||||
|
``02-reconcile-profiles`` cont-init.d script, so register/unregister
|
||||||
|
operations work correctly under UID 10000.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import subprocess
|
import subprocess
|
||||||
import time
|
import time
|
||||||
|
|
||||||
|
from tests.docker.conftest import docker_exec
|
||||||
|
|
||||||
|
|
||||||
_REGISTER_SCRIPT = """
|
_REGISTER_SCRIPT = """
|
||||||
import sys
|
import sys
|
||||||
|
|
@ -38,10 +46,7 @@ print("UNREGISTERED")
|
||||||
|
|
||||||
|
|
||||||
def _exec(container: str, *args: str, timeout: int = 30) -> subprocess.CompletedProcess:
|
def _exec(container: str, *args: str, timeout: int = 30) -> subprocess.CompletedProcess:
|
||||||
return subprocess.run(
|
return docker_exec(container, *args, timeout=timeout)
|
||||||
["docker", "exec", container, *args],
|
|
||||||
capture_output=True, text=True, timeout=timeout,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_s6_register_creates_service_dir_in_live_container(
|
def test_s6_register_creates_service_dir_in_live_container(
|
||||||
|
|
|
||||||
|
|
@ -5,12 +5,18 @@ s6-overlay's ``/init`` (Phase 2 PID 1) does the same. This invariant is
|
||||||
required for long-running containers spawning subprocesses (subagents,
|
required for long-running containers spawning subprocesses (subagents,
|
||||||
dashboard, dynamic gateways) — otherwise the process table fills with
|
dashboard, dynamic gateways) — otherwise the process table fills with
|
||||||
defunct entries and eventually exhausts the kernel PID space.
|
defunct entries and eventually exhausts the kernel PID space.
|
||||||
|
|
||||||
|
Every ``docker exec`` here runs as the unprivileged ``hermes`` user
|
||||||
|
(via :func:`docker_exec_sh` in conftest); see the conftest module
|
||||||
|
docstring.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import subprocess
|
import subprocess
|
||||||
import time
|
import time
|
||||||
|
|
||||||
|
from tests.docker.conftest import docker_exec, docker_exec_sh
|
||||||
|
|
||||||
|
|
||||||
def test_orphan_zombies_reaped(
|
def test_orphan_zombies_reaped(
|
||||||
built_image: str, container_name: str,
|
built_image: str, container_name: str,
|
||||||
|
|
@ -26,17 +32,12 @@ def test_orphan_zombies_reaped(
|
||||||
# `( ( sleep 0.1 & ) & ); sleep 1` creates a grandchild detached from
|
# `( ( sleep 0.1 & ) & ); sleep 1` creates a grandchild detached from
|
||||||
# the original docker exec session — it becomes an orphan reparented
|
# the original docker exec session — it becomes an orphan reparented
|
||||||
# to PID 1 in the container. When it exits, PID 1 must reap it.
|
# to PID 1 in the container. When it exits, PID 1 must reap it.
|
||||||
subprocess.run(
|
docker_exec_sh(
|
||||||
["docker", "exec", container_name, "sh", "-c",
|
container_name, "( ( sleep 0.1 & ) & ); sleep 1", timeout=10,
|
||||||
"( ( sleep 0.1 & ) & ); sleep 1"],
|
|
||||||
capture_output=True, text=True, timeout=10,
|
|
||||||
)
|
)
|
||||||
time.sleep(1)
|
time.sleep(1)
|
||||||
|
|
||||||
r = subprocess.run(
|
r = docker_exec(container_name, "ps", "axo", "stat,pid,comm")
|
||||||
["docker", "exec", container_name, "ps", "axo", "stat,pid,comm"],
|
|
||||||
capture_output=True, text=True, timeout=10,
|
|
||||||
)
|
|
||||||
zombies = [
|
zombies = [
|
||||||
line for line in r.stdout.split("\n")
|
line for line in r.stdout.split("\n")
|
||||||
if line.strip().startswith("Z")
|
if line.strip().startswith("Z")
|
||||||
|
|
|
||||||
|
|
@ -69,6 +69,101 @@ def test_detect_service_manager_returns_known_value() -> None:
|
||||||
assert result in ("systemd", "launchd", "windows", "s6", "none")
|
assert result in ("systemd", "launchd", "windows", "s6", "none")
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# _s6_running — must work for unprivileged users, not just root
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _patch_s6_paths(
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
*,
|
||||||
|
comm: str | OSError | None,
|
||||||
|
basedir_is_dir: bool,
|
||||||
|
) -> None:
|
||||||
|
"""Stub /proc/1/comm and /run/s6/basedir for _s6_running tests."""
|
||||||
|
from pathlib import Path as _Path
|
||||||
|
|
||||||
|
real_read_text = _Path.read_text
|
||||||
|
real_is_dir = _Path.is_dir
|
||||||
|
|
||||||
|
def fake_read_text(self, *args, **kwargs): # type: ignore[override]
|
||||||
|
if str(self) == "/proc/1/comm":
|
||||||
|
if isinstance(comm, OSError):
|
||||||
|
raise comm
|
||||||
|
if comm is None:
|
||||||
|
raise FileNotFoundError(2, "No such file or directory")
|
||||||
|
return comm + "\n"
|
||||||
|
return real_read_text(self, *args, **kwargs)
|
||||||
|
|
||||||
|
def fake_is_dir(self): # type: ignore[override]
|
||||||
|
if str(self) == "/run/s6/basedir":
|
||||||
|
return basedir_is_dir
|
||||||
|
return real_is_dir(self)
|
||||||
|
|
||||||
|
monkeypatch.setattr(_Path, "read_text", fake_read_text)
|
||||||
|
monkeypatch.setattr(_Path, "is_dir", fake_is_dir)
|
||||||
|
|
||||||
|
|
||||||
|
def test_s6_running_true_when_comm_and_basedir_match(
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
from hermes_cli.service_manager import _s6_running
|
||||||
|
|
||||||
|
_patch_s6_paths(monkeypatch, comm="s6-svscan", basedir_is_dir=True)
|
||||||
|
assert _s6_running() is True
|
||||||
|
|
||||||
|
|
||||||
|
def test_s6_running_false_when_comm_is_wrong(
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
from hermes_cli.service_manager import _s6_running
|
||||||
|
|
||||||
|
# systemd as PID 1, basedir present from some stray s6 install
|
||||||
|
_patch_s6_paths(monkeypatch, comm="systemd", basedir_is_dir=True)
|
||||||
|
assert _s6_running() is False
|
||||||
|
|
||||||
|
|
||||||
|
def test_s6_running_false_when_basedir_missing(
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
from hermes_cli.service_manager import _s6_running
|
||||||
|
|
||||||
|
# The comm matches but the basedir is missing — e.g. an unrelated
|
||||||
|
# process happens to be named "s6-svscan"
|
||||||
|
_patch_s6_paths(monkeypatch, comm="s6-svscan", basedir_is_dir=False)
|
||||||
|
assert _s6_running() is False
|
||||||
|
|
||||||
|
|
||||||
|
def test_s6_running_false_when_comm_unreadable(
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
"""Regression: /proc/1/exe was unreadable to UID 10000 and
|
||||||
|
resolve() silently returned the unresolved path, making detection
|
||||||
|
always-False inside the container under the hermes user. The new
|
||||||
|
probe must FAIL CLOSED — not raise — when /proc/1/comm can't be
|
||||||
|
read.
|
||||||
|
"""
|
||||||
|
from hermes_cli.service_manager import _s6_running
|
||||||
|
|
||||||
|
_patch_s6_paths(
|
||||||
|
monkeypatch,
|
||||||
|
comm=PermissionError(13, "Permission denied"),
|
||||||
|
basedir_is_dir=True,
|
||||||
|
)
|
||||||
|
assert _s6_running() is False
|
||||||
|
|
||||||
|
|
||||||
|
def test_s6_running_handles_missing_proc(
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
"""On macOS / Windows / WSL-without-procfs, /proc/1/comm doesn't
|
||||||
|
exist. Must return False, not raise."""
|
||||||
|
from hermes_cli.service_manager import _s6_running
|
||||||
|
|
||||||
|
_patch_s6_paths(monkeypatch, comm=None, basedir_is_dir=False)
|
||||||
|
assert _s6_running() is False
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Backend wrappers — kind + registration unsupported on hosts
|
# Backend wrappers — kind + registration unsupported on hosts
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue