mirror of https://github.com/NousResearch/hermes-agent.git
synced 2026-06-05 07:41:39 +00:00
fix(service_manager): s6 detection works for unprivileged hermes user
PR #30136 review surfaced two issues, both rooted in the same audit gap: docker integration tests were running as root, not as the unprivileged `hermes` user (UID 10000) that the runtime actually uses via `s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to the s6 control surface worked as root in the tests but was inert in production.

Fixes:

1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`, but that symlink is readable only by root. For UID 10000, reading the symlink yields PermissionError, `resolve()` silently returns the unresolved path, and `exe.name == "exe"`, so detection always returned False. As a result the service-manager runtime-registration path was inert, and every `hermes profile create` / `hermes -p X gateway start` silently skipped the s6 hook. Replace the probe with `/proc/1/comm` (world-readable) plus `/run/s6/basedir` (s6-overlay-specific); both are required, and detection fails closed.

2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/{control,lock}` to hermes so that `s6-svscanctl -a/-an` works without root. Previously the chown stopped at the `/run/service` directory and the FIFO inside stayed root-owned, so `register_profile_gateway` run as hermes failed at the rescan-trigger step with EACCES. The wrapper in profiles.py caught and swallowed the exception, printing only a warning, so profile creation appeared to succeed while the slot was rolled back.

Audit changes to flush out this class of bug next time:

- Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py` that default to `-u hermes`. The module docstring explains why and flags `user="root"` as opt-in only, for tests that explicitly need root (none currently do).
- Route every `docker exec` call in tests/docker/ through the new helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py, test_container_restart.py, test_s6_profile_gateway_integration.py).
- Add 5 unit tests covering `_s6_running` under various probe states: both signals present; comm wrong; basedir missing; PermissionError on /proc/1/comm; missing /proc (non-Linux). The PermissionError test is the explicit regression guard for the original bug.

Known follow-up: the per-service `supervise/control` FIFO inside each `/run/service/gateway-<profile>/supervise/` is created root-owned by s6-supervise (which runs as root because s6-svscan is PID 1), so `s6-svc -u/-d/-t` issued by the hermes user will get EACCES on it. Running the audit under `-u hermes` will expose this in the lifecycle tests, surfacing the issue cleanly so it can be fixed in a focused follow-up (likely via a small SUID helper or a polling chown loop in cont-init.d). The detection and svscanctl fixes here are independent and complete on their own.
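A minimal sketch of what the `docker_exec` / `docker_exec_sh` helpers in `tests/docker/conftest.py` could look like, assuming they wrap `subprocess.run` around the `docker exec -u USER CONTAINER CMD...` CLI. Only the helper names, the `-u hermes` default, and the explicit `user="root"` opt-in come from the commit message; the signatures, the `check` parameter, and the `docker_exec_argv` builder are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Assumption: the runtime user inside the container is named "hermes" (UID 10000).
DEFAULT_USER = "hermes"


def docker_exec_argv(container: str, cmd: Sequence[str], user: str = DEFAULT_USER) -> List[str]:
    """Build the `docker exec` argv (hypothetical helper).

    `-u` is always passed, so running as root is never the implicit default.
    """
    return ["docker", "exec", "-u", user, container, *cmd]


def docker_exec(container: str, cmd: Sequence[str], user: str = DEFAULT_USER,
                check: bool = False) -> subprocess.CompletedProcess:
    """Run `cmd` inside `container` as `user` (hermes unless a test opts into root)."""
    return subprocess.run(
        docker_exec_argv(container, cmd, user),
        capture_output=True,
        text=True,
        check=check,
    )


def docker_exec_sh(container: str, script: str, user: str = DEFAULT_USER,
                   check: bool = False) -> subprocess.CompletedProcess:
    """Run a shell snippet via `sh -c`, still as the unprivileged user by default."""
    return docker_exec(container, ["sh", "-c", script], user=user, check=check)
```

Keeping the argv construction separate makes the default-user rule testable without a running container. A test that genuinely needs root would have to write `docker_exec(container, cmd, user="root")`, which keeps the privilege escalation visible in review.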
parent a6f7171a5e
commit 2f8ceeab9a
9 changed files with 241 additions and 53 deletions
@@ -122,16 +122,34 @@ def detect_service_manager() -> ServiceManagerKind:
 def _s6_running() -> bool:
     """True when s6-svscan is running as PID 1 in this container.
 
-    s6-overlay's /init exec's s6-svscan, so ``/proc/1/exe`` resolves
-    to it (or to ``init`` on some kernel configurations that hide the
-    exe link). The ``/run/s6/`` directory is created by stage1, so its
-    presence is a second necessary signal.
+    Detection has to work for **both** root and the unprivileged hermes
+    user (UID 10000). The obvious probe — ``Path('/proc/1/exe').resolve()``
+    — only works as root: for any other UID, the symlink at
+    ``/proc/1/exe`` is unreadable and ``resolve()`` silently returns the
+    path unchanged, so the resolved name is the literal ``"exe"`` and
+    detection always fails. Since every Hermes runtime call inside the
+    container drops to hermes via ``s6-setuidgid``, that silent failure
+    made the entire service-manager runtime-registration path inert in
+    production (PR #30136 review).
+
+    Probe instead via:
+
+    * ``/proc/1/comm`` — world-readable, contains the process comm
+      (``s6-svscan`` when s6-overlay is PID 1).
+    * ``/run/s6/basedir`` — s6-overlay-specific directory created by
+      stage1. World-readable. More specific than ``/run/s6`` (which
+      other tools occasionally create).
+
+    Both signals are required; either alone could false-positive
+    (e.g. a container with the s6 binaries installed but a different
+    init, or an unrelated process named ``s6-svscan``).
     """
     try:
-        exe = Path("/proc/1/exe").resolve()
-        return exe.name in ("s6-svscan", "init") and Path("/run/s6").exists()
-    except (OSError, RuntimeError):
+        comm = Path("/proc/1/comm").read_text().strip()
+    except OSError:
         return False
+    if comm != "s6-svscan":
+        return False
+    return Path("/run/s6/basedir").is_dir()
 
 
 # ---------------------------------------------------------------------------
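One way to exercise the five probe states the commit message lists (both signals present, wrong comm, missing basedir, PermissionError on /proc/1/comm, no /proc at all) is to make the probe's filesystem root injectable. This is a sketch under that assumption: `s6_running_at(root)` and `make_fake_root` are hypothetical names that do not appear in the commit, and the real unit tests may patch `Path` directly instead. The probe logic itself mirrors the new `_s6_running()`.

```python
from pathlib import Path
from typing import Optional


def s6_running_at(root: Path = Path("/")) -> bool:
    """Hypothetical testable variant of the new _s6_running() probe.

    Same checks as the patched function, but every path is resolved under
    ``root`` so tests can point it at a temporary directory.
    """
    try:
        # /proc/1/comm is world-readable, unlike the /proc/1/exe symlink.
        comm = (root / "proc/1/comm").read_text().strip()
    except OSError:  # covers PermissionError and FileNotFoundError (no /proc)
        return False
    if comm != "s6-svscan":
        return False
    # s6-overlay-specific marker; both signals are required, so fail closed.
    return (root / "run/s6/basedir").is_dir()


def make_fake_root(base: Path, comm: Optional[str], basedir: bool) -> Path:
    """Populate ``base`` with an optional proc/1/comm file and run/s6/basedir dir."""
    if comm is not None:
        (base / "proc/1").mkdir(parents=True)
        (base / "proc/1/comm").write_text(comm + "\n")
    if basedir:
        (base / "run/s6/basedir").mkdir(parents=True)
    return base
```

With `root` defaulting to `/`, production behaviour is unchanged. The PermissionError case still needs a patch on `Path.read_text` rather than a `chmod`, because a test running as root can read a file regardless of its mode bits, which is exactly the root-versus-hermes gap this commit describes.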