mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
PR #30136 review surfaced two issues, both rooted in the same audit gap: docker integration tests were running as root, not the unprivileged `hermes` user (UID 10000) that the runtime actually uses via `s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to the s6 control surface worked as root in the tests but was inert in production.

Fixes:

1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`, but `/proc/1/exe` is readable only by root. For UID 10000 the symlink yields PermissionError, `resolve()` silently returns the unresolved path, and `exe.name == "exe"`, so detection always returned False, the service-manager runtime-registration path was inert, and every `hermes profile create` / `hermes -p X gateway start` silently skipped the s6 hook. Replace with `/proc/1/comm` (world-readable) + `/run/s6/basedir` (s6-overlay-specific); both are required, and detection fails closed.

2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/{control,lock}` to hermes so `s6-svscanctl -a/-an` works without root. Previously the directory chown stopped at `/run/service` and the FIFO inside stayed root-owned, so `register_profile_gateway` from hermes failed at the rescan-trigger step with EACCES. The wrapper in profiles.py caught the exception, printed a warning, and swallowed the failure, so profile creation appeared to succeed while the slot was rolled back.

Audit changes to flush this class of bug next time:

- Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py` that default to `-u hermes`. The module docstring explains why and flags `user="root"` as opt-in only for tests that explicitly need root (none currently do).
- Refactor every `docker exec` call in tests/docker/ through the new helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py, test_container_restart.py, test_s6_profile_gateway_integration.py).
- Add 5 unit tests covering `_s6_running` under various probe states (both signals present; comm wrong; basedir missing; PermissionError on /proc/1/comm; missing /proc on non-Linux hosts). The PermissionError test is the explicit regression guard for the original bug.

Known follow-up: the per-service `supervise/control` FIFO inside each `/run/service/gateway-<profile>/supervise/` is created root-owned by s6-supervise (which runs as root because s6-svscan is PID 1). `s6-svc -u/-d/-t` from the hermes user will get EACCES on those. The audit under `-u hermes` will reveal this in lifecycle tests, surfacing the issue cleanly so it can be fixed in a focused follow-up (likely via a small SUID helper or a polling chown loop in cont-init.d). The detection + svscanctl fixes here are independent and complete on their own.
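The two-probe, fail-closed detection described in fix 1 could look roughly like the sketch below. This is not the project's actual `_s6_running()`: the function name `s6_running_sketch`, its injectable path parameters, and the expected comm value `"s6-svscan"` (inferred from the note that s6-svscan is PID 1) are assumptions made for illustration.

```python
from pathlib import Path


def s6_running_sketch(
    proc_root: Path = Path("/proc"),
    basedir: Path = Path("/run/s6/basedir"),
) -> bool:
    """Hypothetical sketch: True only if BOTH probes identify s6-overlay."""
    try:
        # /proc/1/comm is world-readable, unlike the /proc/1/exe symlink.
        comm = (proc_root / "1" / "comm").read_text().strip()
    except OSError:
        # Covers PermissionError and a missing /proc (non-Linux hosts):
        # any probe failure means "not s6" (fail closed).
        return False

    # Assumed comm value; the commit message only says s6-svscan is PID 1.
    if comm != "s6-svscan":
        return False

    try:
        # Second, s6-overlay-specific signal; both must be present.
        return basedir.exists()
    except OSError:
        return False
```

Taking the paths as parameters keeps the probe testable without a real container: a unit test can point it at a temporary directory, which is one way the five probe-state cases listed above could be exercised.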
46 lines
2.1 KiB
Text
Executable file
#!/command/with-contenv sh
# shellcheck shell=sh
# Container-boot reconciliation of per-profile gateway s6 services.
#
# Runs as root after 01-hermes-setup (the stage2 hook) has chowned
# the volume and seeded $HERMES_HOME, but before s6-rc starts user
# services. /etc/cont-init.d/* scripts run in lexicographic order,
# so the `02-` prefix guarantees ordering.
#
# Service directories under /run/service/ live on tmpfs and are
# wiped on every container restart. Profile directories under
# $HERMES_HOME/profiles/ live on the persistent VOLUME. This script
# walks the persistent profiles, recreates the s6 service slots,
# and auto-starts only those whose last recorded state was
# `running`; see hermes_cli/container_boot.py.
#
# Phase 4 also needs hermes-user writes to /run/service/ (so the
# profile create/delete hooks can register/unregister at runtime),
# so we chown the scandir before invoking the reconciler. We
# additionally chown the s6-svscan control FIFO so the hermes user
# can send rescan signals via ``s6-svscanctl -a``; without this the
# entire runtime-registration path is inert under UID 10000 (the
# Python wrapper catches the resulting EACCES, prints a warning,
# and swallows the failure).
set -e

# Make the dynamic scandir hermes-writable. The directory itself
# starts root-owned by s6-overlay.
chown hermes:hermes /run/service 2>/dev/null || true

# Make the svscan control FIFO hermes-writable so s6-svscanctl -a
# / -an work for the hermes user. The FIFO is created by s6-svscan
# at PID-1 startup, so by the time this cont-init.d script runs it
# already exists. Both ``control`` and ``lock`` need to be writable
# for the various svscanctl operations; the directory itself stays
# root-owned (we only need to touch the two FIFOs/locks inside).
if [ -d /run/service/.s6-svscan ]; then
    for entry in control lock; do
        if [ -e "/run/service/.s6-svscan/$entry" ]; then
            chown hermes:hermes "/run/service/.s6-svscan/$entry" 2>/dev/null || true
        fi
    done
fi

exec s6-setuidgid hermes /opt/hermes/.venv/bin/python -m hermes_cli.container_boot
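A test could check the effect of the chown steps in this script by probing the control FIFO as the unprivileged user, in the spirit of the `docker_exec` helpers mentioned in the commit message. The sketch below only builds the command line. The function name `docker_exec_argv_sketch` and the container name in the usage comment are hypothetical, and the real helpers in `tests/docker/conftest.py` may be shaped differently.

```python
from typing import List, Sequence


def docker_exec_argv_sketch(
    container: str,
    command: Sequence[str],
    user: str = "hermes",
) -> List[str]:
    """Hypothetical sketch: build a `docker exec` argv that runs as `user`.

    Defaulting to the unprivileged user means a test has to opt in to
    root explicitly, which is the audit property the commit describes.
    """
    return ["docker", "exec", "-u", user, container, *command]


# Usage idea (not executed here): ask the container whether the hermes
# user can write to the svscan control FIFO after cont-init.d has run.
#
#   import subprocess
#   argv = docker_exec_argv_sketch(
#       "hermes-test",  # hypothetical container name
#       ["test", "-w", "/run/service/.s6-svscan/control"],
#   )
#   assert subprocess.run(argv).returncode == 0
```

Keeping the argv construction separate from the subprocess call makes the default-user behaviour checkable without a running Docker daemon.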