fix(docker): make s6 lifecycle work for the unprivileged hermes user

Resolves the explicit "Known follow-up" left by commit 2f8ceeab9 and
the resulting CI failures in tests/docker/test_dashboard.py and
tests/docker/test_s6_profile_gateway_integration.py.

The product gap
---------------
Every hermes runtime operation inside the container runs as the
hermes user (UID 10000) via s6-setuidgid. But s6-supervise — spawned
by s6-svscan running as PID 1 — creates each service's supervise/
and top-level event/ directories with mode 0700 owned by its
effective UID (root). That left every s6-svc / s6-svstat / s6-svwait
call from hermes hitting EACCES on the supervise/control FIFO and
supervise/status — i.e. the entire S6ServiceManager lifecycle
(register, start, stop, unregister) was inert in production.

The 2f8ceeab9 commit message called this out and deferred the fix.
The audit changes that landed alongside it (defaulting docker_exec
to -u hermes) made the integration tests reproduce the bug
deterministically; the fix below resolves it.

The fix: pre-create the supervise/ skeleton hermes-owned
----------------------------------------------------------
Reading s6's source (src/supervision/s6-supervise.c::trymkdir +
control_init), the mkdir and mkfifo calls that build the supervise
tree are EEXIST-safe: if the directory or FIFO is already present,
s6-supervise reuses it and skips the chown/chmod fix-up that would
normally make event/ 03730 root:root. So if we lay the skeleton
down with hermes ownership before triggering s6-svscanctl -a,
s6-supervise inherits our layout and never touches it. The
death_tally / lock / status regular files written later by
s6-supervise (still as root) land mode 0644 — world-readable —
which is all s6-svstat needs.

New module-level helper _seed_supervise_skeleton(svc_dir) in
hermes_cli/service_manager.py lays down:
  svc_dir/event/                       hermes:hermes 03730
  svc_dir/supervise/                   hermes:hermes 0755
  svc_dir/supervise/event/             hermes:hermes 03730
  svc_dir/supervise/control            hermes:hermes 0660 (FIFO)
  svc_dir/log/event/                   hermes:hermes 03730  (if log/ present)
  svc_dir/log/supervise/               hermes:hermes 0755
  svc_dir/log/supervise/event/         hermes:hermes 03730
  svc_dir/log/supervise/control        hermes:hermes 0660 (FIFO)

The log/ branch matters because the logger is a second
s6-supervise instance — without it, unregister rmtree races on
the logger's root-owned supervise dir even after the parent
slot's supervise/ is hermes-owned. The helper is idempotent and
swallows PermissionError on chown so it works equally well when
called from root (cont-init.d) or hermes (runtime register).

Wiring
------
1. S6ServiceManager.register_profile_gateway calls
   _seed_supervise_skeleton(tmp_dir) just before publishing the
   slot via Path.replace. Runtime-registered profile gateways are
   set up by hermes.

2. container_boot._register_service does the same in the cont-init.d
   reconciliation path so boot-time-restored profile slots inherit
   the same layout.

3. New cont-init.d/015-supervise-perms script chowns the supervise/
   and event/ trees for STATIC s6-rc services (dashboard,
   main-hermes). These are spawned by s6-rc before cont-init.d
   gets to run, so the EEXIST-trick doesn't apply; we chown the
   already-existing tree instead. s6-supervise keeps using the
   same files; it never re-asserts ownership on a running service.
   The script skips s6-overlay internal services (s6rc-*,
   s6-linux-*) so the supervision tree itself stays root-only.
   015- slot is intentional: lex-sorts between 01-hermes-setup
   and 02-reconcile-profiles in the container's C-locale, so
   the chown finishes before the reconciler walks the scandir.

Unregister teardown reordering
------------------------------
S6ServiceManager.unregister_profile_gateway now fires
s6-svscanctl -an BEFORE rmtree (with a 200ms grace), so
s6-svscan reaps the supervise child and releases its file
handles on supervise/lock + supervise/status before we try to
remove the directory. Previously rmtree raced s6-supervise on a
set of files inside the supervise dir, and even with the parent
supervise/ now hermes-owned, the contained files (death_tally,
lock, status, written by root) could still be in use.

Dashboard down-state redesign
-----------------------------
The original PR #30136 review fix wrote a 'down' marker file
into /run/service/dashboard/ via cont-init.d/03-dashboard-toggle.
That approach was broken in two ways:

  (a) /run/service/dashboard is a symlink to a TRANSIENT
      /run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is
      mid-transaction; the touch landed in a soon-to-be-discarded
      tmp.

  (b) Even when written to the final /run/s6-rc/servicedirs/
      location, the 'down' file is only consulted by s6-supervise
      at slot startup. s6-rc's user-bundle explicitly transitions
      'dashboard' to 'up' on every boot, overriding any down
      marker.

The right fix is the canonical s6 pattern: when HERMES_DASHBOARD
is unset, the dashboard run script exits 0 and a companion
finish script exits 125. Per s6-supervise(8), exit code 125 from
the finish script is the 'permanent failure, do not restart'
marker — equivalent to s6-svc -O. The slot reports as 'down' to
s6-svstat, matching the reality that no dashboard process is
running. When HERMES_DASHBOARD IS truthy, finish exits 0 and
restart-on-crash semantics apply.

03-dashboard-toggle is removed (its function is now subsumed by
the run/finish pair).

Tests
-----
Adds four unit tests for _seed_supervise_skeleton covering the
produced layout, the log/ subservice case, the skip-when-no-log
case, and idempotency. The live-container verification continues
to live in tests/docker/test_s6_profile_gateway_integration.py and
tests/docker/test_dashboard.py — both now pass against the
rebuilt image.

References
----------
* Skarnet skaware mailing list 2020-02-02 (Laurent Bercot
  + Guillermo Diaz Hartusch) on unprivileged s6 tool semantics:
  http://skarnet.org/lists/skaware/1424.html
* just-containers/s6-overlay#130 — same EEXIST-preseed pattern,
  community-validated 2016 onward
* https://skarnet.org/software/s6/servicedir.html — exit-code 125
  semantics in finish scripts

(cherry picked from commit c41f908ad4)
This commit is contained in:
Ben 2026-05-25 11:21:31 +10:00
parent 7f6f00f6ec
commit 4f416fc40c
8 changed files with 433 additions and 64 deletions

View file

@ -0,0 +1,90 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Make supervise/ trees for ALL declared s6 services queryable and
# controllable by the unprivileged hermes user (UID 10000).
#
# Background (PR #30136 review item I4): the entire s6 lifecycle
# (s6-svc, s6-svstat, s6-svwait) is dispatched as the hermes user
# inside the container (every Hermes runtime path runs under
# ``s6-setuidgid hermes``). But s6-supervise creates each service's
# ``supervise/`` and top-level ``event/`` directory with mode 0700
# owned by its effective UID — which is root, because s6-supervise
# is spawned by s6-svscan running as PID 1. So unprivileged clients
# get EACCES on every probe / control call against the slot.
#
# Two fixes, one in each registration path:
#
# 1. For RUNTIME-registered profile gateways (created via the s6
# runtime register hooks in profiles.py): the Python helper
# ``_seed_supervise_skeleton`` pre-creates supervise/ + event/ +
# supervise/control owned by hermes BEFORE s6-svscanctl -a fires.
# s6-supervise's mkdir/mkfifo are EEXIST-safe, so it inherits our
# ownership and never tries to chown back to root.
#
# 2. For STATIC s6-rc services (dashboard, main-hermes) declared at
# image-build time under /etc/s6-overlay/s6-rc.d/*: these are
# compiled by s6-rc at boot, and s6-supervise spawns BEFORE
# cont-init.d gets to run — so by the time we're here, the
# supervise/ tree is already there as root:root 0700. We chown
# it here. s6-supervise will keep using the same files; it never
# re-asserts ownership on a running service.
#
# This script runs as root after 01-hermes-setup but before
# 02-reconcile-profiles, so the chowns are settled before the
# Python reconciler walks the scandir. Lexicographic ordering
# guarantees this — the suffix is unusual because we want to slot
# in between 01 and the existing 02-reconcile-profiles without
# renumbering both (which would be a churn-noise patch on its own).
set -eu
# /run/s6-rc/servicedirs holds the live, compiled service directories
# for every static (s6-rc) service. Symlinks under /run/service/*
# point here. Per-service supervise/ + event/ both need hermes
# ownership for s6-svstat etc. to work as hermes.
SVC_ROOT=/run/s6-rc/servicedirs
if [ ! -d "$SVC_ROOT" ]; then
echo "[supervise-perms] $SVC_ROOT not present; skipping"
exit 0
fi
for svc in "$SVC_ROOT"/*; do
[ -d "$svc" ] || continue
name=$(basename "$svc")
# Skip s6-overlay-internal services (they need to stay root-only;
# the s6rc-* helpers manage the supervision tree itself).
case "$name" in
s6rc-*|s6-linux-*)
continue
;;
esac
# supervise/ tree — needed by s6-svstat / s6-svc.
if [ -d "$svc/supervise" ]; then
chown -R hermes:hermes "$svc/supervise" 2>/dev/null || \
echo "[supervise-perms] could not chown $svc/supervise"
# 0710 = group searchable. ``s6-svstat`` only needs to openat
# status, not list the dir, but giving the hermes group +x is
# the minimum that lets group members access the contents.
chmod 0710 "$svc/supervise" 2>/dev/null || true
# supervise/control is a FIFO that s6-svc writes commands
# into; the hermes user needs +w. Owner is already hermes
# after the recursive chown above; widen perms to 0660 so
# ``s6-svc`` works for any member of the hermes group too.
if [ -p "$svc/supervise/control" ]; then
chmod 0660 "$svc/supervise/control" 2>/dev/null || true
fi
fi
# Top-level event/ dir — s6-svlisten1 / s6-svwait subscribe here.
if [ -d "$svc/event" ]; then
chown hermes:hermes "$svc/event" 2>/dev/null || \
echo "[supervise-perms] could not chown $svc/event"
# Preserve s6's 03730 mode (setgid + g+rwx + sticky).
chmod 03730 "$svc/event" 2>/dev/null || true
fi
done
echo "[supervise-perms] chowned supervise/ trees for static s6-rc services"

View file

@ -1,55 +0,0 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Toggle the dashboard s6-rc service slot based on HERMES_DASHBOARD.
#
# Runs as root in cont-init.d, after 01-hermes-setup (stage2) and
# 02-reconcile-profiles, BEFORE s6-rc starts user services.
#
# Background (PR #30136 review item I3): the dashboard service was
# always declared as an s6-rc longrun, with its run script checking
# HERMES_DASHBOARD and `exec sleep infinity` when unset. Trouble:
# s6-svstat then reports the dashboard slot as "up" (because sleep
# IS running) even though no dashboard process exists. `hermes
# doctor` and any other s6-svstat-based health check sees a
# false-positive up-state.
#
# Fix: write a `down` marker file into the live service-dir when
# HERMES_DASHBOARD is unset / falsy. s6-supervise honors `down` by
# not starting the service at all, so s6-svstat reports `down` —
# matching reality.
#
# The run script's HERMES_DASHBOARD case-statement stays in place
# as a belt-and-suspenders guard: even if the down marker is
# removed at runtime and the service is brought up, the run script
# still bails when HERMES_DASHBOARD is unset. Both layers agree.
set -eu
# Live service directory for the dashboard longrun. s6-overlay
# compiles /etc/s6-overlay/s6-rc.d/dashboard/ into this location
# at boot, before cont-init.d scripts run.
DASHBOARD_LIVE_DIR="/run/service/dashboard"
# If the live directory hasn't materialized yet (e.g. running in a
# stripped-down test image), nothing to do — the run script's env
# check still keeps things safe.
if [ ! -d "$DASHBOARD_LIVE_DIR" ]; then
echo "[dashboard-toggle] $DASHBOARD_LIVE_DIR not present; skipping"
exit 0
fi
case "${HERMES_DASHBOARD:-}" in
1|true|TRUE|True|yes|YES|Yes)
# Enabled — remove any leftover down marker from a previous boot.
if [ -e "$DASHBOARD_LIVE_DIR/down" ]; then
rm -f "$DASHBOARD_LIVE_DIR/down"
echo "[dashboard-toggle] HERMES_DASHBOARD enabled; removed down marker"
fi
;;
*)
# Disabled — write a down marker so s6-supervise won't start
# the service. s6-svstat will report it as down, matching reality.
touch "$DASHBOARD_LIVE_DIR/down"
echo "[dashboard-toggle] HERMES_DASHBOARD unset; marked dashboard slot down"
;;
esac

30
docker/s6-rc.d/dashboard/finish Executable file
View file

@ -0,0 +1,30 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Dashboard finish script. Companion to ./run.
#
# When HERMES_DASHBOARD is unset (or falsy), ./run exits 0 immediately.
# Without this finish script, s6-supervise would just restart the run
# script in a tight loop. By exiting 125 here, we tell s6-supervise
# "this service has permanently failed; do not restart" — equivalent
# to `s6-svc -O`. The supervise slot reports as down, matching reality
# (no dashboard process is running).
#
# When HERMES_DASHBOARD IS enabled and the run script later exits or
# is killed, we want s6-supervise to restart it (the whole point of
# supervised lifecycle). So we exit non-125 in that case.
# Arguments passed to a finish script: $1=run-exit-code, $2=signal-num,
# $3=service-dir-name, $4=run-pgid. See servicedir(7).
case "${HERMES_DASHBOARD:-}" in
1|true|TRUE|True|yes|YES|Yes)
# Dashboard was enabled — let s6-supervise restart on crash by
# exiting non-125. (Pass-through any sensible default.)
exit 0
;;
*)
# Dashboard disabled — permanent-failure marker so s6-supervise
# leaves the slot in 'down' state and s6-svstat reflects that.
exit 125
;;
esac

View file

@ -1,12 +1,22 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Dashboard service. Always declared so s6 has a supervised slot; if
# HERMES_DASHBOARD isn't set to a truthy value we sleep forever and do
# nothing. See OQ3-A in the plan.
# HERMES_DASHBOARD isn't truthy the run script exits cleanly and the
# companion finish script returns 125 (s6's "permanent failure, do
# not restart" marker), so s6-svstat reports the slot as down. See
# also docker/s6-rc.d/dashboard/finish.
case "${HERMES_DASHBOARD:-}" in
1|true|TRUE|True|yes|YES|Yes) ;;
*) exec sleep infinity ;;
*)
# Exit 0; the finish script will exit 125 → s6-supervise won't
# restart us and the slot reports down. Using a clean exit
# (rather than `exec sleep infinity`) means s6-svstat reflects
# reality: when HERMES_DASHBOARD is unset, the service is NOT
# running, just supervised-with-permanent-failure. See PR
# #30136 review item I3.
exit 0
;;
esac
cd /opt/data