s6-overlay's /init scrubs the environment before invoking both
/etc/cont-init.d/* scripts and the container's CMD wrapper. As a
result, ENV directives from the Dockerfile (HERMES_HOME=/opt/data,
HERMES_WEB_DIST, …) and compose-time `environment:` entries
(HERMES_UID, HERMES_GID) never reached the scripts that actually
use them. Three concrete failures observed on macOS Docker Desktop
with `~/.hermes:/opt/data`:
* stage2-hook.sh ran with HERMES_UID unset → no UID remap, hermes
user stayed at UID 10000 instead of the host user's UID.
* skills_sync.py (invoked from stage2-hook) ran with HERMES_HOME
unset → get_hermes_home() fell back to Path.home()/.hermes,
populating a shadow $HERMES_HOME/.hermes/skills tree on the
mounted volume (visible on the host as ~/.hermes/.hermes/skills).
* The main `hermes gateway run` process inherited HOME=/root from
the /init context (s6-setuidgid doesn't update HOME), so
libraries resolving XDG_STATE_HOME via $HOME tried to write to
/root/.local/state/hermes/gateway-locks/ and failed with EACCES,
preventing the Discord adapter from acquiring its bot-token lock.
Three surgical changes restore correct env flow:
1. The auto-generated /etc/cont-init.d/01-hermes-setup wrapper now
uses `#!/command/with-contenv sh`, matching the pattern already
used by docker/cont-init.d/02-reconcile-profiles. The container
env (Dockerfile ENV + compose `environment:`) now reaches
stage2-hook.sh and the skills_sync.py subprocess it spawns.
2. docker/main-wrapper.sh also switches to `#!/command/with-contenv
sh`. The container CMD (`gateway run`, `chat`, `setup`, …) now
sees HERMES_HOME and the other container-level env vars.
3. docker/main-wrapper.sh exports HOME=/opt/data before
`s6-setuidgid hermes`. with-contenv populates HOME from the
/init context (/root); s6-setuidgid drops privileges but does
not update HOME. The hermes user's home per /etc/passwd is
/opt/data, so the explicit override matches passwd.
No behavior change for the non-buggy paths: the s6-supervised
services already used with-contenv, and HOME=/opt/data only affects
processes that resolved $HOME-based paths to /root (silently
broken).
The s6-overlay migration replaced every runtime use of gosu with
s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run
scripts, and cont-init.d hooks), but the gosu binary itself was still
being copied into the image from tianon/gosu, and several comments
across the repo still pointed to it.
Image changes:
- Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage
- Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer
- Net: one fewer base-image pull, ~12-15 MB layer eliminated
Documentation/comment refresh (no behavior change):
- Dockerfile: update root-user rationale comment + cont-init.d comment
- docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference
- docker-compose.yml: update UID/GID remap comment
- .hadolint.yaml: update DL3002 ignore rationale
- website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now
- hermes_cli/config.py: docker_run_as_host_user docstring
tools/environments/docker.py runs *arbitrary user images* via the
terminal backend, not the bundled Hermes image. It still needs SETUID/
SETGID caps so user images that use gosu/su/s6-setuidgid all work.
Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and
updated comments to list s6-setuidgid alongside the others as examples.
The matching test (test_security_args_include_setuid_setgid_for_gosu_drop
→ test_security_args_include_setuid_setgid_for_privdrop) was renamed
and its docstring updated; behavior is unchanged.
Verification:
- hadolint clean against .hadolint.yaml
- shellcheck clean against all docker/ shell scripts
- Image rebuilt successfully (sha 1a090924ccea)
- Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4
per-profile-gateway lifecycle + container-restart reconciliation)
- tests/tools/test_docker_environment.py: 23 passed (rename did not
break test discovery; pre-existing unrelated mock warning)
The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md)
intentionally retains its historical references to gosu — it describes
the pre-s6 entrypoint as background for understanding the migration.
BREAKING CHANGE: the container ENTRYPOINT is now /init (s6-overlay)
instead of /usr/bin/tini. Main hermes runs as the container CMD with
TTY inherited (preserving --tui), dashboard runs as a supervised s6-rc
service (HERMES_DASHBOARD=1 starts it; crashes auto-restart), and the
ground is laid for per-profile gateway supervision (Phase 3+4).
All five pre-s6 docker run invocation patterns continue to work
identically — verified by the Phase 0 docker harness:
docker run <image> → `hermes` with no args
docker run <image> chat -q "..." → `hermes chat -q ...` passthrough
docker run <image> sleep infinity → `sleep infinity` direct
docker run <image> bash → interactive bash
docker run -it <image> --tui → interactive Ink TUI
Phase 2 harness result: 12 passed, 2 xfailed (Phase 4 target). Hadolint
+ shellcheck pass cleanly.
Architecture pivot from plan v3 (documented in main-hermes/run header):
the plan called for main hermes to be an s6-supervised service, but
two real s6-overlay v3 mechanics blocked that — cont-init.d scripts
receive no arguments (CMD args are not visible to stage2-hook), and
`/run/s6/basedir/bin/halt` after writing the exit code did not
propagate the desired exit code (container exits 143). We use the
s6-overlay-native CMD pattern instead: main-wrapper.sh is the
container's main program (ENTRYPOINT prepends it so leading-dash
args like --version aren't intercepted by /init), exec's the final
program with stdin/stdout/stderr inherited, and the program's exit
code becomes the container exit code. main-hermes is now a no-op
`sleep infinity` slot kept for future supervised-gateway-container
modes. This trades "supervised restart of main hermes" for arg-
parity with the pre-s6 contract — main hermes was already unsupervised
under tini, so we lose nothing functional. Dashboard supervision is
the only new guarantee added by this phase.
Files added:
docker/main-wrapper.sh # arg routing + s6-setuidgid drop
docker/stage2-hook.sh # gosu-equivalent + chown + seed
docker/s6-rc.d/main-hermes/{type,run,dependencies.d/base}
docker/s6-rc.d/dashboard/{type,run,dependencies.d/base}
docker/s6-rc.d/user/contents.d/{main-hermes,dashboard}
Files changed:
Dockerfile: tini → s6-overlay install + ENTRYPOINT flip + service wiring
docker/entrypoint.sh: thin shim to stage2-hook.sh for back-compat
tests/docker/test_dashboard.py: add test_dashboard_restarts_after_crash
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md