fix(docker): propagate container env through s6 to cont-init and main CMD

s6-overlay's /init scrubs the environment before invoking both
/etc/cont-init.d/* scripts and the container's CMD wrapper. As a
result, ENV directives from the Dockerfile (HERMES_HOME=/opt/data,
HERMES_WEB_DIST, …) and compose-time `environment:` entries
(HERMES_UID, HERMES_GID) never reached the scripts that actually
use them. Three concrete failures observed on macOS Docker Desktop
with `~/.hermes:/opt/data`:

* stage2-hook.sh ran with HERMES_UID unset → no UID remap, hermes
  user stayed at UID 10000 instead of the host user's UID.
* skills_sync.py (invoked from stage2-hook) ran with HERMES_HOME
  unset → get_hermes_home() fell back to Path.home()/.hermes,
  populating a shadow $HERMES_HOME/.hermes/skills tree on the
  mounted volume (visible on the host as ~/.hermes/.hermes/skills).
* The main `hermes gateway run` process inherited HOME=/root from
  the /init context (s6-setuidgid doesn't update HOME), so
  libraries resolving XDG_STATE_HOME via $HOME tried to write to
  /root/.local/state/hermes/gateway-locks/ and failed with EACCES,
  preventing the Discord adapter from acquiring its bot-token lock.

Three surgical changes restore correct env flow:

1. The auto-generated /etc/cont-init.d/01-hermes-setup wrapper now
   uses `#!/command/with-contenv sh`, matching the pattern already
   used by docker/cont-init.d/02-reconcile-profiles. The container
   env (Dockerfile ENV + compose `environment:`) now reaches
   stage2-hook.sh and the skills_sync.py subprocess it spawns.

2. docker/main-wrapper.sh also switches to `#!/command/with-contenv
   sh`. The container CMD (`gateway run`, `chat`, `setup`, …) now
   sees HERMES_HOME and the other container-level env vars.

3. docker/main-wrapper.sh exports HOME=/opt/data before
   `s6-setuidgid hermes`. with-contenv populates HOME from the
   /init context (/root); s6-setuidgid drops privileges but does
   not update HOME. The hermes user's home per /etc/passwd is
   /opt/data, so the explicit override matches passwd.

No behavior change for the non-buggy paths: the s6-supervised
services already used with-contenv, and HOME=/opt/data only affects
processes that resolved $HOME-based paths to /root (silently
broken).
This commit is contained in:
John Paul Soliva 2026-05-26 13:41:21 +09:00
parent cea87d9139
commit 29c71e972a
2 changed files with 14 additions and 2 deletions

View file

@ -1,9 +1,15 @@
#!/bin/sh
#!/command/with-contenv sh
# /opt/hermes/docker/main-wrapper.sh — wraps the container's CMD with
# the same argument-routing logic the pre-s6 entrypoint.sh used. Runs
# as /init's "main program" (Docker CMD) so it inherits stdin/stdout/
# stderr from the container.
#
# Shebang note: /init scrubs env before invoking CMD, so a plain
# `#!/bin/sh` wrapper sees an empty environ and `ENV HERMES_HOME=/opt/data`
# from the Dockerfile never reaches `hermes`. with-contenv repopulates
# the env from /run/s6/container_environment before exec'ing, which is
# what s6-supervised services use too (see main-hermes/run).
#
# Routing:
# no args → exec `hermes` (the default)
# first arg is an executable → exec it directly (sleep, bash, sh, …)
@ -13,6 +19,12 @@
# workload runs unprivileged (UID 10000 by default).
set -e
# HOME comes through with-contenv as /root (the /init context). Override
# to the hermes user's home before dropping privileges so libraries that
# resolve paths via $HOME (e.g. discord lockfile under XDG_STATE_HOME)
# don't try to write to /root.
export HOME=/opt/data
cd /opt/data
# shellcheck disable=SC1091
. /opt/hermes/.venv/bin/activate