The install method (docker/git/pip/...) describes the *running binary*, but
detect_install_method() read it from $HERMES_HOME/.install_method — a shared
DATA directory. The Docker docs deliberately bind-mount $HERMES_HOME
(~/.hermes:/opt/data) so config/sessions/memory persist and can be shared with
a host-side Desktop/CLI install.
When a containerized gateway and a host install share one $HERMES_HOME, the
home-scoped stamp is a single slot describing two installs: the published image
stamps 'docker' on every boot, the host install then reads 'docker' and the
in-app updater refuses to run 'hermes update' ("doesn't apply inside the Docker
container"). Reinstalling the Desktop app from the DMG doesn't help because the
contaminated stamp is re-read every time.
Fix (option 1 — code-scoped stamp):
- detect_install_method() reads <install tree>/.install_method first (next to
the running code, immune to the shared data dir). It falls back to the legacy
$HERMES_HOME stamp for back-compat, but IGNORES a 'docker' home stamp when
not actually containerized — so already-poisoned shared homes self-heal.
- stamp_install_method() writes the code-scoped stamp.
- install.sh stamps $INSTALL_DIR instead of $HERMES_HOME.
- Dockerfile bakes 'docker' into /opt/hermes/.install_method at build time
(inside the immutable block); stage2-hook.sh no longer writes the home stamp
and proactively removes a stale 'docker' one to heal existing shared homes.
Genuine containers still resolve to 'docker' (baked stamp, or legacy home stamp
honored when containerized). Unstamped installs in generic containers still fall
through to git/pip (preserves the #34397 fix).
* fix(gateway): chown logs/gateways parent so late-added profiles can log
The per-profile log service script created $HERMES_HOME/logs/gateways/
via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When
the first log service boots in root context, the gateways/ parent stays
root:root; every profile registered later runs its log service as the
dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a
sub-second fatal crash-loop flooding the container log. The stage2
recursive heal does not catch it either: it is gated on needs_chown,
which is false when the top-level $HERMES_HOME is already hermes-owned.
Two complementary fixes:
- service_manager._render_log_run: chown the gateways/ parent
(non-recursively) before the leaf chown. Runs on every root-context
boot, so it also heals volumes already poisoned by older images.
- docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p
block; cont-init runs before any service starts, so the parent
already exists hermes-owned when the first log/run does 'mkdir -p'.
The needs_chown repair loop needs no twin entry: it already chowns
logs/ recursively, which covers logs/gateways.
Fixes#45258
* chore(release): map salvaged contributor
---------
Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>
Salvage of #37928 (@sarvesh1327), reduced to the still-needed delta.
`/opt/hermes/gateway` is a runtime-writable Python package: on first import
the supervised gateway writes `__pycache__` beneath it, and the image does
not set PYTHONDONTWRITEBYTECODE. When HERMES_UID/PUID is remapped at boot
(e.g. Unraid 99), `usermod -u` only re-chowns the hermes home dir; the build
trees under /opt/hermes keep the build-time UID (10000). main already chowns
`.venv`, `ui-tui`, and `node_modules` on remap (#38556) but missed `gateway`,
so the remapped gateway hits EACCES writing `__pycache__` (#27221).
Add `/opt/hermes/gateway` to both chown sites — the Dockerfile build-time
`chown -R hermes:hermes` line and the stage2-hook build-tree repair — so it
tracks the remapped UID like the sibling trees.
Differs from #37928 as submitted: dropped the `uid_gid_remapped` flag and the
`|| [ "$uid_gid_remapped" = true ]` chown gate. main's #38556 already solved
that half, and more correctly — it probes the actual tree ownership
(`venv_owner != actual_hermes_uid`) rather than tracking same-boot remaps,
which also catches pre-existing ownership drift and stays idempotent. Keeping
#37928's flag would regress that. The salvage is the `gateway`-tree addition
only.
Verified end-to-end against a real image build: on baseline main a remap to
UID 99 leaves `gateway` owned by 10000 and a write as uid 99 fails EACCES;
with this change `gateway` is chowned to 99:100 and the write succeeds, while
the default-uid (no-remap) path is unchanged.
Fixes#27221.
Co-authored-by: Sarvesh <sarveshagl1327@gmail.com>
Salvage of #35508 (@dchenk), rebased onto current main. Resolved the
tests/tools/test_stage2_hook_puid_pgid.py conflict (kept both the
envdir-creation regression test on main and the new config-migration
tests).
Docker image upgrades replace code under $INSTALL_DIR but preserve
$HERMES_HOME on the mounted volume, so the persisted config.yaml never
received the schema migrations that non-Docker `hermes update` runs
(#35406). This adds scripts/docker_config_migrate.py, invoked from
stage2-hook after first-boot seeding and before gateway services start:
it backs up config.yaml + .env, runs migrate_config(interactive=False),
and honors HERMES_SKIP_CONFIG_MIGRATION=1 for manual control.
Also fixes a latent bug in check_config_version(): it called load_config()
which deep-merges DEFAULT_CONFIG, so a legacy config with no raw
_config_version falsely reported as already-current. It now reads the raw
on-disk file so legacy configs are correctly detected for migration.
Differs from #35508 as submitted (Option B cleanup): dropped the
`_config_version` line added to cli-config.yaml.example and removed the
accompanying test_cli_config_example_declares_latest_version change-detector
test. The example is a copy-template and has no business asserting a schema
version; check_config_version() reads the user's real config.yaml, not the
example. This removes a second sync point that drifts on every version bump.
Closes#35508. Fixes#35406.
Co-authored-by: Dmitriy Cherchenko <17372886+dchenk@users.noreply.github.com>
`docker run --user $(id -u):$(id -g)` was a tini-era trick to make
container-written files match the host user. Under s6-overlay it no longer
works: the bootstrap (UID remap, volume + build-tree chown, config seeding)
needs root, and the baked image dirs (/opt/data, /opt/hermes/.venv, ui-tui,
node_modules) are owned by the hermes build UID (10000). A pinned arbitrary
UID can't write them, so the runtime fails with EACCES on a bind mount or
hard-crashes on a named volume (Docker inits the volume from the image as
10000; the non-root start can't even `cd /opt/data`, and the profile
reconciler dies with PermissionError on gateway_state.json).
Detect that start early in both the cont-init hook (stage2-hook.sh) and the
CMD wrapper (main-wrapper.sh) and fail fast with actionable guidance pointing
at the supported path: root start + HERMES_UID/HERMES_GID (or the PUID/PGID
aliases), which remaps the hermes user and chowns the volume — the same
host-UID-matching outcome --user was used for, without breaking s6.
The guard fires only when the current UID is neither root NOR the hermes UID.
This preserves the supported non-root start from #34648/#34837 (running with
`--user 10000:10000`, i.e. pinned to the hermes UID itself), which is
unaffected — only the arbitrary-UID variant that #34837 never actually made
writable is rejected.
Verified live across five scenarios (built image, bind + named volume):
arbitrary --user on bind -> rejected with guidance, hermes does not run;
arbitrary --user on named volume -> guidance shown, no raw 'can't cd' crash;
--user 10000:10000 -> boots; root + HERMES_UID=4242 remap -> boots, guard not
tripped; default root start -> boots. Pre-fix control reproduces the raw
PermissionError + 'can't cd' crash with no guidance.
The stage2 hook gates the recursive chown of the build trees under
$INSTALL_DIR (.venv, ui-tui, node_modules) so a HERMES_UID/PUID remap
leaves them writable by the new runtime UID — needed for lazy_deps
'uv pip install' of platform extras (#15012, #21100) and the TUI esbuild
rebuild into ui-tui/dist (#28851).
#35027 folded that chown under the $HERMES_HOME ownership check
('stat $HERMES_HOME != hermes_uid'). But 'usermod -u <new> hermes'
re-chowns the hermes home dir ($HERMES_HOME == /opt/data) to the new UID
as a side effect, so after any remap that stat is already satisfied and
needs_chown is false — silently skipping the build-tree chown on the
common PUID/NAS path. The venv stays owned by the build-time UID (10000),
so lazy installs and TUI rebuilds fail with EACCES.
Probe the build trees directly instead: chown only when /opt/hermes/.venv
is not already owned by the runtime hermes UID. Independent of
$HERMES_HOME ownership, idempotent across restarts.
Verified live: built the image, booted with HERMES_UID/HERMES_GID on a
fresh named volume, confirmed .venv/ui-tui/node_modules end up owned by
the remapped UID and 'uv pip install' into the venv succeeds; confirmed
the recursive chown fires once and is skipped on restart.
On a fresh volume there is no gateway_state.json, so the boot reconciler
(cont-init.d/02-reconcile-profiles) registers the gateway-default s6 slot
but leaves it down — it only auto-starts when the last recorded state was
"running". A freshly-provisioned container therefore comes up with the
gateway down until something starts it (e.g. the dashboard's start button).
Add a generic, first-boot-only env-seed in stage2-hook.sh (which runs
before 02-reconcile-profiles): when HERMES_GATEWAY_BOOTSTRAP_STATE=running
and no gateway_state.json exists yet, seed {"gateway_state":"running"} so
the reconciler brings the supervised slot up on the very first boot.
This mirrors the existing HERMES_AUTH_JSON_BOOTSTRAP pattern: it seeds the
same state file the reconciler already consults, guarded by [ ! -f ] so
persisted runtime state always wins on later boots (a deliberately-stopped
gateway stays stopped across restarts). Only the literal "running" is
honoured (the sole value in the reconciler's _AUTOSTART_STATES).
Generic container contract — no host-specific code. Useful to any
orchestrator that provisions a blank volume and wants the gateway up from
first boot (the supervised gateway/dashboard already work on such hosts;
only the first-boot autostart was missing because the CLI lifecycle
commands can't drive the s6 layer when container self-detection misses).
Adds a shell-level contract test and documents the env var.
The targeted data-volume chown in stage2-hook.sh only covers hermes-owned
*subdirectories*; loose state files living directly under $HERMES_HOME
(auth.json, state.db, gateway.lock, gateway_state.json, …) are missed.
When created or rewritten by `docker exec <container> hermes …` (root
unless `-u` is passed) they land root-owned, and the unprivileged hermes
runtime then hits PermissionError on next startup, producing a gateway
restart loop.
Fix: reset ownership of an explicit allowlist of hermes-owned top-level
files on every boot. The list mirrors the top-level file entries of
hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock
files.
This uses a targeted allowlist rather than the originally-proposed blanket
`find $HERMES_HOME -maxdepth 1 -user root` sweep, preserving the
targeted-ownership contract from #19788 / PR #19795: a bind-mounted
$HERMES_HOME may contain host-owned files Hermes does not manage, and
those must never be chowned. Verified end-to-end: allowlisted root-owned
files are reset to hermes on restart while a non-allowlisted host file
keeps its root ownership.
Co-authored-by: x1am1 <2663402852@qq.com>
When users bind-mount /var/run/docker.sock to use TERMINAL_ENV=docker from
inside the container, the supervised hermes user (UID 10000) lacks
permission to talk to the socket — every `docker` invocation EACCES'es and
check_terminal_requirements() returns False. In messaging mode this also
silently strips the file/terminal toolset from the registered tool list,
so the agent rationalizes the missing tools as a platform restriction.
The naive workaround (docker run --group-add <socket-gid>) does NOT work
with our s6-setuidgid privilege drop: s6-setuidgid calls initgroups() for
the target user, which rebuilds supp groups from /etc/group. Without a
matching /etc/group entry the kernel-granted supp group is wiped between
PID 1 and the dropped hermes process. Verified empirically:
--group-add 998 alone: PID 1 Groups: 0 998 → after drop: Groups: 10000
This fix's /etc/group add: id hermes shows 998 → after drop: Groups: 998 10000
Detect the socket's GID at boot in stage2-hook (runs as root before the
privilege drop), reuse an existing group name if one matches the GID,
otherwise create 'hostdocker'. Idempotent across container restarts.
Silent no-op when no socket is mounted.
End-to-end verified by building the image and running the supervised
hermes user against the real host Docker daemon: `docker version`
succeeds and check_terminal_requirements() returns True.
Fixes#16703
Salvages #25872 by @konsisumer against current main.
NAS users (UGOS, Synology, unRAID) expect the LinuxServer.io
PUID/PGID convention and bind-mount /opt/data from a host directory
owned by their own UID. Without this alias those vars are silently
ignored and the s6-setuidgid drop to UID 10000 leaves the runtime
unable to read the volume. HERMES_UID/HERMES_GID still take
precedence when both are set.
The original PR targeted docker/entrypoint.sh, which is now a 27-line
deprecation shim under s6-overlay (the May 2026 rework moved all
bootstrap logic to docker/stage2-hook.sh, installed as
/etc/cont-init.d/01-hermes-setup). Re-applied the same 2-line
alias resolution at the equivalent spot in stage2-hook.sh just
before the existing UID/GID remap block. Test was retargeted at
docker/stage2-hook.sh; docs hunk adapted to current main's wording
("stage2 hook" + s6-setuidgid, not the obsolete "entrypoint drops
via gosu") with the NAS bind-mount example preserved verbatim.
Test-first regression verification: reverted just docker/stage2-hook.sh
to origin/main and re-ran the new tests. Result:
FAILED test_stage2_hook_resolves_puid_pgid_aliases
FAILED test_puid_pgid_populate_hermes_uid_gid
AssertionError: assert ':' == '1000:10'
That's the exact bug shape — PUID=1000 PGID=10 silently ignored,
HERMES_UID/HERMES_GID stay empty. With the salvage applied, all 4
tests pass.
Closes#25872
Co-authored-by: konsisumer <11262660+konsisumer@users.noreply.github.com>
The image's Dockerfile runs npx playwright install chromium, which
populates $PLAYWRIGHT_BROWSERS_PATH (=/opt/hermes/.playwright) with a
`chromium_headless_shell-<build>/chrome-headless-shell-linux64/` tree.
agent-browser (the runtime CLI Hermes spawns for the browser tool)
doesn't recognise this layout in its own cache scan and fails with
`Auto-launch failed: Chrome not found` — even though the binary is
right there.
Reproduction on current main:
$ docker run --rm <image> sh -c 'npx -y agent-browser snapshot --url about:blank'
✗ Auto-launch failed: Chrome not found. Checked:
- agent-browser cache: /tmp/.../.agent-browser/browsers
- System Chrome installations
- Puppeteer browser cache
- Playwright browser cache
Run `agent-browser install` to download Chrome, or use --executable-path.
Fix: at boot, locate the binary under $PLAYWRIGHT_BROWSERS_PATH and
export AGENT_BROWSER_EXECUTABLE_PATH via /run/s6/container_environment
so the with-contenv shebang on main-wrapper.sh propagates it into the
supervised `hermes` process and thence to agent-browser subprocesses.
Filename-matched (chrome / chromium / chrome-headless-shell /
chromium-browser), not path-matched: the chromium dir contains many
shared libraries (libGLESv2.so, libEGL.so, ...) which inherit the
executable bit from Playwright's tarball but are NOT browser binaries.
Compare PR #18635's earlier `find | grep -Ei 'chrome|chromium'` which
would match the path .../chrome-headless-shell-linux64/libGLESv2.so
and pick a .so as the browser binary.
User overrides (e.g. `-e AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/...`)
are respected — the discovery block is skipped when the env var is
already set. Quietly skipped when $PLAYWRIGHT_BROWSERS_PATH doesn't
exist (e.g. custom builds that strip Playwright).
This salvages PR #18635 by @jackey8616, who identified the bug and
proposed the same env-var approach but in the now-deprecated
docker/entrypoint.sh shim and with a path-match find command that
selected .so files instead of the chrome binary. The fix retargets
docker/stage2-hook.sh (the s6-overlay cont-init script where boot-time
env setup belongs) with a corrected filename-match query.
Fixes#15697Closes#18635
Co-authored-by: Clooooode <12930377+jackey8616@users.noreply.github.com>
When HERMES_HOME points at a custom path whose parent directories
only root can create (e.g. HERMES_HOME=/home/hermes/.hermes in a
Compose file, or any path under a fresh / not pre-populated by the
image), stage2-hook.sh fails on first boot:
[stage2] Warning: chown failed (rootless container?) - continuing
mkdir: cannot create directory '/custom': Permission denied
mkdir: cannot create directory '/custom': Permission denied
... (one per s6-setuidgid hermes mkdir invocation)
cont-init: info: /etc/cont-init.d/01-hermes-setup exited 1
The mkdirs fail because s6-setuidgid drops to hermes (UID 10000)
before invoking mkdir -p, and the runtime user has no permission to
create root-owned ancestor directories. 02-reconcile-profiles then
crashes with FileNotFoundError, .install_method never lands, and
the container limps on in a half-initialized state.
Bootstrap HERMES_HOME with mkdir -p while still root, before the
ownership normalization. Idempotent on the default /opt/data path
(directory already exists from the Dockerfile RUN mkdir -p) and on
any subsequent restart. (#18482)
Retargeted from the original PR's docker/entrypoint.sh (now a
deprecated shim) to docker/stage2-hook.sh where the related chown
logic moved during the s6-overlay rework.
Co-authored-by: wpengpeng168 <133926080+wpengpeng168@users.noreply.github.com>
When HERMES_UID remaps the hermes user from 10000 to another UID
(e.g. matching the host user's UID for bind-mount ergonomics), the TUI
launcher's esbuild step fails:
✘ [ERROR] Failed to write to output file:
open /opt/hermes/ui-tui/dist/entry.js: permission denied
TUI build failed.
This is because the Dockerfile's build-time `chown -R hermes:hermes` on
`/opt/hermes/{.venv,ui-tui,node_modules}` (line 154) wrote UID 10000,
and stage2-hook.sh only re-chowned `.venv` on UID remap — leaving the
TUI build trees still owned by the old UID.
Extend the stage2 re-chown to include the same set as the build-time
chown: `.venv`, `ui-tui`, `node_modules`. These are the runtime-writable
trees under $INSTALL_DIR; everything else under /opt/hermes is read-only
at runtime so keeping it root-owned is fine.
Original fix targeted docker/entrypoint.sh which is now a deprecated shim;
retargeted to docker/stage2-hook.sh where the .venv chown moved during
the s6-overlay rework.
Co-authored-by: Andreas Steffan <623481+deas@users.noreply.github.com>
Replaces the recursive chown of $HERMES_HOME in stage2-hook.sh with a
targeted approach: chown the top-level dir (so hermes can create new subdirs)
plus the specific hermes-owned subdirectories (cron/, sessions/, logs/,
hooks/, memories/, skills/, skins/, plans/, workspace/, home/, profiles/) —
the same canonical list seeded by the s6-setuidgid mkdir -p block below.
Avoids clobbering host-side file ownership when $HERMES_HOME is a bind
mount that contains user-owned files not managed by hermes (issue #19788).
Original fix targeted docker/entrypoint.sh which is now a deprecated shim;
retargeted to docker/stage2-hook.sh where the recursive chown moved during
the s6-overlay rework.
Co-authored-by: Ptichalouf <1809721+ptichalouf@users.noreply.github.com>
.env holds API keys and secrets. Multiple creation sites used `cp` /
`touch` / `shutil.copy2` which obey the process umask — commonly
0o022, leaving the file at 0o644 (world-readable). Apply chmod 0o600
explicitly at every site that creates or copies .env.
Sites covered:
- docker/stage2-hook.sh: after the seed_one '.env' call, applied
unconditionally (not just on first-seed) so a host-mounted .env with
loose perms gets tightened on every container restart
- hermes_cli/doctor.py: 'hermes doctor --fix' touches an empty .env
when missing
- hermes_cli/profiles.py: 'hermes profile create --clone' copies .env
from the source profile; shutil.copy2 preserves source mode, so a
source .env at 0o644 was being cloned into 0o644
- setup-hermes.sh: in-tree setup script's cp .env.example .env path,
plus the already-exists branch (mirror of install.sh which already
chmods 600 unconditionally on line 1442)
scripts/install.sh was NOT changed — it already chmod 600's the .env
unconditionally after the create/already-exists branches (line 1442).
Salvaged from PR #25726 by @dusterbloom. The docker/entrypoint.sh
portion of the original PR was dropped because main switched to an
s6-overlay shim — the .env creation logic moved to stage2-hook.sh,
which is where the chmod now lives.
Closes#25497 (subset — install.sh + setup-hermes.sh) and #8448
(subset — install.sh only) as superseded.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
PR #30136 review caught: three `s6-setuidgid hermes sh -c "..."`
invocations in stage2-hook.sh interpolated $HERMES_HOME into a
nested shell context. Practically low-risk (a malicious HERMES_HOME
already requires container-launch privileges) but the cleaner
pattern is to invoke commands directly so the shell isn't a second
interpreter.
* `mkdir -p` of the data subdirs now runs directly via s6-setuidgid,
one path per arg.
* The .install_method stamp is written via `printf | tee` — also no
shell wrapper.
* The skills_sync invocation uses the venv's python by absolute path
instead of sourcing activate inside a shell. skills_sync.py doesn't
need anything from activate beyond sys.path, which the bin-stub
python already provides.
No behavior change. Just a smaller attack surface and a script
that's easier to read.
Phase 4 of the s6-overlay supervision plan. Activates the Phase 3
S6ServiceManager by hooking it into the profile lifecycle and the
`hermes gateway start/stop/restart` dispatcher, and adds a cont-
init.d-time reconciliation pass that survives `docker restart`.
Task 4.0 — container-boot reconciliation:
/run/service/ is tmpfs, so every `docker restart` wipes every
per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles
invokes hermes_cli.container_boot.reconcile_profile_gateways() on
every boot, which walks $HERMES_HOME/profiles/<name>/, reads each
gateway_state.json, recreates the s6 service slot, and auto-starts
only those whose last state was 'running'. Other states
(stopped, starting, startup_failed, missing) register the slot
in the down state — avoiding crash-loops across restarts for a
gateway that was broken last boot. Per-profile outcome is recorded
to $HERMES_HOME/logs/container-boot.log.
Implementation: hermes_cli/container_boot.py + 12 unit tests.
Profile-marker is SOUL.md, not config.yaml, because `hermes profile
create` only seeds SOUL.md by default (config.yaml comes from
`hermes setup`).
Task 4.1 / 4.2 — profile create/delete hooks:
hermes_cli/profiles.py::create_profile now calls
_maybe_register_gateway_service(<canon>) at the end, which routes
through ServiceManager.register_profile_gateway when running on s6
and no-ops on host backends. delete_profile mirrors with
_maybe_unregister_gateway_service. _allocate_gateway_port produces
a deterministic SHA-256-derived port in [9200, 9800).
Task 4.3 — gateway dispatch + remove rejection arms:
_dispatch_via_service_manager_if_s6(action) intercepts
start/stop/restart at the top of each subcommand and routes them
through S6ServiceManager.{start,stop,restart}. The pre-Phase-4
`elif is_container():` rejection arms are kept as fallback for
pre-s6 containers / unsupported runtimes, but only ever fire when
detect_service_manager() != 's6'. install/uninstall under s6
print informational guidance pointing users at profile create/delete.
Removed the two xfail(strict=True) markers from
tests/docker/test_profile_gateway.py — both tests now pass strictly.
Task 4.4 — status reporting:
get_gateway_runtime_snapshot() reports
Manager: 's6 (container supervisor)' inside an s6 container instead
of 'docker (foreground)'.
Plan-vs-reality drift fixed in this commit:
- Plan's S6ServiceManager._render_run_script used
`gateway start --foreground --port {port}` — invented args; the
real CLI is `gateway run`. Switched accordingly. port arg
retained for API parity but now documented as 'currently ignored'.
- Plan's reconciler keyed on config.yaml; switched to SOUL.md
(config.yaml is created by hermes setup, not by hermes profile
create, so the original gate caught nothing).
- The plan's _dispatch helper used _profile_arg() which returns
'--profile <name>' (i.e. with the flag prefix). Switched to
_profile_suffix() which returns the bare name.
- Architecture B's docker exec doesn't get /command on PATH or
the venv on PATH; Dockerfile's runtime PATH now includes
/opt/hermes/.venv/bin so 'docker exec <c> hermes ...' works
without sourcing the venv.
- stage2-hook now chowns $HERMES_HOME/profiles to hermes on every
boot, not just on the UID-remap path. Without this, files created
by docker-exec-as-root accumulate and the next reconciler run
fails with PermissionError reading SOUL.md.
Test harness:
19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to
passing). 78 unit tests across service_manager + container_boot +
profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck
pass cleanly.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
BREAKING CHANGE: the container ENTRYPOINT is now /init (s6-overlay)
instead of /usr/bin/tini. Main hermes runs as the container CMD with
TTY inherited (preserving --tui), dashboard runs as a supervised s6-rc
service (HERMES_DASHBOARD=1 starts it; crashes auto-restart), and the
ground is laid for per-profile gateway supervision (Phase 3+4).
All five pre-s6 docker run invocation patterns continue to work
identically — verified by the Phase 0 docker harness:
docker run <image> → `hermes` with no args
docker run <image> chat -q "..." → `hermes chat -q ...` passthrough
docker run <image> sleep infinity → `sleep infinity` direct
docker run <image> bash → interactive bash
docker run -it <image> --tui → interactive Ink TUI
Phase 2 harness result: 12 passed, 2 xfailed (Phase 4 target). Hadolint
+ shellcheck pass cleanly.
Architecture pivot from plan v3 (documented in main-hermes/run header):
the plan called for main hermes to be an s6-supervised service, but
two real s6-overlay v3 mechanics blocked that — cont-init.d scripts
receive no arguments (CMD args are not visible to stage2-hook), and
`/run/s6/basedir/bin/halt` after writing the exit code did not
propagate the desired exit code (container exits 143). We use the
s6-overlay-native CMD pattern instead: main-wrapper.sh is the
container's main program (ENTRYPOINT prepends it so leading-dash
args like --version aren't intercepted by /init), exec's the final
program with stdin/stdout/stderr inherited, and the program's exit
code becomes the container exit code. main-hermes is now a no-op
`sleep infinity` slot kept for future supervised-gateway-container
modes. This trades "supervised restart of main hermes" for arg-
parity with the pre-s6 contract — main hermes was already unsupervised
under tini, so we lose nothing functional. Dashboard supervision is
the only new guarantee added by this phase.
Files added:
docker/main-wrapper.sh # arg routing + s6-setuidgid drop
docker/stage2-hook.sh # gosu-equivalent + chown + seed
docker/s6-rc.d/main-hermes/{type,run,dependencies.d/base}
docker/s6-rc.d/dashboard/{type,run,dependencies.d/base}
docker/s6-rc.d/user/contents.d/{main-hermes,dashboard}
Files changed:
Dockerfile: tini → s6-overlay install + ENTRYPOINT flip + service wiring
docker/entrypoint.sh: thin shim to stage2-hook.sh for back-compat
tests/docker/test_dashboard.py: add test_dashboard_restarts_after_crash
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md