mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-05 07:41:39 +00:00
docs(s6): document container supervision; doctor + skill + user-guide updates
Phase 5 of the s6-overlay supervision plan. Documentation + small
diagnostic cleanups; no behavior changes.
website/docs/user-guide/docker.md:
- Replace the old 'entrypoint script does the bootstrap' section
with the s6-overlay boot flow (cont-init.d/01-hermes-setup,
cont-init.d/02-reconcile-profiles, static main-hermes + dashboard
services, ENTRYPOINT-as-main-program pattern).
- Add a 'Per-profile gateway supervision' subsection covering the
new lifecycle commands, restart semantics, log persistence, and
'Manager: s6 (container supervisor)' status reporting.
- Add 'Breaking change vs. pre-s6 images' callout naming the
/init ENTRYPOINT and pointing affected wrappers at the pin
workaround.
website/docs/user-guide/profiles.md:
- Add a note under 'Persistent services' pointing container users
at the docker.md section explaining s6 supervision inside the
image. Host-side systemd/launchd documentation is unchanged.
skills/software-development/hermes-s6-container-supervision/SKILL.md:
- New maintainer skill covering the supervision-tree map, file
layout, the Architecture B rationale (cont-init.d args + halt
exit-code propagation), quick recipes, and the 8 pitfalls we hit
while implementing the plan (PATH-without-/command, root-owned
profile dirs, SOUL.md as marker, the '143' anti-pattern, etc.).
hermes_cli/doctor.py:
- _check_gateway_service_linger skips on s6 (the linger concept
doesn't apply inside the container).
- New _check_s6_supervision section reports main-hermes/dashboard
state and per-profile-gateway count (registered vs supervised
up), only inside the s6 container. Host doctor output unchanged.
- External Tools / Docker check no longer emits a 'docker not
found' warning inside the container; prints an explanatory
info line instead. Still respects an explicit TERMINAL_ENV=docker
(in case the user mounted /var/run/docker.sock).
hermes_cli/gateway.py:
- Document _container_systemd_operational more precisely: it's
NOT for our Hermes Docker image (s6-overlay handles that via
detect_service_manager() == 's6'). It still covers
systemd-nspawn / k8s-with-systemd-init cases, so leaving it in
place is correct; the docstring just makes that explicit.
Test harness (verification, no test changes in this commit):
19 passed, 0 xfailed. 66 service-manager / container-boot /
profiles-s6-hooks / gateway-s6-dispatch unit tests still green.
61 doctor tests still green. Hadolint + shellcheck clean.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
This commit is contained in:
parent
57c6e29666
commit
7d07dd60a8
5 changed files with 314 additions and 13 deletions
|
|
@ -261,24 +261,51 @@ The official image is based on `debian:13.4` and includes:
|
|||
- Python 3 with all Hermes dependencies (`uv pip install -e ".[all]"`)
|
||||
- Node.js + npm (for browser automation and WhatsApp bridge)
|
||||
- Playwright with Chromium (`npx playwright install --with-deps chromium --only-shell`)
|
||||
- ripgrep, ffmpeg, git, and tini as system utilities
|
||||
- ripgrep, ffmpeg, git, and `xz-utils` as system utilities
|
||||
- **`docker-cli`** — so agents running inside the container can drive the host's Docker daemon (bind-mount `/var/run/docker.sock` to opt in) for `docker build`, `docker run`, container inspection, etc.
|
||||
- **`openssh-client`** — enables the [SSH terminal backend](/docs/user-guide/configuration#ssh-backend) from inside the container. The SSH backend shells out to the system `ssh` binary; without this, it failed silently in containerized installs.
|
||||
- The WhatsApp bridge (`scripts/whatsapp-bridge/`)
|
||||
- **[`s6-overlay`](https://github.com/just-containers/s6-overlay) v3** as PID 1 (replaces the older `tini`) — supervises the dashboard and per-profile gateways with auto-restart on crash, reaps zombie subprocesses, and forwards signals.
|
||||
|
||||
The entrypoint script (`docker/entrypoint.sh`) bootstraps the data volume on first run:
|
||||
- Creates the directory structure (`sessions/`, `memories/`, `skills/`, etc.)
|
||||
- Copies `.env.example` → `.env` if no `.env` exists
|
||||
- Copies default `config.yaml` if missing
|
||||
- Copies default `SOUL.md` if missing
|
||||
- Syncs bundled skills using a manifest-based approach (preserves user edits)
|
||||
- Optionally launches `hermes dashboard` as a background side-process when `HERMES_DASHBOARD=1` (see [Running the dashboard](#running-the-dashboard))
|
||||
- Then runs `hermes` with whatever arguments you pass
|
||||
The container's `ENTRYPOINT` is s6-overlay's `/init`. On boot it:
|
||||
1. Runs `/etc/cont-init.d/01-hermes-setup` (= `docker/stage2-hook.sh`) as root: optional UID/GID remap, fixes volume ownership, seeds `.env` / `config.yaml` / `SOUL.md` on first boot, syncs bundled skills.
|
||||
2. Runs `/etc/cont-init.d/02-reconcile-profiles` (= `hermes_cli.container_boot`): walks `$HERMES_HOME/profiles/<name>/`, recreates the per-profile gateway s6 service slot under `/run/service/gateway-<profile>/`, and auto-starts only those whose last recorded state was `running` (see [Per-profile gateway supervision](#per-profile-gateway-supervision)).
|
||||
3. Starts the static `main-hermes` and `dashboard` s6-rc services.
|
||||
4. Exec's the container's CMD as the main program (`/opt/hermes/docker/main-wrapper.sh`), which routes the arguments the user passed to `docker run`:
|
||||
- no args → `hermes` (the default)
|
||||
- first arg is an executable on PATH (e.g. `sleep`, `bash`) → exec it directly
|
||||
- anything else → `hermes <args>` (subcommand passthrough)
|
||||
The container exits when this main program exits, with its exit code.
|
||||
|
||||
:::warning
|
||||
Do not override the image entrypoint unless you keep `/opt/hermes/docker/entrypoint.sh` in the command chain. The entrypoint drops root privileges to the `hermes` user before gateway state files are created. Starting `hermes gateway run` as root inside the official image is refused by default because it can leave root-owned files in `/opt/data` and break later dashboard or gateway starts. Set `HERMES_ALLOW_ROOT_GATEWAY=1` only when you intentionally accept that risk.
|
||||
:::warning Breaking change vs. pre-s6 images
|
||||
The container ENTRYPOINT is now `/init` (s6-overlay), not `/usr/bin/tini`. All five documented `docker run` invocation patterns (no args, `chat -q "…"`, `sleep infinity`, `bash`, `--tui`) behave identically to the tini-based image. If you have a downstream wrapper that depended on tini-specific signal behavior or hard-coded `/usr/bin/tini --` invocation, pin to the previous image tag.
|
||||
:::
|
||||
|
||||
:::warning Privilege model
|
||||
Do not override the image entrypoint unless you keep `/init` (or, equivalently, the legacy `docker/entrypoint.sh` shim that forwards to the stage2 hook) in the command chain. s6-overlay's `/init` runs as root so it can chown the volume on first boot, then drops to the `hermes` user via `s6-setuidgid` for every supervised service AND for the main program. Starting `hermes gateway run` as root inside the official image is refused by default because it can leave root-owned files in `/opt/data` and break later dashboard or gateway starts. Set `HERMES_ALLOW_ROOT_GATEWAY=1` only when you intentionally accept that risk.
|
||||
:::
|
||||
|
||||
### Per-profile gateway supervision
|
||||
|
||||
Inside the container, each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`. The lifecycle commands you'd run on the host work the same way:
|
||||
|
||||
```sh
|
||||
hermes profile create coder # registers gateway-coder s6 slot
|
||||
hermes -p coder gateway start # s6-svc -u → supervised gateway
|
||||
hermes -p coder gateway stop # s6-svc -d → service down
|
||||
hermes -p coder gateway restart # s6-svc -t → SIGTERM the supervisor
|
||||
hermes profile delete coder # tears down the s6 slot
|
||||
```
|
||||
|
||||
**Supervision benefits over the pre-s6 image:**
|
||||
|
||||
- Gateway crashes are auto-restarted by `s6-supervise` after a ~1s backoff.
|
||||
- Dashboard crashes are auto-restarted (set `HERMES_DASHBOARD=1` to start it).
|
||||
- `docker restart` preserves running gateways: the cont-init reconciler reads `$HERMES_HOME/profiles/<name>/gateway_state.json` and brings the slot back up if the last recorded state was `running`. Stopped gateways stay stopped.
|
||||
- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot.
|
||||
|
||||
`hermes status` inside the container reports `Manager: s6 (container supervisor)`. Use `/command/s6-svstat /run/service/gateway-<name>` for the raw supervisor view (note `/command/` is on PATH for supervision-tree processes only; pass the absolute path when calling from `docker exec`).
|
||||
|
||||
## Upgrading
|
||||
|
||||
Pull the latest image and recreate the container. Your data directory is untouched.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue