docs(docker): refresh user-guide page for s6-overlay reality

The page was last meaningfully rewritten in the pre-s6 (tini) era and had drifted on five points that no longer matched the image:

1. "Running the dashboard" claimed the entrypoint backgrounds `hermes dashboard` and prefixes its output with `[dashboard]`. That was the pre-s6 entrypoint.sh path; under s6 the dashboard is a supervised s6-rc service (`docker/s6-rc.d/dashboard/run`) with no sed-prefix pipeline. Rewrote the section accordingly.
2. The default for `HERMES_DASHBOARD_HOST` was documented as `127.0.0.1`. The s6 run script defaults it to `0.0.0.0` (`dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"`). Fixed the table and the surrounding prose.
3. Multi-profile was documented as "not recommended in Docker — run one container per profile." That advice was load-bearing when there was no in-container supervisor, but the s6 architecture explicitly adds per-profile gateway supervision: each profile created via `hermes profile create <name>` gets a slot under `/run/service/gateway-<name>/`, the `02-reconcile-profiles` cont-init script restores them across `docker restart` from `gateway_state.json`, and `hermes gateway start/stop/restart` is intercepted by `_dispatch_via_service_manager_if_s6` to route through `s6-svc`. Pivoted the section to "one container, many supervised profile gateways" as the default, with a comparison table and a "When you DO want a separate container" escape hatch for the genuine resource-isolation / network-segmentation cases.
4. The Compose example trailer also claimed `[dashboard]` log prefixing. Replaced it with the actual log routing.
5. Added a new "Where the logs go" section covering all four log surfaces: per-profile gateways (tee'd to `docker logs` AND `${HERMES_HOME}/logs/gateways/<profile>/current` since PR b34532319), dashboard (`docker logs`, no prefix), boot reconciler (`container-boot.log`), and `hermes logs`. The gateway-mode and Compose sections cross-reference this rather than each carrying their own routing prose.
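For reviewers new to s6, the verb routing in point 3 reduces to a small table. The sketch below is a hypothetical dry run: `svc_cmd_for` is an invented name, the real routing is `_dispatch_via_service_manager_if_s6`, and it only prints the `s6-svc` command each verb is assumed to map to, using the flags the old page listed (`-u`, `-d`, `-t`).

```sh
#!/bin/sh
# Hypothetical dry-run sketch: map a `hermes gateway` verb and a profile name
# to the s6-svc invocation it is assumed to route to. Prints, never runs.
svc_cmd_for() {
  verb=$1
  profile=$2
  case "$verb" in
    start) flag=-u ;;    # mark the service wanted up and start it
    stop) flag=-d ;;     # mark it wanted down; s6-supervise will not restart it
    restart) flag=-t ;;  # SIGTERM the gateway; s6-supervise restarts it while wanted up
    *)
      echo "unsupported verb: $verb" >&2
      return 2
      ;;
  esac
  printf '/command/s6-svc %s /run/service/gateway-%s\n' "$flag" "$profile"
}

# Example: prints /command/s6-svc -t /run/service/gateway-coder
svc_cmd_for restart coder
```

Keeping the service directory derived from the profile name is what lets `hermes profile create` register new slots at runtime without touching the image.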
Added a new "docker exec automatically drops to the hermes user" subsection under "What the Dockerfile does", next to the existing Privilege model warning. It documents the `/opt/hermes/bin/hermes` shim (landed via the docker-exec privilege-drop work), so operators don't need to remember `--user hermes` for `docker exec hermes hermes login`, `docker exec hermes hermes profile create …`, etc. The historical footgun (`auth.json` written as `root:root`, after which the supervised gateway can't read its own auth file) is mentioned only as context for what the fail-loud `exit 126` protects against, not as a problem the reader needs to solve. The `HERMES_DOCKER_EXEC_AS_ROOT=1` opt-out is documented for diagnostic sessions. The "Permission denied" troubleshooting subsection now carries a single-line pointer to the new section instead of duplicating it.

The `--insecure` framing reflects PR #fb5125362 (opt-in via `HERMES_DASHBOARD_INSECURE`, not derived from the bind host): the OAuth gate is the authority, the bind host alone never implies `--insecure`, and opting out is an explicit security trade-off.

Verified that all anchors resolve. The zh-Hans i18n mirror is left for the translation flow to catch up.
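The shim's decision order described above can be sketched in POSIX sh. This is a hypothetical illustration, not the shipped shim: `is_truthy` and `shim_decision` are invented names, the function prints its decision instead of exec'ing, and checking the opt-out before the `s6-setuidgid` availability check is an assumption.

```sh
#!/bin/sh
# Hypothetical sketch of the /opt/hermes/bin/hermes privilege decision.
# It prints a decision word instead of exec'ing anything.

# Succeeds only for the accepted opt-out values: 1, true, yes (any case).
is_truthy() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    1 | true | yes) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: shim_decision UID OPT_OUT_VALUE HAVE_SETUIDGID(yes|no)
# Prints one of: direct | root | drop | refuse
shim_decision() {
  uid=$1
  opt_out=$2
  have_setuidgid=$3
  if [ "$uid" -ne 0 ]; then
    echo direct   # non-root caller: exec the venv binary directly
  elif is_truthy "$opt_out"; then
    echo root     # explicit HERMES_DOCKER_EXEC_AS_ROOT opt-out (ordering assumed)
  elif [ "$have_setuidgid" = yes ]; then
    echo drop     # re-exec through `s6-setuidgid hermes`
  else
    echo refuse   # no s6-setuidgid: exit 126 rather than run as root
  fi
}

# Example for the current caller:
# shim_decision "$(id -u)" "${HERMES_DOCKER_EXEC_AS_ROOT:-}" \
#   "$(command -v s6-setuidgid >/dev/null 2>&1 && echo yes || echo no)"
```

Note that a value like `0` is simply not truthy, so it lands in the `drop` branch rather than silently keeping root.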
2026-07-14 14:12:44 +00:00 · 2026-05-29 11:33:32 +10:00 · 2026-05-29 11:33:32 +10:00 · 42612aa350
commit 42612aa350
parent 3c6e70aef1
1 changed files with 96 additions and 70 deletions
--- a/website/docs/user-guide/docker.md
+++ b/website/docs/user-guide/docker.md
@@ -54,12 +54,7 @@ This behavior applies to the s6-based image only. Earlier (tini-based) images st
 :::

 :::note Where gateway logs go
-Inside the s6 image, the supervised gateway's output is tee'd to two destinations:
-
- **`docker logs <container>`** — every line in real time (raw, no extra prefix). This is the same stream you'd get from a foreground gateway, so existing `docker logs --follow` / `--timestamps` / log-shipper integrations work unchanged.
- **`${HERMES_HOME}/logs/gateways/<profile>/current`** (mapped to `~/.hermes/logs/gateways/<profile>/current` on the host via the volume mount) — rotated, with an ISO 8601 timestamp prepended per line. Rotation is 10 archives × 1 MB each, so it can't fill the disk. This is what `hermes logs` reads and what survives container restarts.
-
-The per-profile reconciler keeps a separate audit log at `${HERMES_HOME}/logs/container-boot.log` — one line per profile per container boot, recording whether each gateway was restored to its prior state.
+See the [Where the logs go](#where-the-logs-go) section below for the full routing map (per-profile gateways, dashboard, boot reconciler, container-wide `docker logs`).
 :::

 Note: the API server is gated on `API_SERVER_ENABLED=true`. To expose it beyond `127.0.0.1` inside the container, also set `API_SERVER_HOST=0.0.0.0` and an `API_SERVER_KEY` (minimum 8 characters — generate one with `openssl rand -hex 32`). Example:
@@ -81,7 +76,7 @@ Opening any port on an internet facing machine is a security risk. You should no

 ## Running the dashboard

-The built-in web dashboard runs as an optional side-process inside the same container as the gateway. Set `HERMES_DASHBOARD=1` to run the dashboard on container loopback (`127.0.0.1`) by default:
+The built-in web dashboard runs as a supervised s6-rc service alongside the gateway in the same container. Set `HERMES_DASHBOARD=1` to bring it up:

 ```sh
 docker run -d \
@@ -89,54 +84,38 @@ docker run -d \
  --restart unless-stopped \
  -v ~/.hermes:/opt/data \
  -p 8642:8642 \
+  -p 9119:9119 \
  -e HERMES_DASHBOARD=1 \
  nousresearch/hermes-agent gateway run
 ```

-The entrypoint starts `hermes dashboard` in the background (running as the non-root `hermes` user) before `exec`-ing the main command. Dashboard output is prefixed with `[dashboard]` in `docker logs` so it's easy to separate from gateway logs.
+The dashboard is supervised by s6: if it crashes, `s6-supervise` restarts it automatically after a short backoff. Dashboard stdout/stderr is forwarded to `docker logs <container>` with no prefix, interleaved with the supervised gateway's output (which is additionally tee'd to a per-profile rotated file; see [Where the logs go](#where-the-logs-go) below).

 | Environment variable | Description | Default |
 |---------------------|-------------|---------|
-| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to launch the dashboard alongside the main command | *(unset — dashboard not started)* |
-| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `127.0.0.1` |
+| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to enable the supervised dashboard service | *(unset — service is registered but stays down)* |
+| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `0.0.0.0` |
 | `HERMES_DASHBOARD_PORT` | Port for the dashboard HTTP server | `9119` |
 | `HERMES_DASHBOARD_TUI` | Set to `1` to expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket) | *(unset)* |
 | `HERMES_DASHBOARD_INSECURE` | Set to `1` (or `true` / `yes`) to bind without the OAuth auth gate. Only use on trusted networks behind a reverse proxy without the OAuth contract — the dashboard exposes API keys and session data | *(unset — gate enforced when a `DashboardAuthProvider` is registered)* |

-By default, the dashboard stays on loopback (`127.0.0.1`) to avoid exposing
-the web surface over the network. To publish it intentionally, set
-`HERMES_DASHBOARD_HOST=0.0.0.0`. The dashboard's OAuth auth gate engages
-automatically whenever:
+The dashboard inside the container defaults to binding `0.0.0.0`; a loopback-only bind would leave the published `-p 9119:9119` port unreachable from the host. To restrict the bind to container loopback (for sidecar / reverse-proxy setups), set `HERMES_DASHBOARD_HOST=127.0.0.1`.

-1. The bind host is non-loopback, **and**
+The dashboard's OAuth auth gate engages automatically when both of the following are true:
+
+1. The bind host is non-loopback (e.g. the default `0.0.0.0` inside the container), **and**
 2. A `DashboardAuthProvider` plugin is registered.

-The bundled `dashboard_auth/nous` provider activates whenever
-`HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see
-[Web Dashboard → Authentication](features/web-dashboard.md)). With the
-gate engaged, browser callers are redirected to the configured portal's
-OAuth flow before they can reach any protected route.
+The bundled `dashboard_auth/nous` provider activates whenever `HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see [Web Dashboard → Authentication](features/web-dashboard.md)). With the gate engaged, browser callers are redirected to the configured portal's OAuth flow before they can reach any protected route.

-If no provider is registered and the bind is non-loopback, the dashboard
-**fails closed at startup** with a specific error pointing at the
-missing env var. To opt out of the gate explicitly — for a trusted-LAN
-deployment behind your own reverse proxy without the OAuth contract —
-set `HERMES_DASHBOARD_INSECURE=1`. This re-enables the legacy "no auth,
-loud warning" mode and is the only path that disables the gate; the bind
-host does not implicitly determine `--insecure` anymore.
+If no provider is registered and the bind is non-loopback, the dashboard **fails closed at startup** with a specific error pointing at the missing env var. To opt out of the gate explicitly (for a trusted-LAN deployment behind your own reverse proxy without the OAuth contract), set `HERMES_DASHBOARD_INSECURE=1`. This is the **only** path that disables the gate; the bind host alone never implies `--insecure`. (It used to: that rule predated the OAuth gate and, once the gate existed, silently disabled it on every container-deployed dashboard.)

-:::note
-The dashboard runs as a supervised s6 service inside the container. If
-the dashboard process crashes, s6-overlay restarts it automatically
-after a short backoff — you'll see a new PID without needing to
-restart the container. Logs and crash output are visible via
-`docker logs <container>` (s6 forwards service stdout/stderr there).
-
-Running the dashboard as a separate container is not supported: its
-gateway-liveness detection requires a shared PID namespace with the
-gateway process.
+:::warning `HERMES_DASHBOARD_INSECURE=1` exposes API keys
+Opting out of the OAuth gate serves the dashboard's API surface (including model keys and session data) to anyone who can reach the published port. Only enable it when you have your own auth layer in front, or on a trusted LAN you fully control.
 :::

+Running the dashboard as a separate container is not supported: its gateway-liveness detection requires a shared PID namespace with the gateway process.
+
 ## Running interactively (CLI chat)

 To open an interactive chat session against a running data directory:
@@ -179,37 +158,60 @@ Never run two Hermes **gateway** containers against the same data directory simu

 ## Multi-profile support

-Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` directories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **When running under Docker, using Hermes' built-in multi-profile feature is not recommended.**
+Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` subdirectories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **Inside the official Docker image, the s6 supervision tree treats each profile as a first-class supervised service**, so the recommended deployment is **one container hosting all profiles**.

-Instead, the recommended pattern is **one container per profile**, with each container bind-mounting its own host directory as `/opt/data`:
+Each profile created with `hermes profile create <name>` gets:
+
+- A dedicated s6 service slot at `/run/service/gateway-<name>/`, registered dynamically by the runtime — no container rebuild required.
+- Auto-restart on crash, backoff-managed by `s6-supervise`.
+- Per-profile rotated logs at `${HERMES_HOME}/logs/gateways/<name>/current` (10 archives × 1 MB each).
+- State persistence across container restarts: the boot-time reconciler reads `gateway_state.json` from each profile directory and brings the slot back up only for profiles whose last recorded state was `running`. Stopped profiles stay stopped.
+
+The lifecycle commands you'd run on the host work the same way from inside the container:

 ```sh
-# Work profile
-docker run -d \
-  --name hermes-work \
-  --restart unless-stopped \
-  -v ~/.hermes-work:/opt/data \
-  -p 8642:8642 \
-  nousresearch/hermes-agent gateway run
+# Create a profile — registers the gateway-<name> s6 slot.
+docker exec hermes hermes profile create coder

-# Personal profile
-docker run -d \
-  --name hermes-personal \
-  --restart unless-stopped \
-  -v ~/.hermes-personal:/opt/data \
-  -p 8643:8642 \
-  nousresearch/hermes-agent gateway run
+# Start / stop / restart — dispatches s6-svc; the gateway lifecycle survives docker restart.
+docker exec hermes hermes -p coder gateway start
+docker exec hermes hermes -p coder gateway stop
+docker exec hermes hermes -p coder gateway restart
+
+# Status — reports `Manager: s6 (container supervisor)` inside the container.
+docker exec hermes hermes -p coder gateway status
+
+# Remove a profile — tears down the s6 slot too.
+docker exec hermes hermes profile delete coder
 ```

-Why separate containers over profiles in Docker:
+Under the hood, `hermes gateway start/stop/restart` inside the container is intercepted and routed to `s6-svc` against the right service directory; you don't need to learn the s6 commands directly. For raw supervisor state, use `/command/s6-svstat /run/service/gateway-<name>` (note `/command/` is on PATH only for processes spawned by the supervision tree — when calling from `docker exec`, pass the absolute path).

- **Isolation** — each container has its own filesystem, process table, and resource limits. A crash, dependency change, or runaway session in one profile can't affect another.
- **Independent lifecycle** — upgrade, restart, pause, or roll back each agent separately (`docker restart hermes-work` leaves `hermes-personal` untouched).
- **Clean port and network separation** — each gateway binds its own host port; there's no risk of cross-talk between chat platforms or API servers.
- **Simpler mental model** — the container *is* the profile. Backups, migrations, and permissions all follow the bind-mounted directory, with no extra `--profile` flags to remember.
- **Avoids concurrent-write risk** — the warning above about never running two gateways against the same data directory still applies to profiles within a single container.
+### Why one container with many profiles, not many containers

-In Docker Compose, this just means declaring one service per profile with distinct `container_name`, `volumes`, and `ports`:
+Before the s6 migration, "one container per profile" was the recommended pattern because there was no in-container supervisor to manage multiple gateways. With s6 as PID 1, that's no longer necessary, and the single-container layout is simpler in almost every dimension:
+
+| | One container, many profiles | One container per profile |
+|---|---|---|
+| Disk overhead | One image, one bundled venv, one Playwright cache | N containers; read-only image layers are shared, but each container's runtime writes are duplicated |
+| Memory overhead | Shared Python interpreter cache, shared node_modules | Duplicated per container |
+| Profile creation | `docker exec ... hermes profile create <name>` (seconds) | New `docker run` invocation + port allocation + bind-mount config |
+| Per-profile crash recovery | `s6-supervise` auto-restart | Docker's `--restart unless-stopped` (slower, kills sibling work) |
+| Logs | Per-profile rotated file via `s6-log`, plus container-boot audit log | `docker logs <name>` per container — no built-in rotation |
+| Backup | One `~/.hermes` directory | N directories to coordinate |
+
+The default profile (`default`) is always registered on first boot, so a fresh container ships with one supervised gateway out of the box. Additional profiles are pure runtime adds.
+
+### When you DO want a separate container
+
+Profile-in-container is the default. Run a separate container per profile only when you have a specific reason:
+
+- **Resource isolation per workload** — e.g. a runaway browser-tool session in profile A shouldn't be able to OOM profile B. Containers give you `--memory` / `--cpus` per profile.
+- **Independent image pinning** — different upstream image tags per workload.
+- **Network segmentation** — distinct Docker networks per profile (e.g. one customer-facing, one internal).
+- **Compliance / blast radius** — distinct credentials never share an OS-level process tree.
+
+In those cases, declare one service per profile with distinct `container_name`, `volumes`, and `ports`:

 ```yaml
 services:
@@ -234,6 +236,24 @@ services:
      - ~/.hermes-personal:/opt/data
 ```

+The warning from [Persistent volumes](#persistent-volumes) still applies: never point two containers at the same `~/.hermes` directory simultaneously. The s6 supervisor inside each container manages its own profile set; cross-container sharing of a data volume corrupts session files and memory stores.
+
+## Where the logs go
+
+The s6 container has four distinct log surfaces, and working out which one holds the line you're looking for is a common source of confusion. Cheatsheet:
+
+| Source | Where it lands | How to read it |
+|---|---|---|
+| **Per-profile gateway** (`hermes gateway run` and per-profile gateways under s6) | Tee'd to two places: `docker logs <container>` (real time, no extra prefix) **and** `${HERMES_HOME}/logs/gateways/<profile>/current` (rotated, ISO-8601 timestamped, 10 archives × 1 MB each) | `docker logs -f hermes` or `tail -F ~/.hermes/logs/gateways/default/current` on the host |
+| **Dashboard** (when `HERMES_DASHBOARD=1`) | `docker logs <container>` (no prefix) | `docker logs -f hermes` — interleaved with gateway lines |
+| **Boot reconciler** (records which profile gateways were restored on each container start) | `${HERMES_HOME}/logs/container-boot.log` (append-only audit log) | `tail -F ~/.hermes/logs/container-boot.log` |
+| **Generic Hermes logs** (`agent.log`, `errors.log`) | `${HERMES_HOME}/logs/` (profile-aware) | `docker exec hermes hermes logs --follow [--level WARNING] [--session <id>]` |
+
+Two practical consequences worth knowing:
+
+- The file copy at `logs/gateways/<profile>/current` is what survives container restarts. `docker logs` only retains output from the current container's lifetime (and is wiped on `docker rm`); the rotated files persist on the bind-mounted volume.
+- The boot reconciler's audit line shape is `<iso-timestamp> profile=<name> prior_state=<state> action=<registered|started>`, so a quick `grep profile=coder ~/.hermes/logs/container-boot.log` reveals when a given profile was last restored and whether s6 auto-started it.
+
 ## Environment variable forwarding

 API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:
@@ -281,7 +301,7 @@ services:
          cpus: "2.0"
 ```

-Start with `docker compose up -d` and view logs with `docker compose logs -f`. Dashboard output is prefixed with `[dashboard]` so it's easy to filter from gateway logs.
+Start with `docker compose up -d` and view logs with `docker compose logs -f`. The supervised gateway's stdout is also tee'd to `${HERMES_HOME}/logs/gateways/<profile>/current` on the volume — see [Where the logs go](#where-the-logs-go) for the full routing map.

 ## Optional: Linux desktop audio bridge

@@ -415,24 +435,28 @@ The container ENTRYPOINT is now `/init` (s6-overlay), not `/usr/bin/tini`. All f
 Do not override the image entrypoint unless you keep `/init` (or, equivalently, the legacy `docker/entrypoint.sh` shim that forwards to the stage2 hook) in the command chain. s6-overlay's `/init` runs as root so it can chown the volume on first boot, then drops to the `hermes` user via `s6-setuidgid` for every supervised service AND for the main program. Starting `hermes gateway run` as root inside the official image is refused by default because it can leave root-owned files in `/opt/data` and break later dashboard or gateway starts. Set `HERMES_ALLOW_ROOT_GATEWAY=1` only when you intentionally accept that risk.
 :::

-### Per-profile gateway supervision
+### `docker exec` automatically drops to the `hermes` user

-Inside the container, each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`. The lifecycle commands you'd run on the host work the same way:
+`docker exec hermes <cmd>` runs as root inside the container by default, but the image ships a thin shim at `/opt/hermes/bin/hermes` (first on PATH) that detects root callers and transparently re-execs through `s6-setuidgid hermes`. So `docker exec hermes hermes login`, `docker exec hermes hermes profile create …`, `docker exec hermes hermes setup`, etc. all write files owned by UID 10000 (the `hermes` user, so the supervised gateway can read them) with no extra `--user` flag needed. Only `hermes` CLI invocations pass through the shim; any other command you `docker exec` still runs as root. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents inside the container) hit a short-circuit that execs the venv binary directly, so there's no overhead on the hot paths.
+
+If you specifically need a `docker exec` that retains root semantics (diagnostic sessions, inspecting root-only state, files outside `/opt/data` that root happens to own), opt out per invocation:

 ```sh
-hermes profile create coder            # registers gateway-coder s6 slot
-hermes -p coder gateway start          # s6-svc -u  → supervised gateway
-hermes -p coder gateway stop           # s6-svc -d  → service down
-hermes -p coder gateway restart        # s6-svc -t  → SIGTERM the supervisor
-hermes profile delete coder            # tears down the s6 slot
+docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 hermes hermes <subcommand>
 ```

+The shim accepts `1` / `true` / `yes` (case-insensitive). Any other value, including `0` or a misspelling, falls through to the privilege drop, so a malformed opt-out can never silently leave you running as root. If `s6-setuidgid` isn't available (custom builds that stripped s6-overlay), the shim refuses to run as root and exits 126 instead. That surfaces the broken privilege model loudly rather than regressing to the historical footgun where `docker exec hermes hermes login` wrote `auth.json` as `root:root` and broke the supervised gateway's auth on every chat platform message.
+
+### Per-profile gateway supervision
+
+Each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`, with state-persistent auto-restart across container restarts. See [Multi-profile support](#multi-profile-support) above for the user-facing workflow and the lifecycle commands.
+
 **Supervision benefits over the pre-s6 image:**

 - Gateway crashes are auto-restarted by `s6-supervise` after a ~1s backoff.
- Dashboard crashes are auto-restarted (set `HERMES_DASHBOARD=1` to start it).
+- Dashboard, when enabled with `HERMES_DASHBOARD=1`, is supervised on the same supervision tree and gets the same auto-restart treatment.
 - `docker restart` preserves running gateways: the cont-init reconciler reads `$HERMES_HOME/profiles/<name>/gateway_state.json` and brings the slot back up if the last recorded state was `running`. Stopped gateways stay stopped.
- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot.
+- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot. See [Where the logs go](#where-the-logs-go) for the full routing map.

 `hermes status` inside the container reports `Manager: s6 (container supervisor)`. Use `/command/s6-svstat /run/service/gateway-<name>` for the raw supervisor view (note `/command/` is on PATH for supervision-tree processes only; pass the absolute path when calling from `docker exec`).

@@ -692,6 +716,8 @@ The container's stage2 hook drops privileges to the non-root `hermes` user (UID
 chmod -R 755 ~/.hermes
 ```

+`docker exec hermes hermes <subcommand>` automatically drops to UID 10000 too; see [`docker exec` automatically drops to the `hermes` user](#docker-exec-automatically-drops-to-the-hermes-user) for details and the per-invocation opt-out.
+
 ### Browser tools not working

 Playwright needs shared memory. Add `--shm-size=1g` to your Docker run command: