hermes-agent/tests/docker
Ben Barclay aeb992d343 fix(docker): drop docker exec to hermes uid before invoking the CLI
When operators ran `docker exec <c> hermes login` (or anything else
that wrote under $HERMES_HOME) they defaulted to root, leaving
/opt/data/auth.json root:root mode 0600. The supervised gateway
(UID 10000) then couldn't read its own credentials and returned
"Provider authentication failed: Hermes is not logged into Nous
Portal" on every Telegram/Discord/etc. message — even though
`docker exec <c> hermes chat -q ping` (also root) succeeded because
root could read its own root-owned file. _load_auth_store swallowed
PermissionError as a parse failure and copied the file aside as
auth.json.corrupt, making the diagnostic more misleading.

Fix: install a privilege-drop shim at /opt/hermes/bin/hermes,
prepended ahead of the venv on PATH. When invoked as root the shim
exec's the real venv binary via `s6-setuidgid hermes` — so any file
the docker-exec session writes is uid-aligned with the supervised
processes. Non-root callers (the supervised processes themselves,
`docker exec --user hermes`, kanban subagents, anything inside the
container that's not coming through docker-exec) hit a single exec
to the absolute venv path with no privilege change.

Recursion is impossible: the shim exec's the venv binary by
absolute path (/opt/hermes/.venv/bin/hermes), so the second hop
cannot re-enter the shim regardless of PATH state. No sentinel env
var needed (unlike #33583's gateway-run redirect which DOES need
HERMES_S6_SUPERVISED_CHILD because there's no absolute-path
equivalent for the s6 dispatch).

Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for
diagnostic sessions where the operator deliberately wants root.
Strict truthiness (1/true/yes case-insensitive); typos like `=0`
do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in
#33583.

If `s6-setuidgid` is missing (someone stripped s6-overlay in a
downstream fork), the shim exits 126 with a remediation message
pointing at `--user hermes` and the opt-out — never silently runs
as root.

Test plan:
- tests/docker/test_docker_exec_privilege_drop.py — 11 tests
  - shim drops root to hermes uid (file ownership check)
  - shim short-circuits for non-root docker exec
  - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root
  - strict-truthiness parametrization (5 falsy values reject)
  - main CMD path unaffected (recursion guard)
  - E2E: every file written by docker-exec is readable by uid 10000
- Full tests/docker/ harness: 32/32 pass against fresh image build
- shellcheck --severity=error: clean
- hadolint: clean
- Manual: reproduced the original symptom (root-owned auth.json)
  by bypassing the shim; confirmed default docker-exec produces
  hermes-owned files; confirmed opt-out env keeps root semantics.

Known follow-up: this prevents NEW instances of the bug. Volumes
that already have root:root /opt/data/auth.json from a pre-shim
image need a one-time `chown hermes:hermes` before rebooting onto
the new image. A stage2-hook chown sweep can self-heal that, but
is deferred per scope decision.
2026-05-28 13:30:36 +10:00
..
__init__.py test(docker): add conftest fixtures for docker harness 2026-05-24 18:05:14 -07:00
conftest.py fix(service_manager): s6 detection works for unprivileged hermes user 2026-05-24 18:05:33 -07:00
test_container_restart.py test(docker): poll for boot-log signal instead of fixed sleeps 2026-05-24 18:05:33 -07:00
test_dashboard.py fix(docker): dashboard slot stays 'down' when HERMES_DASHBOARD unset 2026-05-24 18:05:33 -07:00
test_docker_exec_privilege_drop.py fix(docker): drop docker exec to hermes uid before invoking the CLI 2026-05-28 13:30:36 +10:00
test_gateway_run_supervised.py fix(docker): tee supervised gateway stdout to docker logs 2026-05-28 13:18:41 +10:00
test_main_invocation.py test(docker): lock baseline behavior for Phase 0 harness 2026-05-24 18:05:14 -07:00
test_profile_gateway.py test(docker): fix svstat 'want up' assertion in profile-gateway lifecycle test 2026-05-25 12:25:06 +10:00
test_s6_profile_gateway_integration.py fix(service_manager): rip out dead port parameter 2026-05-24 18:05:33 -07:00
test_tui_passthrough.py test(docker): lock baseline behavior for Phase 0 harness 2026-05-24 18:05:14 -07:00
test_zombie_reaping.py fix(service_manager): s6 detection works for unprivileged hermes user 2026-05-24 18:05:33 -07:00