feat(docker): remove gosu from bundled image; s6-setuidgid handles privilege drop

The s6-overlay migration replaced every runtime use of gosu with
s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run
scripts, and cont-init.d hooks), but the gosu binary itself was still
being copied into the image from tianon/gosu, and several comments
across the repo still pointed to it.

Image changes:
- Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage
- Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer
- Net: one fewer base-image pull, ~12-15 MB layer eliminated

Documentation/comment refresh (no behavior change):
- Dockerfile: update root-user rationale comment + cont-init.d comment
- docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference
- docker-compose.yml: update UID/GID remap comment
- .hadolint.yaml: update DL3002 ignore rationale
- website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now
- hermes_cli/config.py: docker_run_as_host_user docstring

tools/environments/docker.py runs *arbitrary user images* via the
terminal backend, not the bundled Hermes image. It still needs the
SETUID/SETGID caps so that user images which drop privileges with
gosu, su, or s6-setuidgid all keep working.
Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and
updated comments to list s6-setuidgid alongside the others as examples.
The matching test (test_security_args_include_setuid_setgid_for_gosu_drop
→ test_security_args_include_setuid_setgid_for_privdrop) was renamed
and its docstring updated; behavior is unchanged.

Verification:
- hadolint clean against .hadolint.yaml
- shellcheck clean against all docker/ shell scripts
- Image rebuilt successfully (sha 1a090924ccea)
- Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4
  per-profile-gateway lifecycle + container-restart reconciliation)
- tests/tools/test_docker_environment.py: 23 passed (rename did not
  break test discovery; pre-existing unrelated mock warning)

The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md)
intentionally retains its historical references to gosu — it describes
the pre-s6 entrypoint as background for understanding the migration.
Ben 2026-05-22 10:43:57 +10:00 committed by teknium1
parent a36221ed91
commit 4b4c36cb61
8 changed files with 50 additions and 44 deletions

@@ -148,12 +148,14 @@ def find_docker() -> Optional[str]:
 # We drop all capabilities then add back the minimum needed:
 # DAC_OVERRIDE - root can write to bind-mounted dirs owned by host user
 # CHOWN/FOWNER - package managers (pip, npm, apt) need to set file ownership
-# SETUID/SETGID - the image entrypoint drops from root to the 'hermes'
-# user via `gosu`, which requires these caps. Combined with
-# `no-new-privileges`, gosu still cannot escalate back to root after
-# the drop, so the security posture is preserved. Omitted entirely
-# when the container starts as a non-root user via --user, since
-# no gosu drop is needed in that mode.
+# SETUID/SETGID - the image's init drops from root to the 'hermes'
+# user (via `s6-setuidgid` in the bundled image, or whatever
+# privilege-drop helper a user image uses), which requires these
+# caps. Combined with `no-new-privileges`, the dropped process
+# still cannot escalate back to root, so the security posture is
+# preserved. Omitted entirely when the container starts as a
+# non-root user via --user, since no privilege drop is needed
+# in that mode.
 # Block privilege escalation and limit PIDs.
 # /tmp is size-limited and nosuid but allows exec (needed by pip/npm builds).
 _BASE_SECURITY_ARGS = [
@@ -168,10 +170,11 @@ _BASE_SECURITY_ARGS = [
     "--tmpfs", "/run:rw,noexec,nosuid,size=64m",
 ]
 
-# Extra caps needed when the container starts as root and an entrypoint
-# must drop privileges via gosu/su. Skipped when --user is passed because
-# the container already starts unprivileged and never needs to switch.
-_GOSU_CAP_ARGS = [
+# Extra caps needed when the container starts as root and an init/entrypoint
+# must drop privileges (via `s6-setuidgid`, `gosu`, `su`, or similar).
+# Skipped when --user is passed because the container already starts
+# unprivileged and never needs to switch.
+_PRIVDROP_CAP_ARGS = [
     "--cap-add", "SETUID",
     "--cap-add", "SETGID",
 ]
@@ -181,7 +184,7 @@ def _build_security_args(run_as_host_user: bool) -> list[str]:
     """Return the security/cap/tmpfs args tailored to the privilege mode."""
     if run_as_host_user:
         return list(_BASE_SECURITY_ARGS)
-    return list(_BASE_SECURITY_ARGS) + list(_GOSU_CAP_ARGS)
+    return list(_BASE_SECURITY_ARGS) + list(_PRIVDROP_CAP_ARGS)
 
 
 def _resolve_host_user_spec() -> Optional[str]:
@@ -473,7 +476,7 @@ class DockerEnvironment(BaseEnvironment):
 "image default user."
 )
 # Fall back to the full cap set — without --user, an image's
-# entrypoint may still need gosu/su to drop privileges.
+# init may still need s6-setuidgid/gosu/su to drop privileges.
 security_args = _build_security_args(run_as_host_user and bool(user_args))
 
 logger.info(f"Docker volume_args: {volume_args}")
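
The guard passed to _build_security_args in the last hunk can be illustrated
on its own. The sketch below assumes that user_args is an empty list whenever
no host UID/GID could be resolved, which the diff implies but does not show.
effective_host_user_mode and the sample --user value are hypothetical names
used only for this example.

```python
def effective_host_user_mode(run_as_host_user: bool, user_args: list[str]) -> bool:
    """Hypothetical helper mirroring `run_as_host_user and bool(user_args)`."""
    return run_as_host_user and bool(user_args)


# Host-user mode requested and a --user flag was produced:
# the container starts unprivileged, so the SETUID/SETGID caps can be omitted.
resolved = effective_host_user_mode(True, ["--user", "1000:1000"])

# Host-user mode requested but nothing could be resolved: the container
# starts as the image default user, so the privilege-drop caps stay in place.
unresolved = effective_host_user_mode(True, [])

# Host-user mode not requested at all: the full cap set is used.
disabled = effective_host_user_mode(False, [])
```

Only the first case yields True, so only that case takes the reduced cap set;
both others fall back to the base args plus the privilege-drop caps.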