mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-30 06:41:51 +00:00
Phase 4 of the s6-overlay supervision plan. Activates the Phase 3
S6ServiceManager by hooking it into the profile lifecycle and the
`hermes gateway start/stop/restart` dispatcher, and adds a
cont-init.d reconciliation pass that rebuilds the per-profile gateway
slots after every `docker restart`.
Task 4.0 — container-boot reconciliation:
/run/service/ is tmpfs, so every `docker restart` wipes every
per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles
invokes hermes_cli.container_boot.reconcile_profile_gateways() on
every boot, which walks $HERMES_HOME/profiles/<name>/, reads each
gateway_state.json, recreates the s6 service slot, and auto-starts
only those whose last state was 'running'. Other states
(stopped, starting, startup_failed, missing) register the slot
in the down state — avoiding crash-loops across restarts for a
gateway that was broken last boot. Per-profile outcome is recorded
to $HERMES_HOME/logs/container-boot.log.
Implementation: hermes_cli/container_boot.py + 12 unit tests.
Profile-marker is SOUL.md, not config.yaml, because `hermes profile
create` only seeds SOUL.md by default (config.yaml comes from
`hermes setup`).
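
The per-profile decision described above can be sketched in POSIX sh. This is a hypothetical illustration, not the code in hermes_cli/container_boot.py: the `state` key name inside gateway_state.json, the sed-based extraction, and the function names `read_state`, `slot_action`, and `reconcile_plan` are all assumptions. It only prints the planned action per profile and does not touch /run/service.

```sh
#!/bin/sh
# Hypothetical sketch of the reconciliation decision. It prints
# "<profile> up" or "<profile> down" and creates no s6 slots.

# Assumption: gateway_state.json carries a top-level string key named
# "state". The real key name is not given in the commit message.
read_state() {
    file=$1
    # A missing state file is treated the same as the 'missing' state.
    [ -f "$file" ] || { echo missing; return 0; }
    state=$(sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)
    echo "${state:-missing}"
}

# Only a gateway whose last recorded state was 'running' is auto-started;
# every other state registers the slot as down.
slot_action() {
    case $1 in
        running) echo up ;;
        *)       echo down ;;
    esac
}

# Walk <profiles_dir>/<name>/ and skip directories without SOUL.md,
# which is the profile marker named in the commit message.
reconcile_plan() {
    profiles_dir=$1
    for dir in "$profiles_dir"/*/; do
        [ -f "${dir}SOUL.md" ] || continue
        name=$(basename "$dir")
        printf '%s %s\n' "$name" "$(slot_action "$(read_state "${dir}gateway_state.json")")"
    done
}
```

Calling `reconcile_plan "$HERMES_HOME/profiles"` lists one line per profile. Defaulting unknown states to down keeps a gateway that was broken at last boot from crash-looping.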
Task 4.1 / 4.2 — profile create/delete hooks:
hermes_cli/profiles.py::create_profile now calls
_maybe_register_gateway_service(<canon>) at the end, which routes
through ServiceManager.register_profile_gateway when running on s6
and no-ops on host backends. delete_profile mirrors with
_maybe_unregister_gateway_service. _allocate_gateway_port produces
a deterministic SHA-256-derived port in [9200, 9800).
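
One way to derive a stable port from a profile name is sketched below. The commit message does not say how the digest is reduced to a port, so taking the first 8 hex digits modulo 600 is an assumption, as is the helper name `allocate_port`. The sketch also assumes `sha256sum` from GNU coreutils is on PATH.

```sh
#!/bin/sh
# Hypothetical sketch of a deterministic port in [9200, 9800).
# Assumption: the first 8 hex digits of the SHA-256 digest are reduced
# modulo the range width (600). The real _allocate_gateway_port may
# reduce the digest differently.
allocate_port() {
    name=$1
    hex=$(printf '%s' "$name" | sha256sum | cut -c1-8)
    # 8 hex digits fit comfortably in shell integer arithmetic.
    echo $(( 0x$hex % 600 + 9200 ))
}
```

The same name always maps to the same port, so no allocation table has to be persisted. Two names can still hash to the same port, and this sketch does not handle that collision.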
Task 4.3 — gateway dispatch + remove rejection arms:
_dispatch_via_service_manager_if_s6(action) intercepts
start/stop/restart at the top of each subcommand and routes them
through S6ServiceManager.{start,stop,restart}. The pre-Phase-4
`elif is_container():` rejection arms are kept as fallback for
pre-s6 containers / unsupported runtimes, but only ever fire when
detect_service_manager() != 's6'. install/uninstall under s6
print informational guidance pointing users at profile create/delete.
Removed the two xfail(strict=True) markers from
tests/docker/test_profile_gateway.py — both tests now pass strictly.
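
For orientation, the three dispatched actions correspond naturally to `s6-svc` control flags. The sketch below is hypothetical: the commit message does not say that S6ServiceManager shells out to `s6-svc`, and the slot name `gateway-<profile>` under /run/service is an assumed naming scheme. The function only prints the command it would run.

```sh
#!/bin/sh
# Hypothetical mapping from a gateway action to an s6-svc invocation.
# Assumptions (not from the commit): slots live at
# /run/service/gateway-<profile>, and the manager drives them via s6-svc.
s6_command_for() {
    action=$1
    profile=$2
    slot="/run/service/gateway-$profile"
    case $action in
        start)   echo "s6-svc -u $slot" ;;   # -u: bring the service up
        stop)    echo "s6-svc -d $slot" ;;   # -d: bring the service down
        restart) echo "s6-svc -r $slot" ;;   # -r: restart a running service
        *)
            echo "unsupported action: $action" >&2
            return 2
            ;;
    esac
}
```

Anything other than start, stop, or restart is rejected with a non-zero status, in line with install/uninstall being handled separately under s6.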
Task 4.4 — status reporting:
get_gateway_runtime_snapshot() reports
Manager: 's6 (container supervisor)' inside an s6 container instead
of 'docker (foreground)'.
Plan-vs-reality drift fixed in this commit:
- Plan's S6ServiceManager._render_run_script used
`gateway start --foreground --port {port}` — invented args; the
real CLI is `gateway run`. Switched accordingly. port arg
retained for API parity but now documented as 'currently ignored'.
- Plan's reconciler keyed on config.yaml; switched to SOUL.md
(config.yaml is created by hermes setup, not by hermes profile
create, so the original gate caught nothing).
- The plan's _dispatch helper used _profile_arg() which returns
'--profile <name>' (i.e. with the flag prefix). Switched to
_profile_suffix() which returns the bare name.
- Under Architecture B, `docker exec` gets neither /command nor the
venv on PATH; the Dockerfile's runtime PATH now includes
/opt/hermes/.venv/bin so `docker exec <c> hermes ...` works
without sourcing the venv.
- stage2-hook now chowns $HERMES_HOME/profiles to hermes on every
boot, not just on the UID-remap path. Without this, files created
by docker-exec-as-root accumulate and the next reconciler run
fails with PermissionError reading SOUL.md.
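
The ownership problem in the last item is easy to check by hand. A minimal sketch, assuming a POSIX `find`; `list_foreign_owned` is a hypothetical helper name and is not part of the Hermes codebase:

```sh
#!/bin/sh
# Hypothetical helper: print every path under a directory that is NOT
# owned by the given user (name or numeric UID). Prints nothing when
# ownership is already consistent or the directory does not exist.
list_foreign_owned() {
    dir=$1
    owner=$2
    [ -d "$dir" ] || return 0
    find "$dir" ! -user "$owner" -print
}
```

Inside the container, `list_foreign_owned "${HERMES_HOME:-/opt/data}/profiles" hermes` should print nothing after the stage2 hook has run. Any path it does print is a candidate for the PermissionError described above.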
Test harness:
19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to
passing). 78 unit tests across service_manager + container_boot +
profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck
pass cleanly.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
116 lines
5.1 KiB
Bash
Executable file
#!/bin/sh
# s6-overlay stage2 hook — runs as root after the supervision tree is
# up but before user services start. Handles UID/GID remap, volume
# chown, config seeding, and skills sync.
#
# Per-service privilege drop happens inside each service's `run` script
# (and in main-wrapper.sh) via s6-setuidgid, not here.
#
# Wired into the image as /etc/cont-init.d/01-hermes-setup by the
# Dockerfile. The shim at docker/entrypoint.sh forwards to this script
# so external references to docker/entrypoint.sh still work.
#
# NB: cont-init.d scripts run with no arguments — the user's CMD args
# are NOT visible here. That's fine: we use Architecture B (s6-overlay
# main-program model), so main-wrapper.sh runs the CMD with full
# stdin/stdout/stderr access and handles arg parsing there.

set -eu

HERMES_HOME="${HERMES_HOME:-/opt/data}"
INSTALL_DIR="/opt/hermes"

# --- UID/GID remap ---
if [ -n "${HERMES_UID:-}" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
    echo "[stage2] Changing hermes UID to $HERMES_UID"
    usermod -u "$HERMES_UID" hermes
fi
if [ -n "${HERMES_GID:-}" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
    echo "[stage2] Changing hermes GID to $HERMES_GID"
    # -o allows non-unique GID (e.g. macOS GID 20 "staff" may already
    # exist as "dialout" in the Debian-based container image).
    groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi

# --- Fix ownership of data volume ---
actual_hermes_uid=$(id -u hermes)
needs_chown=false
if [ -n "${HERMES_UID:-}" ] && [ "$HERMES_UID" != "10000" ]; then
    needs_chown=true
elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
    needs_chown=true
fi
if [ "$needs_chown" = true ]; then
    echo "[stage2] Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
    # In rootless Podman the container's "root" is mapped to an
    # unprivileged host UID — chown will fail. That's fine: the volume
    # is already owned by the mapped user on the host side.
    chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
        echo "[stage2] Warning: chown failed (rootless container?) — continuing"
    # The .venv must also be re-chowned when UID is remapped, otherwise
    # lazy_deps.py cannot install platform packages (discord.py, etc.).
    chown -R hermes:hermes "$INSTALL_DIR/.venv" 2>/dev/null || \
        echo "[stage2] Warning: chown .venv failed (rootless container?) — continuing"
fi

# Always reset ownership of $HERMES_HOME/profiles to hermes on every
# boot. Profile dirs and files can land owned by root when commands
# are invoked via `docker exec <container> hermes …` (which defaults
# to root unless `-u` is passed), and that breaks the cont-init
# reconciler (02-reconcile-profiles) which runs as hermes and walks
# the profiles dir. Idempotent; skipped on rootless containers where
# chown would fail.
if [ -d "$HERMES_HOME/profiles" ]; then
    chown -R hermes:hermes "$HERMES_HOME/profiles" 2>/dev/null || true
fi

# --- config.yaml permissions ---
# Ensure config.yaml is readable by the hermes runtime user even if it
# was edited on the host after initial ownership setup.
if [ -f "$HERMES_HOME/config.yaml" ]; then
    chown hermes:hermes "$HERMES_HOME/config.yaml" 2>/dev/null || true
    chmod 640 "$HERMES_HOME/config.yaml" 2>/dev/null || true
fi

# --- Seed directory structure as hermes user ---
# Run as hermes via s6-setuidgid so dirs end up owned correctly (matters
# under rootless Podman where chown back to root would fail).
s6-setuidgid hermes sh -c "mkdir -p \"$HERMES_HOME\"/cron \
    \"$HERMES_HOME\"/sessions \"$HERMES_HOME\"/logs \"$HERMES_HOME\"/hooks \
    \"$HERMES_HOME\"/memories \"$HERMES_HOME\"/skills \"$HERMES_HOME\"/skins \
    \"$HERMES_HOME\"/plans \"$HERMES_HOME\"/workspace \"$HERMES_HOME\"/home"

# --- Install-method stamp (read by detect_install_method() in hermes status) ---
# Preserved from the tini-era entrypoint (PR #27843). Must be written as
# the hermes user so ownership matches the file's documented owner.
s6-setuidgid hermes sh -c "echo docker > \"$HERMES_HOME/.install_method\"" 2>/dev/null || true

# --- Seed config files (only on first boot) ---
seed_one() {
    dest=$1
    src=$2
    if [ ! -f "$HERMES_HOME/$dest" ] && [ -f "$INSTALL_DIR/$src" ]; then
        s6-setuidgid hermes cp "$INSTALL_DIR/$src" "$HERMES_HOME/$dest"
    fi
}
seed_one ".env" ".env.example"
seed_one "config.yaml" "cli-config.yaml.example"
seed_one "SOUL.md" "docker/SOUL.md"

# auth.json: bootstrap from env on first boot only. Same semantics as the
# pre-s6 entrypoint — the [ ! -f ] guard is critical to avoid clobbering
# rotated refresh tokens on container restart.
if [ ! -f "$HERMES_HOME/auth.json" ] && [ -n "${HERMES_AUTH_JSON_BOOTSTRAP:-}" ]; then
    printf '%s' "$HERMES_AUTH_JSON_BOOTSTRAP" > "$HERMES_HOME/auth.json"
    chown hermes:hermes "$HERMES_HOME/auth.json" 2>/dev/null || true
    chmod 600 "$HERMES_HOME/auth.json"
fi

# --- Sync bundled skills ---
if [ -d "$INSTALL_DIR/skills" ]; then
    s6-setuidgid hermes sh -c \
        ". $INSTALL_DIR/.venv/bin/activate && python3 $INSTALL_DIR/tools/skills_sync.py" \
        || echo "[stage2] Warning: skills_sync.py failed; continuing"
fi

echo "[stage2] Setup complete; starting user services"
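
The first-boot guard shared by `seed_one` and the auth.json bootstrap can be tried outside the container. The sketch below is a standalone variant under stated assumptions: `seed_once` is a hypothetical name, it takes full source and destination paths instead of reading `$INSTALL_DIR` and `$HERMES_HOME`, and it copies as the current user instead of going through `s6-setuidgid`.

```sh
#!/bin/sh
# Hypothetical standalone variant of the seed_one guard above.
# Copies src to dest only when dest does not exist yet, so a file
# edited after first boot is never overwritten on a later run.
seed_once() {
    src=$1
    dest=$2
    if [ ! -f "$dest" ] && [ -f "$src" ]; then
        cp "$src" "$dest"
    fi
}
```

Running `seed_once` twice with a changed source leaves the destination as it was after the first run. The auth.json block depends on the same property to avoid clobbering rotated refresh tokens.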