hermes-agent/tests/hermes_cli/test_profiles_s6_hooks.py
Ben 2afefc501c
feat(docker): per-profile s6 supervision + container-restart reconciliation
Phase 4 of the s6-overlay supervision plan. Activates the Phase 3
S6ServiceManager by hooking it into the profile lifecycle and the
`hermes gateway start/stop/restart` dispatcher, and adds a
reconciliation pass at cont-init.d time that restores per-profile
gateway slots after every `docker restart`.

Task 4.0 — container-boot reconciliation:
  /run/service/ is tmpfs, so every `docker restart` wipes every
  per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles
  invokes hermes_cli.container_boot.reconcile_profile_gateways() on
  every boot, which walks $HERMES_HOME/profiles/<name>/, reads each
  gateway_state.json, recreates the s6 service slot, and auto-starts
  only those whose last state was 'running'. Other states
  (stopped, starting, startup_failed, missing) register the slot
  in the down state — avoiding crash-loops across restarts for a
  gateway that was broken last boot. Per-profile outcome is recorded
  to $HERMES_HOME/logs/container-boot.log.

  Implementation: hermes_cli/container_boot.py + 12 unit tests.
  The profile marker is SOUL.md, not config.yaml, because `hermes
  profile create` seeds only SOUL.md by default (config.yaml comes
  from `hermes setup`).
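
The boot-time decision described in Task 4.0 can be sketched roughly as follows. This is a minimal illustration, not the code in hermes_cli/container_boot.py: the function name `plan_boot_actions`, the `"state"` key inside gateway_state.json, and the `"up"`/`"down"` return values are assumptions made for the example.

```python
import json
from pathlib import Path


def plan_boot_actions(hermes_home: Path) -> dict[str, str]:
    """Map each profile name to "up" (auto-start) or "down" (register only).

    Hypothetical helper: the real reconciler also recreates the s6 slot and
    logs an outcome per profile; only the start/no-start decision is shown.
    """
    actions: dict[str, str] = {}
    profiles_dir = hermes_home / "profiles"
    if not profiles_dir.is_dir():
        return actions
    for profile_dir in sorted(profiles_dir.iterdir()):
        # SOUL.md marks a directory as a profile (config.yaml may not exist).
        if not (profile_dir / "SOUL.md").is_file():
            continue
        last_state = "missing"
        try:
            data = json.loads((profile_dir / "gateway_state.json").read_text())
            # ASSUMPTION: the last state is stored under a "state" key.
            last_state = str(data.get("state", "missing"))
        except (OSError, ValueError, AttributeError):
            pass  # Unreadable or malformed state is treated as "missing".
        # Only a gateway that was running last boot is auto-started.
        actions[profile_dir.name] = "up" if last_state == "running" else "down"
    return actions
```

Treating every state other than 'running' as "down" is what keeps a gateway that failed on the previous boot from being restarted into a crash loop.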

Task 4.1 / 4.2 — profile create/delete hooks:
  hermes_cli/profiles.py::create_profile now calls
  _maybe_register_gateway_service(<canon>) at the end, which routes
  through ServiceManager.register_profile_gateway when running on s6
  and no-ops on host backends. delete_profile mirrors with
  _maybe_unregister_gateway_service. _allocate_gateway_port produces
  a deterministic SHA-256-derived port in [9200, 9800).
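
A sketch of how such a hook and port derivation could look, under stated assumptions. The names `allocate_gateway_port` and `maybe_register` are hypothetical stand-ins for the private helpers, the manager is passed in rather than looked up, and the choice of the first 8 digest bytes is a guess; only the SHA-256 basis, the [9200, 9800) window, and the no-op/swallow behaviour come from the commit and its tests.

```python
import hashlib

PORT_BASE = 9200  # inclusive lower bound of the advertised window
PORT_SPAN = 600   # 9800 - 9200, so results fall in [9200, 9800)


def allocate_gateway_port(profile: str) -> int:
    """Derive a stable port from the profile name (illustrative only)."""
    digest = hashlib.sha256(profile.encode("utf-8")).digest()
    # ASSUMPTION: fold the first 8 digest bytes into the window; the real
    # helper may use a different slice of the hash.
    return PORT_BASE + int.from_bytes(digest[:8], "big") % PORT_SPAN


def maybe_register(manager, profile: str) -> bool:
    """Register only on backends that support runtime registration.

    Returns True if the manager accepted the registration. Errors are
    reported and swallowed so profile creation is never aborted by the hook.
    """
    if not manager.supports_runtime_registration():
        return False  # host backend: no-op
    try:
        manager.register_profile_gateway(
            profile, port=allocate_gateway_port(profile)
        )
    except ValueError:
        return False  # already registered, e.g. by boot reconciliation
    except Exception as exc:
        print(f"Could not register gateway for {profile}: {exc}")
        return False
    return True
```

Because the port depends only on the profile name, a gateway comes back on the same port after a container restart without any persisted allocation table. The trade-off is that two names can hash to the same port.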

Task 4.3 — gateway dispatch + remove rejection arms:
  _dispatch_via_service_manager_if_s6(action) intercepts
  start/stop/restart at the top of each subcommand and routes them
  through S6ServiceManager.{start,stop,restart}. The pre-Phase-4
  `elif is_container():` rejection arms are kept as fallback for
  pre-s6 containers / unsupported runtimes, but only ever fire when
  detect_service_manager() != 's6'. install/uninstall under s6
  print informational guidance pointing users at profile create/delete.

  Removed the two xfail(strict=True) markers from
  tests/docker/test_profile_gateway.py — both tests now pass strictly.
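
The interception pattern from Task 4.3 might be sketched like this. It is an illustration only: `dispatch_if_s6`, the injected `detect` callable, and the assumption that the manager's start/stop/restart methods take a bare profile name are all hypothetical, not the actual signatures in hermes_cli.

```python
from typing import Callable, Protocol


class GatewayManager(Protocol):
    """Hypothetical slice of the S6ServiceManager surface used here."""

    def start(self, profile: str) -> None: ...
    def stop(self, profile: str) -> None: ...
    def restart(self, profile: str) -> None: ...


def dispatch_if_s6(
    action: str,
    profile: str,
    detect: Callable[[], str],
    manager: GatewayManager,
) -> bool:
    """Return True if the action was routed through s6, else False."""
    if detect() != "s6":
        return False  # caller falls through to the pre-s6 code path
    if action not in ("start", "stop", "restart"):
        raise ValueError(f"unsupported action: {action}")
    # ASSUMPTION: the manager methods accept the bare profile name.
    getattr(manager, action)(profile)
    return True
```

A subcommand would call the helper first and return early when it yields True, which is why the older rejection arms can stay in place as a fallback without firing under s6.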

Task 4.4 — status reporting:
  get_gateway_runtime_snapshot() reports
  Manager: 's6 (container supervisor)' inside an s6 container instead
  of 'docker (foreground)'.

Plan-vs-reality drift fixed in this commit:
  - Plan's S6ServiceManager._render_run_script used
    `gateway start --foreground --port {port}` — invented args; the
    real CLI is `gateway run`. Switched accordingly. port arg
    retained for API parity but now documented as 'currently ignored'.
  - Plan's reconciler keyed on config.yaml; switched to SOUL.md
    (config.yaml is created by hermes setup, not by hermes profile
    create, so the original gate caught nothing).
  - The plan's _dispatch helper used _profile_arg() which returns
    '--profile <name>' (i.e. with the flag prefix). Switched to
    _profile_suffix() which returns the bare name.
  - Architecture B's docker exec doesn't get /command on PATH or
    the venv on PATH; Dockerfile's runtime PATH now includes
    /opt/hermes/.venv/bin so 'docker exec <c> hermes ...' works
    without sourcing the venv.
  - stage2-hook now chowns $HERMES_HOME/profiles to hermes on every
    boot, not just on the UID-remap path. Without this, files created
    by docker-exec-as-root accumulate and the next reconciler run
    fails with PermissionError reading SOUL.md.

Test harness:
  19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to
  passing). 78 unit tests across service_manager + container_boot +
  profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck
  pass cleanly.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:33 -07:00


"""Tests for the Phase 4 s6 hooks in hermes_cli.profiles.

Specifically: _allocate_gateway_port, _maybe_register_gateway_service,
_maybe_unregister_gateway_service. The integration with
create_profile and delete_profile is covered indirectly by the
existing TestCreateProfile and TestDeleteProfile classes in
tests/hermes_cli/test_profiles.py; here we only exercise the new
helper surface that doesn't touch the filesystem.
"""

from __future__ import annotations

from typing import Any

import pytest

from hermes_cli.profiles import (
    _allocate_gateway_port,
    _maybe_register_gateway_service,
    _maybe_unregister_gateway_service,
)


# ---------------------------------------------------------------------------
# _allocate_gateway_port
# ---------------------------------------------------------------------------


def test_allocate_gateway_port_is_deterministic() -> None:
    """Same profile name → same port across calls. This matters because
    a profile's gateway must come back up on the same port across
    container restarts."""
    a = _allocate_gateway_port("coder")
    b = _allocate_gateway_port("coder")
    assert a == b


def test_allocate_gateway_port_in_advertised_range() -> None:
    """[9200, 9800): the window the helper's docstring promises."""
    for name in ("a", "b", "coder", "assistant", "very-long-profile-name-here"):
        port = _allocate_gateway_port(name)
        assert 9200 <= port < 9800, f"{name} got {port}"


def test_allocate_gateway_port_distributes_across_range() -> None:
    """Sanity check: ports for 100 distinct names should land in
    enough distinct buckets that the distribution is plausibly uniform.
    Catches accidental hash truncation that would collapse the range."""
    ports = {_allocate_gateway_port(f"profile-{i}") for i in range(100)}
    # 100 inputs mapped into 600 slots: a uniform hash gives about 92
    # distinct ports in expectation, so 60 is a lenient floor.
    assert len(ports) >= 60, f"Only {len(ports)} distinct ports across 100 names"


# ---------------------------------------------------------------------------
# _maybe_register_gateway_service / _maybe_unregister_gateway_service
# ---------------------------------------------------------------------------


class _HostManager:
    """Mimics a host backend that doesn't support runtime registration."""

    kind = "systemd"

    def supports_runtime_registration(self) -> bool:
        return False

    def register_profile_gateway(self, *args: Any, **kwargs: Any) -> None:
        raise AssertionError("host backend register_profile_gateway should not be called")

    def unregister_profile_gateway(self, *args: Any, **kwargs: Any) -> None:
        raise AssertionError("host backend unregister_profile_gateway should not be called")


class _S6Manager:
    """Mimics S6ServiceManager just enough for the hooks."""

    kind = "s6"

    def __init__(self) -> None:
        self.registered: list[tuple[str, int]] = []
        self.unregistered: list[str] = []
        self.raise_on_register: Exception | None = None
        self.raise_on_unregister: Exception | None = None

    def supports_runtime_registration(self) -> bool:
        return True

    def register_profile_gateway(
        self, profile: str, *, port: int,
        extra_env: dict[str, str] | None = None,
    ) -> None:
        if self.raise_on_register is not None:
            raise self.raise_on_register
        self.registered.append((profile, port))

    def unregister_profile_gateway(self, profile: str) -> None:
        if self.raise_on_unregister is not None:
            raise self.raise_on_unregister
        self.unregistered.append(profile)


def test_register_noop_on_host(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(
        "hermes_cli.service_manager.get_service_manager",
        lambda: _HostManager(),
    )
    # Should NOT raise the AssertionError from _HostManager.register
    _maybe_register_gateway_service("hostprof")


def test_register_calls_through_on_s6(monkeypatch: pytest.MonkeyPatch) -> None:
    mgr = _S6Manager()
    monkeypatch.setattr(
        "hermes_cli.service_manager.get_service_manager", lambda: mgr,
    )
    _maybe_register_gateway_service("coder")
    assert len(mgr.registered) == 1
    profile, port = mgr.registered[0]
    assert profile == "coder"
    assert 9200 <= port < 9800


def test_register_swallows_duplicate_value_error(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    """A pre-existing s6 registration (from container-boot reconcile)
    is a benign condition; register must not propagate ValueError."""
    mgr = _S6Manager()
    mgr.raise_on_register = ValueError("already registered")
    monkeypatch.setattr(
        "hermes_cli.service_manager.get_service_manager", lambda: mgr,
    )
    # Should NOT raise
    _maybe_register_gateway_service("coder")


def test_register_swallows_arbitrary_error(
    monkeypatch: pytest.MonkeyPatch, capsys: pytest.CaptureFixture[str],
) -> None:
    """Even an unexpected exception from the manager must not bring
    down `hermes profile create`: print and continue."""
    mgr = _S6Manager()
    mgr.raise_on_register = RuntimeError("svscanctl exploded")
    monkeypatch.setattr(
        "hermes_cli.service_manager.get_service_manager", lambda: mgr,
    )
    _maybe_register_gateway_service("coder")
    captured = capsys.readouterr()
    assert "Could not register" in captured.out


def test_register_swallows_no_backend_runtime_error(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    """When `get_service_manager()` raises RuntimeError (no backend
    detected), the hook must silently no-op."""

    def _no_backend() -> None:
        raise RuntimeError("no supported service manager detected")

    monkeypatch.setattr(
        "hermes_cli.service_manager.get_service_manager", _no_backend,
    )
    # Should NOT raise
    _maybe_register_gateway_service("anywhere")


def test_unregister_noop_on_host(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(
        "hermes_cli.service_manager.get_service_manager",
        lambda: _HostManager(),
    )
    _maybe_unregister_gateway_service("hostprof")


def test_unregister_calls_through_on_s6(monkeypatch: pytest.MonkeyPatch) -> None:
    mgr = _S6Manager()
    monkeypatch.setattr(
        "hermes_cli.service_manager.get_service_manager", lambda: mgr,
    )
    _maybe_unregister_gateway_service("coder")
    assert mgr.unregistered == ["coder"]


def test_unregister_swallows_errors(
    monkeypatch: pytest.MonkeyPatch, capsys: pytest.CaptureFixture[str],
) -> None:
    mgr = _S6Manager()
    mgr.raise_on_unregister = RuntimeError("svc gone weird")
    monkeypatch.setattr(
        "hermes_cli.service_manager.get_service_manager", lambda: mgr,
    )
    _maybe_unregister_gateway_service("coder")
    captured = capsys.readouterr()
    assert "Could not unregister" in captured.out