mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
Resolves the explicit "Known follow-up" left by commit 2f8ceeab9 and the
resulting CI failures in tests/docker/test_dashboard.py and
tests/docker/test_s6_profile_gateway_integration.py.

The product gap
---------------
Every hermes runtime operation inside the container runs as the hermes
user (UID 10000) via s6-setuidgid. But s6-supervise, spawned by
s6-svscan running as PID 1, creates each service's supervise/ and
top-level event/ directories with mode 0700 owned by its effective UID
(root). That left every s6-svc / s6-svstat / s6-svwait call from hermes
hitting EACCES on the supervise/control FIFO and supervise/status, i.e.
the entire S6ServiceManager lifecycle (register, start, stop,
unregister) was inert in production.

The 2f8ceeab9 commit message called this out and deferred the fix. The
audit changes that landed alongside it (defaulting docker_exec to
-u hermes) made the integration tests reproduce the bug
deterministically; the fix below resolves it.

The fix: pre-create the supervise/ skeleton hermes-owned
--------------------------------------------------------
Reading s6's source (src/supervision/s6-supervise.c::trymkdir +
control_init), the mkdir and mkfifo calls that build the supervise tree
are EEXIST-safe: if the directory or FIFO is already present,
s6-supervise reuses it and skips the chown/chmod fix-up that would
normally make event/ 03730 root:root. So if we lay the skeleton down
with hermes ownership before triggering s6-svscanctl -a, s6-supervise
inherits our layout and never touches it. The death_tally / lock /
status regular files written later by s6-supervise (still as root) land
mode 0644 (world-readable), which is all s6-svstat needs.
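
To make the EEXIST behaviour concrete, here is a small Python model of
the reuse-or-create logic described above. It is an illustration only:
trymkdir_model and demo are hypothetical names, and the sketch mirrors
the behaviour as this commit message describes it rather than
reproducing s6's C source.

```python
import os
import stat
import tempfile
from pathlib import Path


def trymkdir_model(path: Path, mode: int = 0o700) -> bool:
    """Model of an EEXIST-tolerant mkdir (hypothetical helper).

    Returns True when this call created the directory and False when an
    existing directory was reused untouched.
    """
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        # Pre-seeded directory: reuse it, leave owner and mode alone.
        return False
    # Freshly created: apply the mode explicitly so the umask cannot narrow it.
    os.chmod(path, mode)
    return True


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        seeded = Path(tmp) / "seeded"
        seeded.mkdir()
        os.chmod(seeded, 0o755)  # stands in for the pre-created skeleton
        created_seeded = trymkdir_model(seeded)

        fresh = Path(tmp) / "fresh"
        created_fresh = trymkdir_model(fresh)

        return (
            created_seeded,
            stat.S_IMODE(seeded.stat().st_mode),
            created_fresh,
            stat.S_IMODE(fresh.stat().st_mode),
        )
```

The pre-seeded directory keeps its 0755 mode because the model never
reaches the fix-up branch, which is the property the skeleton relies on.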

New module-level helper _seed_supervise_skeleton(svc_dir) in
hermes_cli/service_manager.py lays down:

  svc_dir/event/                  hermes:hermes 03730
  svc_dir/supervise/              hermes:hermes 0755
  svc_dir/supervise/event/        hermes:hermes 03730
  svc_dir/supervise/control       hermes:hermes 0660 (FIFO)
  svc_dir/log/event/              hermes:hermes 03730 (if log/ present)
  svc_dir/log/supervise/          hermes:hermes 0755
  svc_dir/log/supervise/event/    hermes:hermes 03730
  svc_dir/log/supervise/control   hermes:hermes 0660 (FIFO)

The log/ branch matters because the logger is a second s6-supervise
instance. Without it, unregister rmtree races on the logger's
root-owned supervise dir even after the parent slot's supervise/ is
hermes-owned.

The helper is idempotent and swallows PermissionError on chown so it
works equally well when called from root (cont-init.d) or hermes
(runtime register).

Wiring
------
1. S6ServiceManager.register_profile_gateway calls
   _seed_supervise_skeleton(tmp_dir) just before publishing the slot
   via Path.replace. Runtime-registered profile gateways are set up by
   hermes.

2. container_boot._register_service does the same in the cont-init.d
   reconciliation path so boot-time-restored profile slots inherit the
   same layout.

3. New cont-init.d/015-supervise-perms script chowns the supervise/ and
   event/ trees for STATIC s6-rc services (dashboard, main-hermes).
   These are spawned by s6-rc before cont-init.d gets to run, so the
   EEXIST trick doesn't apply; we chown the already-existing tree
   instead. s6-supervise keeps using the same files; it never
   re-asserts ownership on a running service. The script skips
   s6-overlay internal services (s6rc-*, s6-linux-*) so the supervision
   tree itself stays root-only. The 015- slot is intentional: it
   lex-sorts between 01-hermes-setup and 02-reconcile-profiles in the
   container's C locale, so the chown finishes before the reconciler
   walks the scandir.
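
The layout table above can be sketched in a few lines of Python. This is
a minimal illustration under stated assumptions, not the code in
hermes_cli/service_manager.py: seed_skeleton_sketch, _ensure_dir,
_ensure_fifo and demo are hypothetical names, and the chown to
hermes:hermes is left out because it needs privileges the example
cannot assume.

```python
import os
import stat
import tempfile
from pathlib import Path

EVENT_MODE = 0o3730      # event/ dirs, as listed in the layout table
SUPERVISE_MODE = 0o755   # supervise/ dirs
CONTROL_MODE = 0o660     # supervise/control FIFO


def _ensure_dir(path: Path, mode: int) -> None:
    """Create the directory if absent; leave an existing one untouched."""
    if path.is_dir():
        return
    path.mkdir()
    os.chmod(path, mode)  # explicit chmod: mkdir's mode is filtered by umask


def _ensure_fifo(path: Path, mode: int) -> None:
    """Create the FIFO if absent; leave an existing one untouched."""
    if path.exists():
        return
    os.mkfifo(path, mode)
    os.chmod(path, mode)


def seed_skeleton_sketch(svc_dir: Path) -> None:
    """Lay down the supervise skeleton for svc_dir and, if present, log/."""
    bases = [svc_dir]
    if (svc_dir / "log").is_dir():
        bases.append(svc_dir / "log")  # the logger is a second supervise tree
    for base in bases:
        _ensure_dir(base / "event", EVENT_MODE)
        _ensure_dir(base / "supervise", SUPERVISE_MODE)
        _ensure_dir(base / "supervise" / "event", EVENT_MODE)
        _ensure_fifo(base / "supervise" / "control", CONTROL_MODE)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        svc = Path(tmp) / "gateway-demo"
        (svc / "log").mkdir(parents=True)
        seed_skeleton_sketch(svc)
        seed_skeleton_sketch(svc)  # second call must be a no-op
        control = svc / "supervise" / "control"
        log_control = svc / "log" / "supervise" / "control"
        return {
            "event": stat.S_IMODE((svc / "event").stat().st_mode),
            "supervise": stat.S_IMODE((svc / "supervise").stat().st_mode),
            "control": stat.S_IMODE(control.stat().st_mode),
            "control_is_fifo": stat.S_ISFIFO(control.stat().st_mode),
            "log_control_is_fifo": stat.S_ISFIFO(log_control.stat().st_mode),
        }
```

Short-circuiting on existence, rather than re-applying modes, keeps the
sketch safe to call on a slot whose files s6-supervise already holds
open.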

Unregister teardown reordering
------------------------------
S6ServiceManager.unregister_profile_gateway now fires s6-svscanctl -an
BEFORE rmtree (with a 200ms grace), so s6-svscan reaps the supervise
child and releases its file handles on supervise/lock +
supervise/status before we try to remove the directory. Previously
rmtree raced s6-supervise on a set of files inside the supervise dir,
and even with the parent supervise/ now hermes-owned, the contained
files (death_tally, lock, status, written by root) could still be in
use.

Dashboard down-state redesign
-----------------------------
The original PR #30136 review fix wrote a 'down' marker file into
/run/service/dashboard/ via cont-init.d/03-dashboard-toggle. That
approach was broken in two ways:

(a) /run/service/dashboard is a symlink to a TRANSIENT
    /run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is
    mid-transaction; the touch landed in a soon-to-be-discarded tmp.

(b) Even when written to the final /run/s6-rc/servicedirs/ location,
    the 'down' file is only consulted by s6-supervise at slot startup.
    s6-rc's user-bundle explicitly transitions 'dashboard' to 'up' on
    every boot, overriding any down marker.

The right fix is the canonical s6 pattern: when HERMES_DASHBOARD is
unset, the dashboard run script exits 0 and a companion finish script
exits 125. Per s6-supervise(8), exit code 125 from the finish script is
the 'permanent failure, do not restart' marker, equivalent to
s6-svc -O. The slot reports as 'down' to s6-svstat, matching the
reality that no dashboard process is running. When HERMES_DASHBOARD IS
truthy, finish exits 0 and restart-on-crash semantics apply.

03-dashboard-toggle is removed (its function is now subsumed by the
run/finish pair).

Tests
-----
Adds four unit tests for _seed_supervise_skeleton covering the produced
layout, the log/ subservice case, the skip-when-no-log case, and
idempotency.
The live-container verification continues to live in
tests/docker/test_s6_profile_gateway_integration.py and
tests/docker/test_dashboard.py; both now pass against the rebuilt
image.

References
----------
* Skarnet skaware mailing list 2020-02-02 (Laurent Bercot + Guillermo
  Diaz Hartusch) on unprivileged s6 tool semantics:
  http://skarnet.org/lists/skaware/1424.html
* just-containers/s6-overlay#130: same EEXIST-preseed pattern,
  community-validated 2016 onward
* https://skarnet.org/software/s6/servicedir.html: exit-code 125
  semantics in finish scripts
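
The run/finish pairing from the dashboard down-state redesign can be
modelled as a tiny decision function. The real scripts are shell
run/finish files in the service directory; this Python sketch only
mirrors the exit-code choice described in this message. The function
names and the exact set of values treated as truthy are assumptions
made for illustration.

```python
PERMANENT_FAILURE = 125  # finish exit code meaning "do not restart"

# Assumption: the values that count as "truthy" are illustrative only.
_TRUTHY = {"1", "true", "yes", "on"}


def dashboard_enabled(env: dict) -> bool:
    """Return True when HERMES_DASHBOARD is set to a truthy value."""
    return env.get("HERMES_DASHBOARD", "").strip().lower() in _TRUTHY


def finish_exit_code(env: dict) -> int:
    """Exit code the finish script would return for this environment."""
    if dashboard_enabled(env):
        return 0  # normal restart-on-crash semantics apply
    return PERMANENT_FAILURE  # slot stays down, no restart loop


def describe(env: dict) -> str:
    code = finish_exit_code(env)
    state = "restartable" if code == 0 else "held down"
    return f"finish exits {code}: slot is {state}"
```

Calling describe({}) reports the held-down case, while
describe({"HERMES_DASHBOARD": "1"}) reports the restartable one.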
793 lines
28 KiB
Python
"""Tests for hermes_cli.service_manager: the abstract ServiceManager
protocol, the detect_service_manager() entry point, and the host-side
adapter wrappers (Systemd / Launchd / Windows).

The s6 backend is added in Phase 3; its tests live alongside the
implementation in this same file once that phase ships.
"""
from __future__ import annotations

import pytest

from hermes_cli.service_manager import (
    LaunchdServiceManager,
    S6ServiceManager,
    ServiceManager,
    ServiceManagerKind,
    SystemdServiceManager,
    WindowsServiceManager,
    detect_service_manager,
    get_service_manager,
    validate_profile_name,
)


# ---------------------------------------------------------------------------
# validate_profile_name
# ---------------------------------------------------------------------------


def test_validate_profile_name_accepts_valid_names() -> None:
    # Smoke: known-good names should not raise.
    validate_profile_name("coder")
    validate_profile_name("my-profile")
    validate_profile_name("assistant_v2")
    validate_profile_name("a")
    validate_profile_name("0")
    validate_profile_name("0abc")


@pytest.mark.parametrize(
    "bad",
    [
        "",  # empty
        "Coder",  # uppercase
        "foo/bar",  # path traversal
        "../escape",  # path traversal
        "-leading-dash",  # leading dash (s6 reads as a flag)
        "_leading_underscore",  # leading underscore
        "name with spaces",  # whitespace
        "name.with.dots",  # punctuation
        "a" * 252,  # too long
    ],
)
def test_validate_profile_name_rejects_invalid(bad: str) -> None:
    with pytest.raises(ValueError):
        validate_profile_name(bad)


# ---------------------------------------------------------------------------
# detect_service_manager
# ---------------------------------------------------------------------------


def test_detect_service_manager_returns_known_value() -> None:
    """Without mocking, the function must still return one of the
    advertised literals; anything else means a new platform branch
    was added without updating ServiceManagerKind."""
    result = detect_service_manager()
    assert result in ("systemd", "launchd", "windows", "s6", "none")


# ---------------------------------------------------------------------------
# _s6_running: must work for unprivileged users, not just root
# ---------------------------------------------------------------------------


def _patch_s6_paths(
    monkeypatch: pytest.MonkeyPatch,
    *,
    comm: str | OSError | None,
    basedir_is_dir: bool,
) -> None:
    """Stub /proc/1/comm and /run/s6/basedir for _s6_running tests."""
    from pathlib import Path as _Path

    real_read_text = _Path.read_text
    real_is_dir = _Path.is_dir

    def fake_read_text(self, *args, **kwargs):  # type: ignore[override]
        if str(self) == "/proc/1/comm":
            if isinstance(comm, OSError):
                raise comm
            if comm is None:
                raise FileNotFoundError(2, "No such file or directory")
            return comm + "\n"
        return real_read_text(self, *args, **kwargs)

    def fake_is_dir(self):  # type: ignore[override]
        if str(self) == "/run/s6/basedir":
            return basedir_is_dir
        return real_is_dir(self)

    monkeypatch.setattr(_Path, "read_text", fake_read_text)
    monkeypatch.setattr(_Path, "is_dir", fake_is_dir)


def test_s6_running_true_when_comm_and_basedir_match(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    from hermes_cli.service_manager import _s6_running

    _patch_s6_paths(monkeypatch, comm="s6-svscan", basedir_is_dir=True)
    assert _s6_running() is True


def test_s6_running_false_when_comm_is_wrong(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    from hermes_cli.service_manager import _s6_running

    # systemd as PID 1, basedir present from some stray s6 install
    _patch_s6_paths(monkeypatch, comm="systemd", basedir_is_dir=True)
    assert _s6_running() is False


def test_s6_running_false_when_basedir_missing(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    from hermes_cli.service_manager import _s6_running

    # The comm matches but the basedir is missing, e.g. an unrelated
    # process happens to be named "s6-svscan"
    _patch_s6_paths(monkeypatch, comm="s6-svscan", basedir_is_dir=False)
    assert _s6_running() is False


def test_s6_running_false_when_comm_unreadable(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    """Regression: /proc/1/exe was unreadable to UID 10000 and
    resolve() silently returned the unresolved path, making detection
    always-False inside the container under the hermes user. The new
    probe must FAIL CLOSED (not raise) when /proc/1/comm can't be
    read.
    """
    from hermes_cli.service_manager import _s6_running

    _patch_s6_paths(
        monkeypatch,
        comm=PermissionError(13, "Permission denied"),
        basedir_is_dir=True,
    )
    assert _s6_running() is False


def test_s6_running_handles_missing_proc(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    """On macOS / Windows / WSL-without-procfs, /proc/1/comm doesn't
    exist. Must return False, not raise."""
    from hermes_cli.service_manager import _s6_running

    _patch_s6_paths(monkeypatch, comm=None, basedir_is_dir=False)
    assert _s6_running() is False


# ---------------------------------------------------------------------------
# Backend wrappers: kind + registration unsupported on hosts
# ---------------------------------------------------------------------------


def test_systemd_manager_kind_and_registration_unsupported() -> None:
    mgr = SystemdServiceManager()
    assert mgr.kind == "systemd"
    assert mgr.supports_runtime_registration() is False
    with pytest.raises(NotImplementedError):
        mgr.register_profile_gateway("foo")
    with pytest.raises(NotImplementedError):
        mgr.unregister_profile_gateway("foo")
    assert mgr.list_profile_gateways() == []
    # Protocol conformance: runtime_checkable lets us assert this.
    assert isinstance(mgr, ServiceManager)


def test_launchd_manager_kind_and_registration_unsupported() -> None:
    mgr = LaunchdServiceManager()
    assert mgr.kind == "launchd"
    assert mgr.supports_runtime_registration() is False
    with pytest.raises(NotImplementedError):
        mgr.register_profile_gateway("foo")
    assert mgr.list_profile_gateways() == []
    assert isinstance(mgr, ServiceManager)


def test_windows_manager_kind_and_registration_unsupported() -> None:
    mgr = WindowsServiceManager()
    assert mgr.kind == "windows"
    assert mgr.supports_runtime_registration() is False
    with pytest.raises(NotImplementedError):
        mgr.register_profile_gateway("foo")
    assert isinstance(mgr, ServiceManager)


# ---------------------------------------------------------------------------
# Lifecycle delegation: wrappers must call through to module-level fns
# ---------------------------------------------------------------------------


def test_systemd_manager_lifecycle_delegates(monkeypatch: pytest.MonkeyPatch) -> None:
    called: list[str] = []
    monkeypatch.setattr(
        "hermes_cli.gateway.systemd_start", lambda: called.append("start"),
    )
    monkeypatch.setattr(
        "hermes_cli.gateway.systemd_stop", lambda: called.append("stop"),
    )
    monkeypatch.setattr(
        "hermes_cli.gateway.systemd_restart", lambda: called.append("restart"),
    )
    monkeypatch.setattr(
        "hermes_cli.gateway._probe_systemd_service_running",
        lambda *a, **kw: (False, True),
    )
    mgr = SystemdServiceManager()
    mgr.start("ignored")
    mgr.stop("ignored")
    mgr.restart("ignored")
    assert called == ["start", "stop", "restart"]
    assert mgr.is_running("ignored") is True


def test_launchd_manager_lifecycle_delegates(monkeypatch: pytest.MonkeyPatch) -> None:
    called: list[str] = []
    monkeypatch.setattr(
        "hermes_cli.gateway.launchd_start", lambda: called.append("start"),
    )
    monkeypatch.setattr(
        "hermes_cli.gateway.launchd_stop", lambda: called.append("stop"),
    )
    monkeypatch.setattr(
        "hermes_cli.gateway.launchd_restart", lambda: called.append("restart"),
    )
    monkeypatch.setattr(
        "hermes_cli.gateway._probe_launchd_service_running", lambda: False,
    )
    mgr = LaunchdServiceManager()
    mgr.start("ignored")
    mgr.stop("ignored")
    mgr.restart("ignored")
    assert called == ["start", "stop", "restart"]
    assert mgr.is_running("ignored") is False


def test_windows_manager_lifecycle_delegates(monkeypatch: pytest.MonkeyPatch) -> None:
    called: list[str] = []
    # Force-import the submodule so monkeypatch's attribute lookup
    # against the `hermes_cli` package succeeds; gateway_windows is
    # imported lazily inside the wrapper and may not yet be loaded.
    import hermes_cli.gateway_windows  # noqa: F401

    class _FakeWindowsModule:
        @staticmethod
        def start() -> None: called.append("start")
        @staticmethod
        def stop() -> None: called.append("stop")
        @staticmethod
        def restart() -> None: called.append("restart")
        @staticmethod
        def is_installed() -> bool: return True

    monkeypatch.setattr("hermes_cli.gateway_windows", _FakeWindowsModule)
    monkeypatch.setattr(
        "hermes_cli.gateway.find_gateway_pids",
        lambda **kw: [12345],
    )
    mgr = WindowsServiceManager()
    mgr.start("ignored")
    mgr.stop("ignored")
    mgr.restart("ignored")
    assert called == ["start", "stop", "restart"]
    assert mgr.is_running("ignored") is True


def test_windows_manager_is_running_false_when_not_installed(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    import hermes_cli.gateway_windows  # noqa: F401

    class _FakeWindowsModule:
        @staticmethod
        def is_installed() -> bool: return False

    monkeypatch.setattr("hermes_cli.gateway_windows", _FakeWindowsModule)
    monkeypatch.setattr(
        "hermes_cli.gateway.find_gateway_pids",
        lambda **kw: [12345],  # PIDs would otherwise vote "running"
    )
    assert WindowsServiceManager().is_running("ignored") is False


def test_windows_manager_install_forwards_kwargs(monkeypatch: pytest.MonkeyPatch) -> None:
    captured: dict[str, object] = {}
    import hermes_cli.gateway_windows  # noqa: F401

    class _FakeWindowsModule:
        @staticmethod
        def install(*, force, start_now, start_on_login, elevated_handoff) -> None:
            captured["force"] = force
            captured["start_now"] = start_now
            captured["start_on_login"] = start_on_login
            captured["elevated_handoff"] = elevated_handoff

    monkeypatch.setattr("hermes_cli.gateway_windows", _FakeWindowsModule)
    WindowsServiceManager().install(
        force=True, start_now=True, start_on_login=False, elevated_handoff=True,
    )
    assert captured == {
        "force": True,
        "start_now": True,
        "start_on_login": False,
        "elevated_handoff": True,
    }


# ---------------------------------------------------------------------------
# get_service_manager factory
# ---------------------------------------------------------------------------


@pytest.mark.parametrize(
    "kind,cls",
    [
        ("systemd", SystemdServiceManager),
        ("launchd", LaunchdServiceManager),
        ("windows", WindowsServiceManager),
    ],
)
def test_get_service_manager_returns_correct_backend(
    monkeypatch: pytest.MonkeyPatch,
    kind: ServiceManagerKind,
    cls: type,
) -> None:
    monkeypatch.setattr(
        "hermes_cli.service_manager.detect_service_manager", lambda: kind,
    )
    assert isinstance(get_service_manager(), cls)


def test_get_service_manager_raises_when_unsupported(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    monkeypatch.setattr(
        "hermes_cli.service_manager.detect_service_manager", lambda: "none",
    )
    with pytest.raises(RuntimeError, match="no supported service manager"):
        get_service_manager()


def test_get_service_manager_returns_s6_instance(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    """The s6 backend ships in Phase 3; the factory must return an
    S6ServiceManager when running inside a container."""
    from hermes_cli.service_manager import S6ServiceManager
    monkeypatch.setattr(
        "hermes_cli.service_manager.detect_service_manager", lambda: "s6",
    )
    assert isinstance(get_service_manager(), S6ServiceManager)


# ---------------------------------------------------------------------------
# S6ServiceManager: unit tests against a tmp-path scandir (no real s6)
# ---------------------------------------------------------------------------


@pytest.fixture
def s6_scandir(tmp_path):
    """Empty scandir for the S6ServiceManager tests."""
    d = tmp_path / "service"
    d.mkdir()
    return d


@pytest.fixture
def fake_subprocess_run(monkeypatch: pytest.MonkeyPatch):
    """Capture subprocess.run calls + always return success. Lets the
    S6ServiceManager tests run on hosts that don't have s6-svc /
    s6-svscanctl installed.

    Records are normalized: leading ``/command/`` is stripped from
    cmd[0] so assertions can match on the bare s6-svc / s6-svstat /
    s6-svscanctl name regardless of whether the manager calls them
    via absolute path or bare name."""
    calls: list[list[str]] = []

    def _fake(cmd, **kw):
        import subprocess as _sp
        seq = list(cmd) if isinstance(cmd, (list, tuple)) else [str(cmd)]
        if seq and seq[0].startswith("/command/"):
            seq[0] = seq[0][len("/command/"):]
        calls.append(seq)
        return _sp.CompletedProcess(cmd, 0, "", "")

    monkeypatch.setattr("subprocess.run", _fake)
    return calls


def test_s6_manager_kind_and_supports_registration() -> None:
    from hermes_cli.service_manager import S6ServiceManager
    mgr = S6ServiceManager()
    assert mgr.kind == "s6"
    assert mgr.supports_runtime_registration() is True


# ---------------------------------------------------------------------------
# _seed_supervise_skeleton: unit tests
# ---------------------------------------------------------------------------
#
# The skeleton helper pre-creates the dirs and FIFOs that s6-supervise
# would otherwise create as root mode 0700, locking out the
# unprivileged hermes user from every lifecycle op. These tests run
# against tmp_path and assert the produced layout; the live-container
# verification (against real s6-svc / s6-svstat) lives in
# tests/docker/test_s6_profile_gateway_integration.py.


def test_seed_supervise_skeleton_creates_expected_layout(tmp_path) -> None:
    """Verifies the dirs + FIFO + modes the helper lays down."""
    import stat

    from hermes_cli.service_manager import _seed_supervise_skeleton

    svc_dir = tmp_path / "gateway-foo"
    svc_dir.mkdir()

    _seed_supervise_skeleton(svc_dir)

    # Top-level event/: s6-svlisten1 event subscription dir.
    event = svc_dir / "event"
    assert event.is_dir(), "missing top-level event/"
    assert stat.S_IMODE(event.stat().st_mode) == 0o3730, (
        f"event/ mode = {oct(event.stat().st_mode)}, want 03730"
    )

    # supervise/ dir.
    supervise = svc_dir / "supervise"
    assert supervise.is_dir(), "missing supervise/"
    assert stat.S_IMODE(supervise.stat().st_mode) == 0o755

    # supervise/event/.
    supervise_event = supervise / "event"
    assert supervise_event.is_dir(), "missing supervise/event/"
    assert stat.S_IMODE(supervise_event.stat().st_mode) == 0o3730

    # supervise/control FIFO.
    control = supervise / "control"
    assert control.exists(), "missing supervise/control FIFO"
    assert stat.S_ISFIFO(control.stat().st_mode), (
        "supervise/control must be a FIFO"
    )
    assert stat.S_IMODE(control.stat().st_mode) == 0o660


def test_seed_supervise_skeleton_handles_log_subservice(tmp_path) -> None:
    """When a log/ subdir exists, its supervise tree also gets seeded.

    Without this, ``unregister_profile_gateway``'s rmtree would EACCES
    on the logger's root-owned supervise dir even after the parent
    slot's supervise/ was hermes-owned.
    """
    import stat

    from hermes_cli.service_manager import _seed_supervise_skeleton

    svc_dir = tmp_path / "gateway-foo"
    svc_dir.mkdir()
    (svc_dir / "log").mkdir()  # logger subdir present

    _seed_supervise_skeleton(svc_dir)

    # Logger's own supervise tree is seeded the same way.
    log_event = svc_dir / "log" / "event"
    log_supervise = svc_dir / "log" / "supervise"
    log_supervise_event = log_supervise / "event"
    log_control = log_supervise / "control"

    assert log_event.is_dir()
    assert stat.S_IMODE(log_event.stat().st_mode) == 0o3730
    assert log_supervise.is_dir()
    assert log_supervise_event.is_dir()
    assert log_control.exists() and stat.S_ISFIFO(log_control.stat().st_mode)


def test_seed_supervise_skeleton_skips_when_no_log_subservice(tmp_path) -> None:
    """If log/ isn't present, no logger skeleton is created."""
    from hermes_cli.service_manager import _seed_supervise_skeleton

    svc_dir = tmp_path / "gateway-foo"
    svc_dir.mkdir()

    _seed_supervise_skeleton(svc_dir)

    assert not (svc_dir / "log").exists(), (
        "helper must not synthesize a log/ subdir on its own"
    )


def test_seed_supervise_skeleton_is_idempotent(tmp_path) -> None:
    """Calling the helper twice on the same dir is a no-op the second time.

    Important because s6-supervise may have already opened the FIFO
    when a re-register / reconcile happens; double-creation would
    error out. The helper short-circuits on existence.
    """
    from hermes_cli.service_manager import _seed_supervise_skeleton

    svc_dir = tmp_path / "gateway-foo"
    svc_dir.mkdir()

    _seed_supervise_skeleton(svc_dir)
    _seed_supervise_skeleton(svc_dir)  # must not raise


def test_s6_register_creates_service_dir_and_triggers_scan(
    s6_scandir, fake_subprocess_run,
) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    mgr = S6ServiceManager(scandir=s6_scandir)
    mgr.register_profile_gateway("coder")

    svc_dir = s6_scandir / "gateway-coder"
    assert svc_dir.is_dir()
    assert (svc_dir / "type").read_text().strip() == "longrun"

    run_path = svc_dir / "run"
    assert run_path.is_file()
    assert run_path.stat().st_mode & 0o111  # executable
    run_text = run_path.read_text()
    assert "hermes -p coder gateway run" in run_text
    assert "s6-setuidgid hermes" in run_text

    log_run = svc_dir / "log" / "run"
    assert log_run.is_file()
    log_text = log_run.read_text()
    # CRITICAL: HERMES_HOME must be a runtime env-var expansion, NOT
    # a Python-substituted absolute path. Negative-assert the wrong
    # form so future regressions are caught.
    assert "$HERMES_HOME" in log_text
    assert "logs/gateways/coder" in log_text
    assert "/opt/data/logs/gateways/coder" not in log_text, (
        "log_dir was hard-coded; must use ${HERMES_HOME} at run time"
    )

    # s6-svscanctl -a was invoked against the scandir
    assert any(
        cmd[0] == "s6-svscanctl" and "-a" in cmd
        and str(s6_scandir) in cmd
        for cmd in fake_subprocess_run
    ), f"s6-svscanctl -a not invoked; saw: {fake_subprocess_run}"


def test_s6_register_extra_env_is_quoted(s6_scandir, fake_subprocess_run) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    mgr = S6ServiceManager(scandir=s6_scandir)
    mgr.register_profile_gateway(
        "x", extra_env={"FOO": "bar baz", "QUOTED": "a'b"},
    )
    run_text = (s6_scandir / "gateway-x" / "run").read_text()
    # shlex.quote should have wrapped both values
    assert "export FOO='bar baz'" in run_text
    assert "export QUOTED='a'\"'\"'b'" in run_text


def test_s6_register_rejects_invalid_profile_name(s6_scandir) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    mgr = S6ServiceManager(scandir=s6_scandir)
    with pytest.raises(ValueError):
        mgr.register_profile_gateway("Bad/Name")


def test_s6_register_rejects_duplicate(s6_scandir, fake_subprocess_run) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    mgr = S6ServiceManager(scandir=s6_scandir)
    (s6_scandir / "gateway-coder").mkdir(parents=True)
    with pytest.raises(ValueError, match="already registered"):
        mgr.register_profile_gateway("coder")


def test_s6_register_rolls_back_on_svscanctl_failure(
    s6_scandir, monkeypatch: pytest.MonkeyPatch,
) -> None:
    """If s6-svscanctl fails the service dir must be cleaned up so the
    next register call doesn't see a stale duplicate."""
    import subprocess as _sp
    from hermes_cli.service_manager import S6ServiceManager

    def _fail_scanctl(cmd, **kw):
        # Manager calls s6-svscanctl by absolute path; match on basename.
        if cmd[0].endswith("/s6-svscanctl"):
            return _sp.CompletedProcess(cmd, 1, "", "rescan failed")
        return _sp.CompletedProcess(cmd, 0, "", "")
    monkeypatch.setattr("subprocess.run", _fail_scanctl)

    mgr = S6ServiceManager(scandir=s6_scandir)
    with pytest.raises(RuntimeError, match="s6-svscanctl failed"):
        mgr.register_profile_gateway("coder")
    assert not (s6_scandir / "gateway-coder").exists()


def test_s6_unregister_removes_service_dir(
    s6_scandir, fake_subprocess_run,
) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    svc_dir = s6_scandir / "gateway-coder"
    svc_dir.mkdir(parents=True)
    (svc_dir / "type").write_text("longrun\n")

    mgr = S6ServiceManager(scandir=s6_scandir)
    mgr.unregister_profile_gateway("coder")

    # s6-svc -d was issued
    assert any(
        cmd[0] == "s6-svc" and "-d" in cmd
        for cmd in fake_subprocess_run
    )
    # Service dir was removed
    assert not svc_dir.exists()
    # Rescan was triggered
    assert any(cmd[0] == "s6-svscanctl" for cmd in fake_subprocess_run)


def test_s6_unregister_absent_profile_is_noop(s6_scandir) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    # Should NOT raise even though "ghost" doesn't exist
    S6ServiceManager(scandir=s6_scandir).unregister_profile_gateway("ghost")


def test_s6_list_profile_gateways(s6_scandir) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    # Three gateway profiles + one unrelated service + one hidden dir
    (s6_scandir / "gateway-coder").mkdir()
    (s6_scandir / "gateway-assistant").mkdir()
    (s6_scandir / "gateway-writer").mkdir()
    (s6_scandir / "s6-linux-init-shutdownd").mkdir()  # filtered out
    (s6_scandir / ".lock").mkdir()  # filtered out (hidden)

    profiles = sorted(S6ServiceManager(scandir=s6_scandir).list_profile_gateways())
    assert profiles == ["assistant", "coder", "writer"]


def test_s6_list_profile_gateways_empty_when_scandir_missing(tmp_path) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    missing = tmp_path / "does-not-exist"
    assert S6ServiceManager(scandir=missing).list_profile_gateways() == []


def test_s6_lifecycle_dispatches_to_s6_svc(
    s6_scandir, fake_subprocess_run,
) -> None:
    from hermes_cli.service_manager import S6ServiceManager
    mgr = S6ServiceManager(scandir=s6_scandir)
    # _run_svc now verifies the slot exists before invoking s6-svc, so
    # we have to pre-seed the dir. In real use the slot is created by
    # register_profile_gateway or the cont-init.d reconciler.
    (s6_scandir / "gateway-coder").mkdir()
    mgr.start("gateway-coder")
    mgr.stop("gateway-coder")
    mgr.restart("gateway-coder")

    flags = [c[1] for c in fake_subprocess_run if c[0] == "s6-svc"]
    assert flags == ["-u", "-d", "-t"]


# ---------------------------------------------------------------------------
# Lifecycle errors: friendly messages, not raw CalledProcessError
# ---------------------------------------------------------------------------


def test_lifecycle_raises_gateway_not_registered_for_missing_slot(
    s6_scandir, fake_subprocess_run,
) -> None:
    """When the service slot doesn't exist, the lifecycle methods
    must raise GatewayNotRegisteredError BEFORE invoking s6-svc, so
    the user sees a clear 'no such gateway' message instead of an
    opaque CalledProcessError stacktrace."""
    from hermes_cli.service_manager import (
        GatewayNotRegisteredError,
        S6ServiceManager,
    )

    mgr = S6ServiceManager(scandir=s6_scandir)
    # No gateway-typo/ directory exists; the slot is missing.
    with pytest.raises(GatewayNotRegisteredError) as excinfo:
        mgr.start("gateway-typo")
    assert excinfo.value.profile == "typo"
    assert excinfo.value.service == "gateway-typo"
    msg = str(excinfo.value)
    assert "'typo'" in msg
    assert "hermes profile create typo" in msg
    # And critically: s6-svc was NOT invoked.
    assert not any(c[0] == "s6-svc" for c in fake_subprocess_run)


@pytest.mark.parametrize("action,method_name", [
    ("start", "start"),
    ("stop", "stop"),
    ("restart", "restart"),
])
def test_all_lifecycle_methods_check_for_missing_slot(
    s6_scandir,
    fake_subprocess_run,
    action: str,
    method_name: str,
) -> None:
    """start/stop/restart all check for missing slots the same way."""
    from hermes_cli.service_manager import (
        GatewayNotRegisteredError,
        S6ServiceManager,
    )

    mgr = S6ServiceManager(scandir=s6_scandir)
    with pytest.raises(GatewayNotRegisteredError):
        getattr(mgr, method_name)("gateway-absent")


def test_gateway_not_registered_unprefixed_service_name(s6_scandir) -> None:
    """If the caller passes a name without the 'gateway-' prefix (the
    Protocol allows arbitrary service names), the error still carries
    that name verbatim as the 'profile' so error messages don't
    accidentally strip user-provided text."""
    from hermes_cli.service_manager import (
        GatewayNotRegisteredError,
        S6ServiceManager,
    )

    mgr = S6ServiceManager(scandir=s6_scandir)
    with pytest.raises(GatewayNotRegisteredError) as excinfo:
        mgr.start("not-prefixed")
    assert excinfo.value.profile == "not-prefixed"


def test_lifecycle_raises_s6_command_error_on_subprocess_failure(
    s6_scandir, monkeypatch: pytest.MonkeyPatch,
) -> None:
    """When s6-svc itself fails (non-zero exit), e.g. EACCES on the
    supervise control FIFO, the lifecycle methods translate the
    CalledProcessError into a named S6CommandError carrying the
    return code and stderr."""
    import subprocess as _sp
    from hermes_cli.service_manager import S6CommandError, S6ServiceManager

    # Pre-create the slot so we reach the s6-svc call.
    (s6_scandir / "gateway-coder").mkdir()

    def _fail(cmd, **kw):
        raise _sp.CalledProcessError(
            returncode=111,
            cmd=cmd,
            stderr="s6-svc: fatal: unable to control supervise/control: "
            "Permission denied\n",
        )
    monkeypatch.setattr("subprocess.run", _fail)

    mgr = S6ServiceManager(scandir=s6_scandir)
    with pytest.raises(S6CommandError) as excinfo:
        mgr.start("gateway-coder")
    assert excinfo.value.service == "gateway-coder"
    assert excinfo.value.action == "start"
    assert excinfo.value.returncode == 111
    assert "Permission denied" in excinfo.value.stderr
    assert "Permission denied" in str(excinfo.value)
    assert "rc=111" in str(excinfo.value)


def test_s6_is_running_parses_svstat(
    s6_scandir, monkeypatch: pytest.MonkeyPatch,
) -> None:
    import subprocess as _sp
    from hermes_cli.service_manager import S6ServiceManager

    def _svstat(cmd, **kw):
        if cmd[0].endswith("/s6-svstat"):
            return _sp.CompletedProcess(cmd, 0, "up (pid 42) 17 seconds\n", "")
        return _sp.CompletedProcess(cmd, 0, "", "")
    monkeypatch.setattr("subprocess.run", _svstat)
    assert S6ServiceManager(scandir=s6_scandir).is_running("gateway-coder") is True

    def _svstat_down(cmd, **kw):
        if cmd[0].endswith("/s6-svstat"):
            return _sp.CompletedProcess(cmd, 0, "down 5 seconds\n", "")
        return _sp.CompletedProcess(cmd, 0, "", "")
    monkeypatch.setattr("subprocess.run", _svstat_down)
    assert S6ServiceManager(scandir=s6_scandir).is_running("gateway-coder") is False