mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-10 08:32:09 +00:00
* fix(gateway,windows): reliability — supervisor task, JOB breakaway, status --deep
Three coordinated fixes for the Windows gateway reliability story:
1. CREATE_BREAKAWAY_FROM_JOB on every detached spawn
The 'hermes update' triggered from the Electron Desktop GUI ran inside
Electron's job object. Without breakaway, the post-update gateway
watcher spawned by update — already DETACHED_PROCESS — was still
reaped when Electron's job tore down, so the gateway never came back
after a GUI-initiated update. Adds CREATE_BREAKAWAY_FROM_JOB (0x01000000)
to:
- hermes_cli/_subprocess_compat.py::windows_detach_flags() — used by
every helper that calls windows_detach_popen_kwargs(), including
launch_detached_profile_gateway_restart()
- The watcher subprocess's own respawn snippet in
hermes_cli/gateway.py (inlined flags so the watcher's child
respawn also breaks away)
_spawn_detached() in gateway_windows.py already had the flag; this
change brings the rest of the codebase to parity.
2. Per-minute supervisor Scheduled Task — Windows equivalent of
systemd Restart=always
Introduces hermes_cli/gateway_supervisor.py and registers it as a
second Scheduled Task ('Hermes_Gateway_Supervisor', /SC MINUTE /MO 1,
LIMITED rights) alongside the existing ONLOGON task. Every minute,
the supervisor uses the same gateway.status.get_running_pid() probe
as 'hermes gateway status' and, if no gateway is alive, calls
gateway_windows._spawn_detached() (which now includes BREAKAWAY) to
bring one back.
Covers every crash mode, not just 'machine rebooted': taskkill,
OOM, GUI update SIGTERM, parent job teardown. Cheap — one pythonw
startup per minute when down, one PID-existence check per minute
when up.
Wired into both the schtasks-success and Startup-folder-fallback
install paths via _install_supervisor_best_effort(), and removed in
uninstall(). Best-effort: a failing supervisor install logs a
warning but doesn't roll back the primary install.
3. 'hermes gateway status --deep' shows per-probe PASS/FAIL
Replaces the existing terse '--deep' output (which only printed
paths) with an actual diagnostic table:
[1] PID file present
[2] Lock file held by a live process
[3] get_running_pid() result
[4] _pid_exists(pid) — OS-level liveness
[5] gateway_state.json (state + age)
[6] Last lifecycle event from gateway-exit-diag.log
When the high-level summary disagrees with reality, the user can
see exactly which signal is lying.
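
A minimal sketch of how such a per-probe table could be assembled. The names ProbeResult, run_probes and format_table are hypothetical and are not taken from hermes_cli; the real probes are only stood in for by callables that return a (passed, detail) pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical probe signature: returns (passed, human-readable detail).
Probe = Callable[[], Tuple[bool, str]]


@dataclass
class ProbeResult:
    name: str
    passed: bool
    detail: str


def run_probes(probes: List[Tuple[str, Probe]]) -> List[ProbeResult]:
    """Run each probe in isolation so one failing probe cannot hide the others."""
    results = []
    for name, probe in probes:
        try:
            passed, detail = probe()
        except Exception as exc:  # a crashing probe is reported as FAIL
            passed, detail = False, f"probe raised {exc!r}"
        results.append(ProbeResult(name, passed, detail))
    return results


def format_table(results: List[ProbeResult]) -> str:
    """Render numbered PASS/FAIL rows, one row per probe."""
    rows = []
    for index, result in enumerate(results, start=1):
        status = "PASS" if result.passed else "FAIL"
        rows.append(f"[{index}] {status}  {result.name}: {result.detail}")
    return "\n".join(rows)
```

Running every probe independently, rather than stopping at the first failure, is what lets the user compare signals against each other.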
Test-leak fix
-------------
tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages
monkey-patched is_linux/is_wsl/supports_systemd_services to simulate
WSL but did NOT stub is_windows(). On a Windows host, the dispatcher
in _gateway_command_inner takes the is_windows() branch BEFORE the
WSL guidance branch, so the test invoked gateway_windows.install()
for real. install() writes to %APPDATA%\...\Startup\Hermes_Gateway.cmd
— the REAL user Startup folder, never sandboxed by tmp_path — pointing
at the test's pytest-of-<user>/pytest-<N>/.../gateway-service/ wrapper.
When pytest tore down the tmp_path, every subsequent Windows login
flashed a cmd.exe window that failed to find the missing target.
Stubs is_windows=False on all four affected tests:
test_install_wsl_no_systemd
test_start_wsl_no_systemd
test_status_wsl_running_manual
test_status_wsl_not_running
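
The branch-order problem can be illustrated with a simplified dispatcher. This is a sketch only: dispatch_install and its return strings are hypothetical, and the real _gateway_command_inner has more branches than shown here.

```python
from types import SimpleNamespace


def dispatch_install(platform) -> str:
    """Hypothetical dispatcher: the Windows branch is checked before WSL."""
    if platform.is_windows():
        return "windows-install"  # would touch the real Startup folder
    if platform.is_wsl() and not platform.supports_systemd_services():
        return "wsl-guidance"
    return "systemd-install"


def simulated_wsl(is_windows_host: bool, stub_is_windows: bool):
    """Build a platform object the way the WSL tests patch it."""
    # is_windows() reflects the real host unless the test stubs it to False.
    windows = False if stub_is_windows else is_windows_host
    return SimpleNamespace(
        is_windows=lambda: windows,
        is_wsl=lambda: True,
        supports_systemd_services=lambda: False,
    )
```

On a Windows host, the unstubbed variant reaches the Windows branch even though is_wsl() was patched to True; stubbing is_windows restores the intended WSL guidance path.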
Defense-in-depth: _build_startup_launcher() now prefixes the launcher
with 'if not exist <target> exit /b 0', so any future stale Startup
entry silently no-ops instead of flashing a console window.
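
A sketch of what the guarded launcher text might look like. build_startup_launcher is a hypothetical stand-in for _build_startup_launcher(); the '@echo off' and 'call' lines are assumptions, and only the 'if not exist ... exit /b 0' guard comes from the description above.

```python
def build_startup_launcher(target: str) -> str:
    """Return .cmd text that exits quietly when the target wrapper is gone."""
    lines = [
        "@echo off",
        # Guard: a stale Startup entry becomes a silent no-op.
        f'if not exist "{target}" exit /b 0',
        # Assumed invocation; the real launcher may start the wrapper differently.
        f'call "{target}"',
    ]
    # Batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```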
Status enhancements
-------------------
- status() now reports supervisor task presence alongside the existing
schtasks/Startup info, and nudges the user to reinstall if the
supervisor isn't registered.
- Deep mode dumps both the supervisor task name + script path.
* fix(gateway,windows): drop the per-minute supervisor task — keep breakaway + deep probes
Earlier in this branch we added a per-minute schtasks-based supervisor to
respawn the gateway after crashes / GUI-update SIGTERMs. The implementation
flashed a brief console window on every firing, which stole window focus.
We tried several variants:
- cmd.exe wrapper invoking pythonw -> flashes (cmd.exe is console-subsystem)
- schtasks /TR pointing at pythonw -> flashes (uv venv launcher pythonw is
actually subsystem=Console, not GUI; it respawns the real pythonw)
- schtasks /TR pointing at base uv -> still flashes (Task Scheduler-side
conhost preallocation; documented Windows quirk)
- XML registration with <Hidden>true</Hidden> -> still flashes (<Hidden> only hides
the task in the Task Scheduler UI, not the spawned window)
Researched what leading projects do:
- Ollama: GUI-subsystem tray exe + Startup-folder shortcut. No supervisor.
- Tailscale: real Windows Service via SCM. Session 0, no console possible.
- Syncthing: --no-console flag inside the binary + Startup folder.
- openclaw: VBS Run(..., 0, False) wrapper. Suppresses the *window* but
Super User Q971162 confirms focus-steal still occurs in some cases.
None of these use a per-minute polling scheduled task. The 'auto-restart on
crash' responsibility belongs INSIDE the daemon (Tailscale's in-process
recovery / Ollama's monitor+worker pair) OR is delegated to the Windows
Service Control Manager — not Task Scheduler.
So this commit drops the supervisor entirely. The CREATE_BREAKAWAY_FROM_JOB
fix in _subprocess_compat.py (from commit c1e5fa433) survives — that is the
*real* fix for problem #2 (GUI-update kills gateway): the post-update
watcher in launch_detached_profile_gateway_restart() now breaks out of
Electron's job object, so the gateway respawn watcher survives the GUI
quit and successfully respawns the gateway.
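
A minimal sketch of the flag combination, assuming the helper ORs the breakaway bit into the usual detach flags. detach_flags and detach_popen_kwargs are hypothetical stand-ins for windows_detach_flags() and windows_detach_popen_kwargs(); the exact set of flags the real helper combines is not stated above, and only the 0x01000000 value is.

```python
import subprocess
import sys

# Win32 process-creation flag values, spelled out numerically so the
# sketch also imports on non-Windows hosts.
DETACHED_PROCESS = 0x00000008
CREATE_NEW_PROCESS_GROUP = 0x00000200
CREATE_BREAKAWAY_FROM_JOB = 0x01000000


def detach_flags(platform: str = sys.platform) -> int:
    """Return creation flags for a detached child; 0 when not on Windows."""
    if platform != "win32":
        return 0
    return DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP | CREATE_BREAKAWAY_FROM_JOB


def detach_popen_kwargs(platform: str = sys.platform) -> dict:
    """Keyword arguments to pass to subprocess.Popen for a detached spawn."""
    kwargs = {
        "stdin": subprocess.DEVNULL,
        "stdout": subprocess.DEVNULL,
        "stderr": subprocess.DEVNULL,
    }
    flags = detach_flags(platform)
    if flags:
        kwargs["creationflags"] = flags
    return kwargs
```

Note that Windows honours CREATE_BREAKAWAY_FROM_JOB only when the parent's job object permits breakaway (JOB_OBJECT_LIMIT_BREAKAWAY_OK); otherwise process creation fails, so a caller may want to fall back to spawning without the flag.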
Surviving from c1e5fa433:
* CREATE_BREAKAWAY_FROM_JOB in hermes_cli/_subprocess_compat.py (fixes #2)
* Inlined breakaway flag in the watcher respawn snippet in gateway.py
* hermes gateway status --deep PASS/FAIL probes (fixes #1 — visibility)
* 'if not exist <target> exit /b 0' guard in _build_startup_launcher
(fixes #3 — silent no-op for stale Startup entries)
* tests/hermes_cli/test_gateway_wsl.py is_windows=False stubs (root cause
of #3 — pytest WSL tests no longer leak Startup entries on Win hosts)
Removed in this commit:
* hermes_cli/gateway_supervisor.py (entire file)
* Supervisor section in hermes_cli/gateway_windows.py (~180 lines):
get_supervisor_task_name, get_supervisor_script_path,
_build_supervisor_cmd_script, _write_supervisor_script,
_install_supervisor_task, is_supervisor_task_registered,
_install_supervisor_best_effort
* _install_supervisor_best_effort() calls in install() (3 spots)
* supervisor cleanup block in uninstall()
* supervisor display lines in status() / status(deep=True)
Future direction (out of scope for this PR): the right place for Windows
'Restart=always' semantics is a real Windows Service installed via
pywin32's win32serviceutil.ServiceFramework — session-0 isolation, SCM
auto-restart, no console window possible. That's a meaningful next-PR
project, not a band-aid.
Tests: 51 pass / 2 pre-existing failures in
tests/hermes_cli/test_gateway_{windows,wsl}.py (the 2 failures are
TestSupportsSystemdServicesWSL cases that fail on origin/main too —
unrelated to this PR).
267 lines
11 KiB
Python
"""Tests for WSL detection and WSL-aware gateway behavior."""

import subprocess
from types import SimpleNamespace
from unittest.mock import patch, MagicMock, mock_open

import pytest

import hermes_cli.gateway as gateway
import hermes_constants


# =============================================================================
# is_wsl() in hermes_constants
# =============================================================================

class TestIsWsl:
    """Test the shared is_wsl() utility."""

    def setup_method(self):
        # Reset cached value between tests
        hermes_constants._wsl_detected = None

    def test_detects_wsl2(self):
        fake_content = (
            "Linux version 5.15.146.1-microsoft-standard-WSL2 "
            "(gcc (GCC) 11.2.0) #1 SMP Thu Jan 11 04:09:03 UTC 2024\n"
        )
        with patch("builtins.open", mock_open(read_data=fake_content)):
            assert hermes_constants.is_wsl() is True

    def test_detects_wsl1(self):
        fake_content = (
            "Linux version 4.4.0-19041-Microsoft "
            "(Microsoft@Microsoft.com) (gcc version 5.4.0) #1\n"
        )
        with patch("builtins.open", mock_open(read_data=fake_content)):
            assert hermes_constants.is_wsl() is True

    def test_native_linux(self):
        fake_content = (
            "Linux version 6.5.0-44-generic (buildd@lcy02-amd64-015) "
            "(x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0) #44\n"
        )
        with patch("builtins.open", mock_open(read_data=fake_content)):
            assert hermes_constants.is_wsl() is False

    def test_no_proc_version(self):
        with patch("builtins.open", side_effect=FileNotFoundError):
            assert hermes_constants.is_wsl() is False

    def test_result_is_cached(self):
        """After first detection, subsequent calls return the cached value."""
        hermes_constants._wsl_detected = True
        # Even with open raising, cached value is returned
        with patch("builtins.open", side_effect=FileNotFoundError):
            assert hermes_constants.is_wsl() is True


# =============================================================================
# _wsl_systemd_operational() in gateway
# =============================================================================

class TestWslSystemdOperational:
    """Test the WSL systemd check."""

    def test_running(self, monkeypatch):
        monkeypatch.setattr(
            gateway.subprocess, "run",
            lambda *a, **kw: SimpleNamespace(
                returncode=0, stdout="running\n", stderr=""
            ),
        )
        assert gateway._wsl_systemd_operational() is True

    def test_degraded(self, monkeypatch):
        monkeypatch.setattr(
            gateway.subprocess, "run",
            lambda *a, **kw: SimpleNamespace(
                returncode=1, stdout="degraded\n", stderr=""
            ),
        )
        assert gateway._wsl_systemd_operational() is True

    def test_starting(self, monkeypatch):
        monkeypatch.setattr(
            gateway.subprocess, "run",
            lambda *a, **kw: SimpleNamespace(
                returncode=1, stdout="starting\n", stderr=""
            ),
        )
        assert gateway._wsl_systemd_operational() is True

    def test_offline_no_systemd(self, monkeypatch):
        monkeypatch.setattr(
            gateway.subprocess, "run",
            lambda *a, **kw: SimpleNamespace(
                returncode=1, stdout="offline\n", stderr=""
            ),
        )
        assert gateway._wsl_systemd_operational() is False

    def test_systemctl_not_found(self, monkeypatch):
        monkeypatch.setattr(
            gateway.subprocess, "run",
            MagicMock(side_effect=FileNotFoundError),
        )
        assert gateway._wsl_systemd_operational() is False

    def test_timeout(self, monkeypatch):
        monkeypatch.setattr(
            gateway.subprocess, "run",
            MagicMock(side_effect=subprocess.TimeoutExpired("systemctl", 5)),
        )
        assert gateway._wsl_systemd_operational() is False


# =============================================================================
# supports_systemd_services() WSL integration
# =============================================================================

class TestSupportsSystemdServicesWSL:
    """Test that supports_systemd_services() handles WSL correctly."""

    def test_wsl_with_systemd(self, monkeypatch):
        """WSL + working systemd → True."""
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
        monkeypatch.setattr(gateway, "_wsl_systemd_operational", lambda: True)
        assert gateway.supports_systemd_services() is True

    def test_wsl_without_systemd(self, monkeypatch):
        """WSL + no systemd → False."""
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
        monkeypatch.setattr(gateway, "_wsl_systemd_operational", lambda: False)
        assert gateway.supports_systemd_services() is False

    def test_native_linux(self, monkeypatch):
        """Native Linux (not WSL) → True without checking systemd."""
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr(gateway, "is_wsl", lambda: False)
        assert gateway.supports_systemd_services() is True

    def test_termux_still_excluded(self, monkeypatch):
        """Termux → False regardless of WSL status."""
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
        monkeypatch.setattr(gateway, "is_termux", lambda: True)
        assert gateway.supports_systemd_services() is False


# =============================================================================
# WSL messaging in gateway commands
# =============================================================================

class TestGatewayCommandWSLMessages:
    """Test that WSL users see appropriate guidance."""

    def test_install_wsl_no_systemd(self, monkeypatch, capsys):
        """hermes gateway install on WSL without systemd shows guidance."""
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
        monkeypatch.setattr(gateway, "is_macos", lambda: False)
        monkeypatch.setattr(gateway, "is_managed", lambda: False)
        # CRITICAL: also stub is_windows. Without this, running this test on a
        # real Windows host falls through to the is_windows() branch *before*
        # the WSL guidance branch, invoking gateway_windows.install() which
        # writes a Startup-folder .cmd into the real user's Startup folder
        # (NOT tmp_path) pointing at a now-vanished pytest fixture path.
        # The user then sees a broken Hermes_Gateway.cmd flash a cmd.exe
        # window on every login. See fix/windows-gateway-reliability.
        monkeypatch.setattr(gateway, "is_windows", lambda: False)

        args = SimpleNamespace(
            gateway_command="install", force=False, system=False,
            run_as_user=None,
        )
        with pytest.raises(SystemExit) as exc_info:
            gateway.gateway_command(args)
        assert exc_info.value.code == 1

        out = capsys.readouterr().out
        assert "WSL detected" in out
        assert "systemd is not running" in out
        assert "hermes gateway run" in out
        assert "tmux" in out

    def test_start_wsl_no_systemd(self, monkeypatch, capsys):
        """hermes gateway start on WSL without systemd shows guidance."""
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
        monkeypatch.setattr(gateway, "is_macos", lambda: False)
        # See test_install_wsl_no_systemd: stub is_windows so a Windows host
        # running this test does NOT actually spawn a detached gateway via
        # gateway_windows.start().
        monkeypatch.setattr(gateway, "is_windows", lambda: False)

        args = SimpleNamespace(gateway_command="start", system=False)
        with pytest.raises(SystemExit) as exc_info:
            gateway.gateway_command(args)
        assert exc_info.value.code == 1

        out = capsys.readouterr().out
        assert "WSL detected" in out
        assert "hermes gateway run" in out
        assert "wsl.conf" in out

    def test_status_wsl_running_manual(self, monkeypatch, capsys):
        """hermes gateway status on WSL with manual process shows WSL note."""
        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
        monkeypatch.setattr(gateway, "is_macos", lambda: False)
        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
        # Stub is_windows so a Windows host running this test does NOT take
        # the Windows status branch (which reads gateway_windows.is_installed()).
        monkeypatch.setattr(gateway, "is_windows", lambda: False)
        monkeypatch.setattr(gateway, "find_gateway_pids", lambda: [12345])
        monkeypatch.setattr(gateway, "_runtime_health_lines", lambda: [])
        # Stub out the systemd unit path check
        monkeypatch.setattr(
            gateway, "get_systemd_unit_path",
            lambda system=False: SimpleNamespace(exists=lambda: False),
        )
        monkeypatch.setattr(
            gateway, "get_launchd_plist_path",
            lambda: SimpleNamespace(exists=lambda: False),
        )

        args = SimpleNamespace(gateway_command="status", deep=False, system=False)
        gateway.gateway_command(args)

        out = capsys.readouterr().out
        assert "WSL note" in out
        assert "tmux or screen" in out

    def test_status_wsl_not_running(self, monkeypatch, capsys):
        """hermes gateway status on WSL with no process shows WSL start advice."""
        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
        monkeypatch.setattr(gateway, "is_macos", lambda: False)
        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
        # See test_status_wsl_running_manual.
        monkeypatch.setattr(gateway, "is_windows", lambda: False)
        monkeypatch.setattr(gateway, "find_gateway_pids", lambda: [])
        monkeypatch.setattr(gateway, "_runtime_health_lines", lambda: [])
        monkeypatch.setattr(
            gateway, "get_systemd_unit_path",
            lambda system=False: SimpleNamespace(exists=lambda: False),
        )
        monkeypatch.setattr(
            gateway, "get_launchd_plist_path",
            lambda: SimpleNamespace(exists=lambda: False),
        )

        args = SimpleNamespace(gateway_command="status", deep=False, system=False)
        gateway.gateway_command(args)

        out = capsys.readouterr().out
        assert "hermes gateway run" in out
        assert "tmux" in out