hermes-agent/tests/gateway/test_gateway_command_line_matcher.py
Charles Power fd92a3a5c9 fix(gateway): Windows restart no longer causes a silent outage
`hermes gateway restart` on Windows could take the gateway offline with no
replacement. restart() was stop() -> sleep(1.0) -> start(), but the graceful
drain can run up to ~180s while the detached pythonw process stays alive. The
1s sleep let start() run against the still-draining old process; its
"already running" guard then no-opped, and when the old process finally exited
nothing relaunched it.

Two root causes, both fixed:

1. Loose PID detection. `_scan_gateway_pids` and the gateway.status helpers
   used substring matches ("... gateway" in cmdline) for lifecycle decisions,
   so they false-matched `gateway status`/`dashboard` siblings and unrelated
   processes like `python -m tui_gateway`, plus stale gateway.pid records.
   Add a shared strict matcher `looks_like_gateway_command_line()` in
   gateway/status.py that requires the real `gateway run` subcommand (or the
   dedicated entrypoints), and route `_looks_like_gateway_process`,
   `_record_looks_like_gateway`, and `_scan_gateway_pids` through it.

2. restart() race. Wait until the gateway is authoritatively gone
   (`get_running_pid()` + strict `_gateway_pids()`) before relaunch; force-kill
   once if it lingers and raise rather than start a duplicate; verify the
   relaunch produced a running gateway and raise loudly if not (no more
   exit-0 silent outage).
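
The wait, force-kill, and verify sequence in item 2 can be sketched roughly as below. This is an illustration only, not the code in this commit: `restart_sketch`, `wait_until`, `GatewayRestartError`, the injected callables, and every timeout value are hypothetical names and numbers chosen for the example. In the real code, the "is it gone?" check would be built from `get_running_pid()` and the strict `_gateway_pids()`.

```python
import time
from typing import Callable


class GatewayRestartError(RuntimeError):
    """Raised instead of exiting 0 while the gateway is offline (hypothetical name)."""


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def restart_sketch(
    stop: Callable[[], None],
    start: Callable[[], None],
    is_running: Callable[[], bool],
    force_kill: Callable[[], None],
    drain_timeout: float = 200.0,  # assumption: a little above the ~180s drain
    kill_timeout: float = 10.0,    # assumption
    start_timeout: float = 30.0,   # assumption
    interval: float = 0.5,
) -> None:
    stop()
    # Wait for the old process to finish draining instead of sleeping a fixed 1s.
    if not wait_until(lambda: not is_running(), drain_timeout, interval):
        force_kill()  # exactly one force-kill attempt
        if not wait_until(lambda: not is_running(), kill_timeout, interval):
            raise GatewayRestartError(
                "old gateway still running after force-kill; not starting a duplicate"
            )
    start()
    # Verify the relaunch actually produced a running gateway.
    if not wait_until(is_running, start_timeout, interval):
        raise GatewayRestartError("relaunch did not produce a running gateway")
```

Passing the lifecycle operations in as callables keeps the sketch independent of how PIDs are discovered, which is the part item 1 tightens.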

Scoped to Windows; systemd/launchd restart paths are already drain-aware.
Adds tests/gateway/test_gateway_command_line_matcher.py.
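
For orientation, a token-based matcher in the spirit of `looks_like_gateway_command_line()` might look like the sketch below. It is not the implementation in gateway/status.py, which is not shown here. The function name `looks_like_gateway_command_line_sketch`, the helper `_basename`, the whitespace tokenization (which would mis-split paths containing spaces), and the rule that a flag directly after `gateway` counts as a bare `run` are all assumptions made for illustration.

```python
from typing import Optional


def _basename(token: str) -> str:
    """Lower-cased final path component with a trailing .exe removed."""
    name = token.rsplit("/", 1)[-1].lower()
    return name[:-4] if name.endswith(".exe") else name


def looks_like_gateway_command_line_sketch(cmdline: Optional[str]) -> bool:
    if not cmdline:
        return False
    # Normalize Windows separators, then compare whole tokens, not substrings.
    tokens = cmdline.replace("\\", "/").split()

    # Dedicated entrypoints: the hermes-gateway executable or gateway/run.py.
    for tok in tokens:
        if _basename(tok) == "hermes-gateway":
            return True
        if tok == "gateway/run.py" or tok.endswith("/gateway/run.py"):
            return True

    # Otherwise require a hermes launcher followed by `gateway` and then `run`.
    launcher = next(
        (i for i, tok in enumerate(tokens)
         if _basename(tok) == "hermes" or "hermes_cli" in tok),
        None,
    )
    if launcher is None:
        return False
    rest = tokens[launcher + 1:]
    if "gateway" not in rest:
        return False
    after = rest[rest.index("gateway") + 1:]
    # Bare `gateway` (nothing after it, or only flags) is treated as `run`.
    return not after or after[0] == "run" or after[0].startswith("-")
```

Matching on whole tokens is what keeps `python -m tui_gateway` and `gateway status` from being mistaken for the long-running gateway process.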

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 06:31:56 -07:00


"""Tests for the strict gateway command-line matcher.

Regression guard for the Windows ``hermes gateway restart`` silent-outage bug:
the previous loose substring match (``"... gateway" in cmdline``) false-matched
``gateway status``/``dashboard`` siblings and unrelated processes such as
``python -m tui_gateway``, which let ``restart()`` race a still-draining old
process and ``status``/``start`` report false positives.
"""

from __future__ import annotations

import pytest

from gateway.status import looks_like_gateway_command_line as matches

ACCEPT = [
    "pythonw.exe -m hermes_cli.main gateway run",
    r"C:\Users\me\hermes\venv\Scripts\pythonw.exe -m hermes_cli.main gateway run",
    "python -m hermes_cli.main --profile work gateway run",
    "python -m hermes_cli.main gateway run --replace",
    "python -m hermes_cli/main.py gateway run",
    "python gateway/run.py",
    "hermes-gateway.exe",
    "hermes gateway",  # bare `hermes gateway` defaults to run
    "hermes gateway run",
]

REJECT = [
    "python -m tui_gateway",  # unrelated module
    "python -m hermes_cli.main gateway status",  # other subcommand
    "python -m hermes_cli.main gateway restart",
    "python -m hermes_cli.main gateway stop",
    "python -m hermes_cli.main --profile x dashboard",  # non-gateway subcommand
    "some random python -m mygateway thing",
    "",
    None,
]


@pytest.mark.parametrize("cmd", ACCEPT)
def test_accepts_real_gateway_run(cmd):
    assert matches(cmd) is True


@pytest.mark.parametrize("cmd", REJECT)
def test_rejects_non_gateway_run(cmd):
    assert matches(cmd) is False