Address correctness gaps found in pre-PR review of the strict matcher:
- Profile selectors can appear on EITHER side of the `gateway` token
(`_apply_profile_override` strips `--profile`/`-p` from anywhere in argv
before argparse), so `hermes gateway --profile work run` and
`python -m hermes_cli.main gateway -p work run` are valid launches the
previous matcher wrongly rejected. Strip `--profile`/`-p`/`--profile=`/`-p=`
from anywhere before locating the subcommand.
- A profile literally named `gateway` (`hermes -p gateway gateway run`) made
the old token scan stop on the profile value; stripping the selector+value
first fixes it.
- Tokenize quote-aware with `shlex` so quoted Windows paths containing spaces
(`"C:\Program Files\Hermes\hermes-gateway.exe"`) are no longer split mid-path
and the dedicated-entrypoint match survives.
Without these, the matcher could MISS a real running gateway -> the opposite
failure (restart/status reporting "down" when up). Adds regression tests for
all three shapes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`hermes gateway restart` on Windows could take the gateway offline with no
replacement. restart() was stop() -> sleep(1.0) -> start(), but the graceful
drain can run up to ~180s while the detached pythonw process stays alive. The
1s sleep let start() run against the still-draining old process; its
"already running" guard then no-opped, and when the old process finally exited
nothing relaunched it.
Two root causes, both fixed:
1. Loose PID detection. `_scan_gateway_pids` and the gateway.status helpers
used substring matches ("... gateway" in cmdline) for lifecycle decisions,
so they false-matched `gateway status`/`dashboard` siblings and unrelated
processes like `python -m tui_gateway`, plus stale gateway.pid records.
Add a shared strict matcher `looks_like_gateway_command_line()` in
gateway/status.py that requires the real `gateway run` subcommand (or the
dedicated entrypoints), and route `_looks_like_gateway_process`,
`_record_looks_like_gateway`, and `_scan_gateway_pids` through it.
2. restart() race. Wait until the gateway is authoritatively gone
(`get_running_pid()` + strict `_gateway_pids()`) before relaunch; force-kill
once if it lingers and raise rather than start a duplicate; verify the
relaunch produced a running gateway and raise loudly if not (no more
exit-0 silent outage).
Scoped to Windows; systemd/launchd restart paths are already drain-aware.
Adds tests/gateway/test_gateway_command_line_matcher.py.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>