test: migrate stale os.kill monkeypatches to gateway.status._pid_exists

PR #21561 migrated liveness probes at 14 call sites from `os.kill(pid, 0)` to `gateway.status._pid_exists` (psutil-first), so the gateway no longer sends itself a Ctrl+C on Windows. On Windows, signal 0 is `signal.CTRL_C_EVENT`, so `os.kill(pid, 0)` generates a console Ctrl+C event instead of performing an existence check (bpo-14484). A handful of tests still patched the old `os.kill` seam. They either passed by accident on POSIX (when PID 12345 happened not to be alive on the CI worker) or failed outright; on CI they surfaced as 7 failures, some flaky and some consistent.

Migrate each affected test to patch the correct seam:

- tests/tools/test_browser_orphan_reaper.py (5 tests)
  Patch `gateway.status._pid_exists` instead of `os.kill`. Rename test_permission_error_on_kill_check_skips to test_alive_legacy_daemon_is_reaped. The old assertion was "PermissionError on signal 0 → skip dir"; after the migration, the untracked-alive-daemon path always reaps the dir after SIGTERM (best-effort semantics are preserved).
- tests/tools/test_windows_native_support.py (4 tests)
  Replace the tests that asserted `os.kill` seam behavior with tests that exercise `ProcessRegistry._is_host_pid_alive` as a delegator. Split out a new TestPidExistsOSErrorWidening class that calls `gateway.status._pid_exists` directly through its POSIX fallback branch, so the widened handling of Windows-style `OSError(WinError 87)` and `PermissionError` is still covered on Linux CI.
- tests/tools/test_process_registry.py (1 test)
  Mock `psutil.Process` and `_pid_exists` instead of `os.kill` for the detached-session kill path.
- tests/tools/test_mcp_stability.py::test_kill_orphaned_uses_sigkill_when_available
  The SIGTERM → alive-check → SIGKILL flow now uses `_pid_exists` for the middle step, so the expected number of `os.kill` calls drops from 3 to 2.
- tests/gateway/test_status.py::TestScopedLocks (2 tests)
  `acquire_scoped_lock` consults `_pid_exists`; patch that seam directly instead of trying to control the nested psutil call through an `os.kill` monkeypatch.
- tests/hermes_cli/test_gateway.py::test_stop_profile_gateway_keeps_pid_file_when_process_still_running
  The stop loop sends one SIGTERM via `os.kill`, then polls 20 times via `_pid_exists`; instrument the two seams separately. The old assertion `calls["kill"] == 21` is split into `kill == 1` and `alive_probes == 20`.
- tests/hermes_cli/test_auth_toctou_file_modes.py::test_shared_nous_store_writes_0o600_with_0o700_parent
  Commit c34884ea2 switched the pytest seat-belt guard in `_nous_shared_store_path()` from `Path.home() / ".hermes"` to `get_default_hermes_root()`, which honors HERMES_HOME. The test sets HERMES_HOME to tmp_path and HERMES_SHARED_AUTH_DIR to a subdirectory of it, so the override now resolves to the same path the guard refuses. Rename the override subdirectory so the two paths diverge; the guard passes and the test runs.

All 21 original CI failures and their locally flaky siblings now pass (278 tests across the touched files, 0 failures).
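For illustration, a psutil-first liveness probe with a POSIX fallback might look roughly like the sketch below. It is not the actual `gateway.status._pid_exists`: the name `pid_exists_sketch`, the Windows-without-psutil behavior, and the exact error mapping (ProcessLookupError means dead, PermissionError means alive) are assumptions chosen to mirror the behavior described above. The point it shows is that `os.kill(pid, 0)` is only ever reached on non-Windows platforms.

```python
import os
import sys

try:
    import psutil  # optional dependency; preferred when installed
except ImportError:
    psutil = None


def pid_exists_sketch(pid: int) -> bool:
    """Hypothetical psutil-first liveness probe (not the real _pid_exists)."""
    if pid <= 0:
        return False  # 0 and negative PIDs address process groups, not a process
    if psutil is not None:
        # psutil consults the OS process table; no signal is ever sent.
        return psutil.pid_exists(pid)
    if sys.platform == "win32":
        # Never fall back to os.kill(pid, 0) here: signal 0 is CTRL_C_EVENT
        # on Windows (bpo-14484). This sketch simply reports False instead.
        return False
    try:
        os.kill(pid, 0)  # POSIX: signal 0 only checks existence/permission
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but is owned by another user
    except OSError:
        return False  # any other OS error: treat as not alive
    return True
```

Tests then patch the probe itself (for example `monkeypatch.setattr("gateway.status._pid_exists", lambda pid: True)`) rather than `os.kill`, which keeps them deterministic regardless of which PIDs happen to be alive on the CI worker.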
2026-05-12 03:42:08 +00:00 · 2026-05-08 14:18:41 -07:00 · 2026-05-08 14:18:41 -07:00 · f5ee780124
commit f5ee780124
parent 291a158441
7 changed files with 160 additions and 80 deletions
--- a/tests/hermes_cli/test_auth_toctou_file_modes.py
+++ b/tests/hermes_cli/test_auth_toctou_file_modes.py
@@ -116,8 +116,12 @@ def test_shared_nous_store_writes_0o600_with_0o700_parent(tmp_path, monkeypatch)
    """The Nous shared-credential store must land at 0o600 / parent 0o700."""
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    # _nous_shared_store_path() refuses to touch the real shared store during
-    # pytest runs; redirect it into tmp_path explicitly.
-    monkeypatch.setenv("HERMES_SHARED_AUTH_DIR", str(tmp_path / "shared"))
+    # pytest runs; redirect it into tmp_path explicitly. Use a distinct
+    # subdirectory name (``shared_override``) so the guard's "real user
+    # home" reference — which currently tracks HERMES_HOME via
+    # get_default_hermes_root() — can't collide with our override and
+    # falsely claim we're writing to the real user's shared store.
+    monkeypatch.setenv("HERMES_SHARED_AUTH_DIR", str(tmp_path / "shared_override"))
    old_umask = os.umask(0o022)
    try:
        from hermes_cli import auth as auth_mod
--- a/tests/hermes_cli/test_gateway.py
+++ b/tests/hermes_cli/test_gateway.py
@@ -450,14 +450,21 @@ class TestWaitForGatewayExit:

 class TestStopProfileGateway:
    def test_stop_profile_gateway_keeps_pid_file_when_process_still_running(self, monkeypatch):
-        calls = {"kill": 0, "remove": 0}
+        calls = {"kill": 0, "alive_probes": 0, "remove": 0}

        monkeypatch.setattr("gateway.status.get_running_pid", lambda: 12345)
+        # Post-#21561: the stop loop sends one SIGTERM via ``os.kill`` then
+        # polls liveness via ``gateway.status._pid_exists`` (safe on
+        # Windows — bpo-14484). Instrument both seams separately.
        monkeypatch.setattr(
            gateway.os,
            "kill",
            lambda pid, sig: calls.__setitem__("kill", calls["kill"] + 1),
        )
+        monkeypatch.setattr(
+            "gateway.status._pid_exists",
+            lambda pid: calls.__setitem__("alive_probes", calls["alive_probes"] + 1) or True,
+        )
        monkeypatch.setattr("time.sleep", lambda _: None)
        monkeypatch.setattr(
            "gateway.status.remove_pid_file",
@@ -465,5 +472,6 @@ class TestStopProfileGateway:
        )

        assert gateway.stop_profile_gateway() is True
-        assert calls["kill"] == 21
+        assert calls["kill"] == 1          # one SIGTERM
+        assert calls["alive_probes"] == 20 # 20 liveness polls over the 2s window
        assert calls["remove"] == 0
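The call counts asserted in the gateway test follow from the shape of the stop loop: one SIGTERM, then a fixed number of liveness polls, keeping the PID file if the process never exits. A minimal sketch of that shape, with each seam passed in as a parameter so a test can count calls, might look like this. The function name `stop_sketch`, its parameters, and the 20 × 0.1 s polling schedule (inferred from "20 liveness polls over the 2s window") are assumptions, not the real `stop_profile_gateway`.

```python
import signal
from typing import Callable


def stop_sketch(
    pid: int,
    kill: Callable[[int, int], None],
    pid_exists: Callable[[int], bool],
    sleep: Callable[[float], None],
    remove_pid_file: Callable[[], None],
    polls: int = 20,      # assumed: 20 probes ...
    interval: float = 0.1,  # ... 0.1 s apart, i.e. a 2 s window
) -> bool:
    """Hypothetical stop loop: one SIGTERM, then up to `polls` liveness probes."""
    kill(pid, signal.SIGTERM)  # the only os.kill-style call in the loop
    for _ in range(polls):
        if not pid_exists(pid):
            remove_pid_file()  # process is gone; safe to clean up
            return True
        sleep(interval)
    # Still alive after the window: keep the PID file. Like the test above,
    # this sketch still returns True because the signal was delivered.
    return True


# Usage: count calls the same way the test above does.
calls = {"kill": 0, "alive_probes": 0, "remove": 0}


def _count(key: str, result=None):
    """Build a fake seam that increments calls[key] and returns `result`."""
    def fake(*_args):
        calls[key] += 1
        return result
    return fake


stopped = stop_sketch(
    12345,
    kill=_count("kill"),
    pid_exists=_count("alive_probes", True),  # process never exits
    sleep=lambda _: None,
    remove_pid_file=_count("remove"),
)
```

Because the signal and the probe are separate seams, the old single counter of 21 `os.kill` calls naturally splits into 1 SIGTERM plus 20 probes.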