mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-12 03:42:08 +00:00
test: migrate stale os.kill monkeypatches to gateway.status._pid_exists
PR #21561 migrated liveness probes across 14 call sites from
`os.kill(pid, 0)` to `gateway.status._pid_exists` (psutil-first) so
the gateway doesn't Ctrl+C-itself on Windows via bpo-14484. A handful of
tests still patched the old `os.kill` seam and either happened to pass
on POSIX (when PID 12345 incidentally wasn't alive on the CI worker) or
failed outright; on CI runs they surfaced as 15 flaky/stable failures.
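For readers unfamiliar with the pattern, a psutil-first liveness probe might look roughly like the sketch below. This is not the actual `gateway.status._pid_exists` implementation: the name `pid_exists_sketch`, the optional-import handling, and the exact exception mapping in the POSIX fallback are assumptions made for illustration. The Windows hazard is that signal value 0 doubles as `CTRL_C_EVENT` there, so `os.kill(pid, 0)` is not a harmless probe.

```python
import os
import sys

try:
    import psutil  # preferred probe when installed
except ImportError:  # sketch-only fallback path
    psutil = None


def pid_exists_sketch(pid: int) -> bool:
    """Hypothetical stand-in for a psutil-first liveness probe."""
    if pid <= 0:
        return False
    if psutil is not None:
        # psutil.pid_exists never sends a signal, so it is safe on Windows.
        return psutil.pid_exists(pid)
    if sys.platform == "win32":
        # Assumption: this sketch refuses to fall back to os.kill(pid, 0)
        # on Windows, where value 0 is treated as a console control event.
        raise RuntimeError("psutil is required on Windows in this sketch")
    try:
        os.kill(pid, 0)  # POSIX: signal 0 only checks existence/permission
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    except OSError:
        return False  # assumed widening: any other OSError means "not alive"
    return True
```

Tests can then patch this single function instead of trying to steer the nested `os.kill` or psutil calls.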
Migrate each affected test to patch the correct seam:
- tests/tools/test_browser_orphan_reaper.py (5 tests)
  Patch `gateway.status._pid_exists` instead of `os.kill`.
  Rename test_permission_error_on_kill_check_skips to
  test_alive_legacy_daemon_is_reaped: the old assertion was
  "PermissionError on sig 0 → skip dir"; post-migration the
  untracked-alive-daemon path always reaps the dir after SIGTERM
  (best-effort semantics were preserved).
- tests/tools/test_windows_native_support.py (4 tests)
  Replace tests that asserted `os.kill` seam behavior with tests
  that exercise `ProcessRegistry._is_host_pid_alive` as a
  delegator and split out a new TestPidExistsOSErrorWidening class
  that hits `gateway.status._pid_exists` directly via the POSIX
  fallback branch (so Windows-style `OSError(WinError 87)` + `PermissionError`
  widening is still covered on Linux CI).
- tests/tools/test_process_registry.py (1 test)
  Mock `psutil.Process` + `_pid_exists` instead of `os.kill`
  for the detached-session kill path.
- tests/tools/test_mcp_stability.py::test_kill_orphaned_uses_sigkill_when_available
  SIGTERM → alive-check → SIGKILL flow now uses `_pid_exists`
  for the middle step; assertion count drops from 3 to 2.
- tests/gateway/test_status.py::TestScopedLocks (2 tests)
  `acquire_scoped_lock` consults `_pid_exists`; patch that
  seam directly instead of trying to control the nested psutil
  call via os.kill monkeypatch.
- tests/hermes_cli/test_gateway.py::test_stop_profile_gateway_keeps_pid_file_when_process_still_running
  The stop loop sends one SIGTERM via os.kill then polls 20x via
  _pid_exists; instrument both separately. Old assertion
  `calls["kill"] == 21` split into `kill == 1` + `alive_probes == 20`.
- tests/hermes_cli/test_auth_toctou_file_modes.py::test_shared_nous_store_writes_0o600_with_0o700_parent
  Commit c34884ea2 switched the pytest seat-belt guard in
  `_nous_shared_store_path()` from `Path.home() / ".hermes"`
  to `get_default_hermes_root()`, which honors HERMES_HOME. The
  test sets both HERMES_HOME and HERMES_SHARED_AUTH_DIR to
  subpaths of the same tmp_path, and the override now collapses
  onto the same path the guard is refusing. Renamed the override
  subdirectory so the two paths diverge: the guard passes and the test runs.
All 15 original CI failures and their local-flaky siblings now pass
(278 tests across the touched files, 0 failures).
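The "instrument both separately" idea from the stop-loop test can be sketched as follows. Everything here is illustrative: `status` is a hypothetical stand-in namespace for the module that owns `_pid_exists`, and `stop_process` is a simplified stop loop written for this example, not the project's code. The point is only that the kill seam and the liveness seam are counted independently.

```python
import os
import signal
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical stand-in for the module that owns the liveness probe.
status = SimpleNamespace(_pid_exists=lambda pid: False)


def stop_process(pid: int, polls: int = 20) -> bool:
    """Send one SIGTERM, then poll liveness; True if the process exited."""
    os.kill(pid, signal.SIGTERM)
    for _ in range(polls):
        if not status._pid_exists(pid):
            return True
    return False


def run_instrumented_stop(pid: int = 12345) -> dict:
    """Run the stop loop with both seams mocked and counted separately."""
    calls = {"kill": 0, "alive_probes": 0}

    def fake_kill(_pid, _sig):
        calls["kill"] += 1  # counts real kill attempts only

    def fake_pid_exists(_pid):
        calls["alive_probes"] += 1
        return True  # simulate a process that never exits

    with patch("os.kill", side_effect=fake_kill), \
         patch.object(status, "_pid_exists", side_effect=fake_pid_exists):
        calls["stopped"] = stop_process(pid)
    return calls
```

With a process that never exits, the single SIGTERM and the 20 liveness polls show up as two distinct counters rather than one combined `os.kill` count.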
parent 291a158441
commit f5ee780124
7 changed files with 160 additions and 80 deletions
@@ -728,18 +728,30 @@ class TestKillProcess:
         s.detached = True
         registry._running[s.id] = s
 
-        calls = []
+        terminate_calls = []
 
-        def fake_kill(pid, sig):
-            calls.append((pid, sig))
+        class FakeProcess:
+            def __init__(self, pid):
+                self.pid = pid
+            def children(self, recursive=False):
+                return []
+            def terminate(self):
+                terminate_calls.append(("terminate", self.pid))
+
+        import psutil as _psutil
 
         try:
-            with patch("tools.process_registry.os.kill", side_effect=fake_kill):
+            # Post-#21561: liveness probe routes through
+            # ``ProcessRegistry._is_host_pid_alive`` (→
+            # ``gateway.status._pid_exists``), and the actual kill on POSIX
+            # routes through ``psutil.Process(pid).terminate()``. Neither
+            # touches ``os.kill`` directly. Mock both seams.
+            with patch("gateway.status._pid_exists", return_value=True), \
+                 patch.object(_psutil, "Process", side_effect=lambda pid: FakeProcess(pid)):
                 result = registry.kill_process(s.id)
 
             assert result["status"] == "killed"
-            assert (424242, 0) in calls
-            assert (424242, signal.SIGTERM) in calls
+            assert ("terminate", 424242) in terminate_calls
         finally:
             registry._running.pop(s.id, None)
 