test(kanban): align two tests with recent kanban hardening

Two tests were already failing on main, both against code that was
hardened recently. Neither is a behaviour bug; the test expectations
fell out of date.

1. tests/tools/test_kanban_tools.py::test_worker_complete_rejects_stale_run_id
   c002668ff ("fix(kanban): add grace period to detect_crashed_workers")
   gates each running task behind a launch-window grace period so
   freshly-spawned workers whose PID isn't yet visible on /proc don't
   get reclaimed. The test creates a worker_env fixture moments before
   asserting reclamation, so the default 30s grace skips the liveness
   check and detect_crashed_workers returns []. Fix: set
   HERMES_KANBAN_CRASH_GRACE_SECONDS=0 in the test so we get the
   immediate-reclaim semantics the assertion expects.

2. tests/tools/test_windows_native_support.py::
     TestKanbanWaitpidWindowsGuard::test_source_gates_waitpid_loop
   ffdc937c1 ("fix(kanban): hoist zombie reaper out of dispatch_once")
   reshaped reap_worker_zombies to use an early-return Windows guard
   (`if os.name == "nt": return []`) instead of an inverted gate
   (`if os.name != "nt":`). Both correctly keep the waitpid loop off
   Windows; the early-return form is stronger because the rest of the
   function never runs. Fix: accept either gate pattern in the source
   scan.
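
For illustration only, a minimal sketch of the grace-period gate
described in item 1. The helper names and the fall-back-on-bad-input
behaviour are assumptions, not the actual hermes_cli.kanban_db code;
only the env var name and the 30s default come from this message.

```python
import os
import time
from typing import Optional

GRACE_ENV = "HERMES_KANBAN_CRASH_GRACE_SECONDS"  # name taken from the commit message
DEFAULT_GRACE = 30.0  # default taken from the commit message


def crash_grace_seconds() -> float:
    """Read the grace period from the environment (hypothetical helper)."""
    raw = os.environ.get(GRACE_ENV)
    if raw is None:
        return DEFAULT_GRACE
    try:
        return max(0.0, float(raw))
    except ValueError:
        # Assumption: malformed values fall back to the default.
        return DEFAULT_GRACE


def past_grace_window(started_at: float, now: Optional[float] = None) -> bool:
    """True once a task is old enough to get a liveness check (hypothetical)."""
    current = time.time() if now is None else now
    return (current - started_at) >= crash_grace_seconds()
```

With the variable unset, a task started one second ago is still inside
the window and would be skipped. With it set to "0", every running task
is past the window immediately, which is what the test relies on.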

Both failures reproduce verbatim on `origin/main` in a clean env;
neither relates to in-flight work on #33564 (the FD-leak fix). Filing
this as a separate fix-it PR per green-CI-policy so the kanban CI
shard stays green for downstream PRs.
teknium1 2026-05-27 18:17:29 -07:00 committed by Teknium
parent 2d5dcfabc3
commit 36c99af37a
2 changed files with 22 additions and 3 deletions

@@ -1326,6 +1326,14 @@ def test_worker_complete_rejects_stale_run_id(worker_env, monkeypatch):
     from hermes_cli import kanban_db as kb
     import hermes_cli.kanban_db as _kb
+    # detect_crashed_workers now gates each running task behind a
+    # launch-window grace period (c002668ff) so a freshly-spawned worker
+    # whose PID isn't yet visible on /proc isn't reclaimed. The fixture
+    # creates the task moments before this assertion, so the grace
+    # period (default 30s) would skip the liveness check. Zero it out
+    # for this test — we WANT immediate reclamation here.
+    monkeypatch.setenv("HERMES_KANBAN_CRASH_GRACE_SECONDS", "0")
     conn = kb.connect()
     try:
         run1 = kb.latest_run(conn, worker_env)

@@ -625,10 +625,21 @@ class TestKanbanWaitpidWindowsGuard:
         # Find the waitpid call and confirm it's inside a POSIX gate.
         idx = source.find("os.waitpid(-1, os.WNOHANG)")
         assert idx > 0, "waitpid call must exist"
-        # Look backwards up to 400 chars for the gate.
+        # Look backwards up to 400 chars for the gate. Accept either form:
+        # `if os.name != "nt":` (run iff POSIX), or
+        # `if os.name == "nt": return []` (early-return guard).
+        # Both correctly keep the waitpid loop off Windows; the early-return
+        # form is stronger because the rest of the function never runs.
         preamble = source[max(0, idx - 400):idx]
-        assert 'os.name != "nt"' in preamble or "os.name != 'nt'" in preamble, (
-            "os.waitpid(-1, os.WNOHANG) must sit behind an os.name != 'nt' guard"
+        guard_patterns = (
+            'os.name != "nt"',
+            "os.name != 'nt'",
+            'os.name == "nt"',  # early-return guard
+            "os.name == 'nt'",
+        )
+        assert any(p in preamble for p in guard_patterns), (
+            "os.waitpid(-1, os.WNOHANG) must sit behind an os.name guard "
+            f"(checked patterns: {guard_patterns})"
         )
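
To show how the relaxed scan behaves, here is a standalone sketch that
runs the same substring check against three toy sources. The helper
`waitpid_is_guarded` and the sample functions are hypothetical and are
not part of the test suite or of reap_worker_zombies.

```python
GUARD_PATTERNS = (
    'os.name != "nt"',
    "os.name != 'nt'",
    'os.name == "nt"',  # early-return guard
    "os.name == 'nt'",
)
WAITPID_CALL = "os.waitpid(-1, os.WNOHANG)"


def waitpid_is_guarded(source: str, window: int = 400) -> bool:
    """Return True if an os.name check appears shortly before the waitpid call."""
    idx = source.find(WAITPID_CALL)
    if idx < 0:
        return False  # no waitpid call at all
    preamble = source[max(0, idx - window):idx]
    return any(p in preamble for p in GUARD_PATTERNS)


# Toy sources: scanned as text only, never executed.
EARLY_RETURN = '''
def reap():
    if os.name == "nt":
        return []
    return os.waitpid(-1, os.WNOHANG)
'''

INVERTED = '''
def reap():
    if os.name != "nt":
        return os.waitpid(-1, os.WNOHANG)
    return []
'''

UNGUARDED = '''
def reap():
    return os.waitpid(-1, os.WNOHANG)
'''

print(waitpid_is_guarded(EARLY_RETURN))
print(waitpid_is_guarded(INVERTED))
print(waitpid_is_guarded(UNGUARDED))
```

Both guarded forms pass and the unguarded one fails. The check only
proves that an os.name comparison appears within the window before the
call, not that the call is actually inside that branch.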