fix(kanban): treat already-gone worker as terminated, not survived

_terminate_reclaimed_worker early-returned on ProcessLookupError with
terminated=False. The new reclaim-defer guard reads that as 'worker
survived the kill' and defers the reclaim forever, so a stale task whose
worker is already dead never lands in result.stale. ProcessLookupError
means the process is gone — that IS a successful termination. Split it
from the generic OSError branch and set terminated=True.
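
The distinction described above can be sketched in isolation. This is a minimal illustration, not the project's actual code: `terminate_worker`, `should_defer_reclaim`, `gone`, and `denied` are hypothetical names, the guard logic is an assumption inferred from the message, and the polling loop that follows the kill in the real function is omitted.

```python
import signal


def terminate_worker(pid, kill):
    """Hypothetical stand-in for the first step of _terminate_reclaimed_worker."""
    info = {"termination_attempted": True, "terminated": False}
    try:
        kill(int(pid), signal.SIGTERM)
    except ProcessLookupError:
        # No such process: the worker is already gone, so count it as terminated.
        info["terminated"] = True
        return info
    except OSError:
        # Any other failure (e.g. PermissionError): we cannot claim the worker is dead.
        return info
    # The real function polls for exit here; this sketch leaves the outcome unconfirmed.
    return info


def should_defer_reclaim(info):
    """Assumed guard: defer only if a kill was attempted and the worker survived."""
    return info["termination_attempted"] and not info["terminated"]


def gone(pid, sig):
    # Simulates os.kill on a PID that no longer exists.
    raise ProcessLookupError(pid)


def denied(pid, sig):
    # Simulates os.kill on a PID we are not allowed to signal.
    raise PermissionError(pid)


print(should_defer_reclaim(terminate_worker(1234, gone)))    # False
print(should_defer_reclaim(terminate_worker(1234, denied)))  # True
```

Because ProcessLookupError is a subclass of OSError, its handler has to come before the generic OSError handler; in the opposite order the specific branch would never run.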
Teknium 2026-06-19 07:08:40 -07:00
parent b9e521da23
commit 35e7ca03d5


@@ -5131,7 +5131,13 @@ def _terminate_reclaimed_worker(
     info["termination_attempted"] = True
     try:
         kill(int(pid), signal.SIGTERM)
-    except (ProcessLookupError, OSError):
+    except ProcessLookupError:
+        # Process is already gone — that's a successful termination, not a
+        # survival. Leaving terminated=False here would make the reclaim guard
+        # misread a dead worker as still-alive and defer forever.
+        info["terminated"] = True
+        return info
+    except OSError:
         return info
     for _ in range(10):