fix: prevent duplicate completion notifications on process kill (#7124)

When kill_process() sends SIGTERM, both it and the reader thread race
to call _move_to_finished() — kill_process sets exit_code=-15 and
enqueues a notification, then the reader thread's process.wait()
returns with exit_code=143 (128+SIGTERM) and enqueues a second one.
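Why do the two callers see different exit codes? The following is a minimal illustration, assuming a POSIX system with `sh` on the PATH. Python's `Popen.returncode` reports a signal death as the negated signal number, while a shell reports the same death as 128 + signal number. It does not show how ProcessRegistry spawns its commands. It only demonstrates the two conventions behind -15 and 143.

```python
import shlex
import signal
import subprocess
import sys

# A child that sleeps until it is terminated.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
child.terminate()          # sends SIGTERM on POSIX
print(child.wait())        # -15: Popen reports signal death as -signum

# The same kind of signal death seen through a shell: $? is 128 + signum.
py = shlex.quote(sys.executable)
script = f'{py} -c "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"; echo $?'
out = subprocess.run(["sh", "-c", script], capture_output=True, text=True).stdout
print(out.strip())           # 143
print(128 + signal.SIGTERM)  # 143
```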

Fix: make _move_to_finished() idempotent by tracking whether the
session was actually removed from _running. The second call sees it
was already moved and skips the completion_queue.put().
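Below is a minimal sketch of the race and the guard. It uses a hypothetical `ToyRegistry`, not the real `ProcessRegistry`, whose `move_to_finished()` mirrors the patched logic. Calling `dict.pop(key, None)` under the lock returns the session only to the first caller, so only that caller enqueues. Passing `guarded=False` reproduces the pre-fix behaviour, where both callers enqueue.

```python
import queue
import threading


class ToyRegistry:
    """Hypothetical stand-in for ProcessRegistry: only the parts the fix touches."""

    def __init__(self, guarded: bool) -> None:
        self.guarded = guarded
        self._lock = threading.Lock()
        self._running: dict[str, str] = {}
        self._finished: dict[str, str] = {}
        self.completion_queue: "queue.Queue[dict]" = queue.Queue()

    def move_to_finished(self, session_id: str, exit_code: int) -> None:
        with self._lock:
            # pop() returns None if another caller already removed the session.
            was_running = self._running.pop(session_id, None) is not None
            self._finished[session_id] = session_id
        if self.guarded and not was_running:
            return  # later caller: session already moved, skip the notification
        self.completion_queue.put({"session_id": session_id, "exit_code": exit_code})


def count_notifications(guarded: bool) -> int:
    reg = ToyRegistry(guarded)
    reg._running["s1"] = "s1"
    start = threading.Barrier(2)  # release both callers at the same moment

    def caller(code: int) -> None:
        start.wait()
        reg.move_to_finished("s1", code)

    # One thread plays kill_process() (-15), the other the reader thread (143).
    threads = [threading.Thread(target=caller, args=(c,)) for c in (-15, 143)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return reg.completion_queue.qsize()


print(count_notifications(guarded=False))  # 2: both callers enqueue
print(count_notifications(guarded=True))   # 1: only the first move enqueues
```

The pop happens while the lock is held. Exactly one caller can therefore observe a non-None result, however the threads interleave, and the guard turns that into a single notification.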

Adds regression test: test_move_to_finished_idempotent_no_duplicate
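The body of the commit's actual test is not shown here. The following is a hypothetical sketch in `unittest` style. A regression test for this property can call the move twice in sequence, which is the order kill_process() and the reader thread produce, and then assert that exactly one notification was queued. `ToyRegistry` is an illustrative stand-in, not the project's class.

```python
import queue
import threading
import unittest


class ToyRegistry:
    """Hypothetical minimal registry with the guarded move, for illustration only."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running: dict[str, str] = {"s1": "s1"}
        self._finished: dict[str, str] = {}
        self.completion_queue: queue.Queue = queue.Queue()

    def move_to_finished(self, session_id: str) -> None:
        with self._lock:
            was_running = self._running.pop(session_id, None) is not None
            self._finished[session_id] = session_id
        if was_running:  # only the first move notifies
            self.completion_queue.put({"session_id": session_id})


class MoveToFinishedTest(unittest.TestCase):
    def test_second_move_enqueues_nothing(self) -> None:
        reg = ToyRegistry()
        reg.move_to_finished("s1")  # e.g. kill_process()
        reg.move_to_finished("s1")  # e.g. the reader thread, a moment later
        self.assertEqual(reg.completion_queue.qsize(), 1)
        self.assertIn("s1", reg._finished)
```

Run it with `python -m unittest` or with pytest, both of which collect `unittest.TestCase` subclasses.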
Teknium 2026-04-10 03:52:16 -07:00 committed by GitHub
parent 00dd5cc491
commit c8e4dcf412
GPG key ID: B5690EEEBB952194
2 changed files with 31 additions and 5 deletions

@@ -484,15 +484,21 @@ class ProcessRegistry:
         self._move_to_finished(session)
 
     def _move_to_finished(self, session: ProcessSession):
-        """Move a session from running to finished."""
+        """Move a session from running to finished.
+
+        Idempotent: if the session was already moved (e.g. kill_process raced
+        with the reader thread), the second call is a no-op: no duplicate
+        completion notification is enqueued.
+        """
         with self._lock:
-            self._running.pop(session.id, None)
+            was_running = self._running.pop(session.id, None) is not None
             self._finished[session.id] = session
             self._write_checkpoint()
 
-        # If the caller requested agent notification, enqueue the completion
-        # so the CLI/gateway can auto-trigger a new agent turn.
-        if session.notify_on_complete:
+        # Only enqueue completion notification on the FIRST move. Without
+        # this guard, kill_process() and the reader thread can both call
+        # _move_to_finished(), producing duplicate [SYSTEM: ...] messages.
+        if was_running and session.notify_on_complete:
             from tools.ansi_strip import strip_ansi
             output_tail = strip_ansi(session.output_buffer[-2000:]) if session.output_buffer else ""
             self.completion_queue.put({