fix(discord): shield text-batch flush from follow-up cancel (#12444)

When Discord splits a long message at 2000 chars, _enqueue_text_event
buffers each chunk and schedules a _flush_text_batch task with a
short delay.  If another chunk lands while the prior flush task is
already inside handle_message, _enqueue_text_event calls
prior_task.cancel() — and without asyncio.shield, CancelledError
propagates from the flush task into handle_message → the agent's
streaming request, aborting the response the user was waiting on.
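The propagation path can be sketched with plain asyncio (illustrative names only; inner() stands in for handle_message and the agent's streaming request, flush() for the flush task):

```python
import asyncio

events = []

async def inner():
    # Stands in for handle_message and the agent's streaming request.
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        events.append("inner cancelled")  # the failure mode described above
        raise

async def flush():
    await inner()  # no shield: a cancel of this task reaches inner

async def main():
    task = asyncio.create_task(flush())
    await asyncio.sleep(0)  # let flush enter inner
    task.cancel()           # what _enqueue_text_event does to the prior flush task
    try:
        await task
    except asyncio.CancelledError:
        events.append("flush cancelled")

asyncio.run(main())
```

Without a shield, the cancel aimed at the flush task lands inside the innermost await, exactly the truncated-reply symptom.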

Reproducer: user sends a 3000-char prompt (split by Discord into 2
messages).  Chunk 1 lands, flush delay starts, chunk 2 lands during
the brief window when chunk 1's flush has already committed to
handle_message.  Agent's current streaming response is cancelled
with CancelledError, user sees a truncated or missing reply.

Fix (gateway/platforms/discord.py):
- Wrap the handle_message call in asyncio.shield so the inner
  dispatch is protected from the outer task's cancel.
- Add an except asyncio.CancelledError clause so the outer task
  still exits cleanly when cancel lands during the sleep window
  (before the pop) — semantics for that path are unchanged.
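For contrast, a minimal sketch of the shield semantics the fix relies on (names are illustrative, not the adapter's API): cancelling the outer task raises CancelledError at the shield boundary while the inner coroutine keeps running.

```python
import asyncio

results = []

async def inner():
    await asyncio.sleep(0.01)           # simulated in-flight dispatch
    results.append("inner completed")

async def outer():
    try:
        # With the shield, cancelling `outer` does not cancel `inner`.
        await asyncio.shield(inner())
    except asyncio.CancelledError:
        results.append("outer cancelled")

async def main():
    task = asyncio.create_task(outer())
    await asyncio.sleep(0)     # let outer suspend inside the shield
    task.cancel()
    try:
        await task             # outer swallowed the cancel
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.05)  # give the shielded inner time to finish

asyncio.run(main())
```

The outer task observes the cancel first, then the shielded work completes on its own.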

The new flush task spawned by the follow-up chunk still handles its
own batch via the normal pending-message / active-session machinery
in base.py, so follow-ups are not lost.

Tests: tests/gateway/test_text_batching.py —
test_shield_protects_handle_message_from_cancel.  Tracks a distinct
first_handle_cancelled event so the assertion fails cleanly when the
shield is missing (verified by stashing the fix and re-running).
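The shape of that test, reduced to a self-contained sketch with a hypothetical FakeAdapter (the real test drives the actual DiscordAdapter machinery):

```python
import asyncio

# Hypothetical stand-in for the adapter; only the shielded flush path
# is modeled here.
class FakeAdapter:
    def __init__(self):
        self.first_handle_cancelled = False
        self.first_handle_completed = False

    async def handle_message(self, event):
        try:
            await asyncio.sleep(0.01)   # simulated agent streaming turn
            self.first_handle_completed = True
        except asyncio.CancelledError:
            # Only set if the shield is missing.
            self.first_handle_cancelled = True
            raise

    async def flush(self, event):
        await asyncio.shield(self.handle_message(event))

async def main():
    adapter = FakeAdapter()
    task = asyncio.create_task(adapter.flush("chunk 1"))
    await asyncio.sleep(0)    # flush commits to handle_message
    task.cancel()             # follow-up chunk cancels the prior flush task
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.05) # let the shielded handle_message finish
    return adapter

adapter = asyncio.run(main())
assert not adapter.first_handle_cancelled   # shield worked
assert adapter.first_handle_completed       # handle_message ran to completion
```

Removing the shield from flush() flips first_handle_cancelled to True, which is the clean failure signal described above.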

Live E2E on the live-loaded DiscordAdapter:
  first_handle_cancelled: False  (shield worked)
  first_handle_completed: True   (handle_message ran to completion)
Author: Teknium (committed via GitHub)
Date:   2026-04-19 00:09:38 -07:00
parent dca439fe92
commit 7c10761dd2

2 changed files with 78 additions and 1 deletion

@@ -3265,7 +3265,20 @@ class DiscordAdapter(BasePlatformAdapter):
                 "[Discord] Flushing text batch %s (%d chars)",
                 key, len(event.text or ""),
             )
-            await self.handle_message(event)
+            # Shield the downstream dispatch so that a subsequent chunk
+            # arriving while handle_message is mid-flight cannot cancel
+            # the running agent turn. _enqueue_text_event always cancels
+            # the prior flush task when a new chunk lands; without this
+            # shield, CancelledError would propagate from our task down
+            # into handle_message → the agent's streaming request,
+            # aborting the response the user was waiting on. The new
+            # chunk is handled by the fresh flush task regardless.
+            await asyncio.shield(self.handle_message(event))
+        except asyncio.CancelledError:
+            # Only reached if cancel landed before the pop — the shielded
+            # handle_message is unaffected either way. Let the task exit
+            # cleanly so the finally block cleans up.
+            pass
         finally:
             if self._pending_text_batch_tasks.get(key) is current_task:
                 self._pending_text_batch_tasks.pop(key, None)