fix(mcp): re-raise CancelledError explicitly in MCPServerTask.run (#21318)

Since Python 3.8, `asyncio.CancelledError` inherits from `BaseException`
(not `Exception`), so the broad `except Exception as exc:` in
`MCPServerTask.run`'s transport loop does not catch it. Task cancellation
from a gateway restart or an explicit `task.cancel()` silently escaped
past the reconnect logic: the MCP server task died without going through
the shutdown/reconnect code paths that check `_shutdown_event`.
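
The hierarchy point above can be checked with a minimal sketch. The
`worker` and `demo` names are hypothetical stand-ins, not code from this
repository, and the sketch assumes Python 3.8 or newer:

```python
import asyncio

# CancelledError sits under BaseException, not Exception (Python 3.8+).
assert issubclass(asyncio.CancelledError, BaseException)
assert not issubclass(asyncio.CancelledError, Exception)


async def worker(log: list) -> None:
    """Hypothetical stand-in for a transport loop with only a broad catch."""
    try:
        await asyncio.sleep(3600)  # pretend to hold a connection open
    except Exception:
        # Not reached on cancellation: CancelledError is not an Exception.
        log.append("caught by except Exception")


async def demo() -> list:
    log: list = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)  # let the worker start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancellation propagated")
    return log


result = asyncio.run(demo())
```

The broad handler never runs; the cancellation surfaces only at the
point where the task is awaited.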

Add an explicit `except asyncio.CancelledError: raise` before the broad
catch so cancellation propagation is self-documenting rather than an
accident of exception hierarchy, and future sibling-site work (e.g.
distinguishing shutdown-cancel from transport-cancel) has an obvious
hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception
subclass is also corrected: the old path would have caught it and
treated it as a connection failure worth retrying.
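
As a rough illustration of the pattern described above, here is a
self-contained sketch of a reconnect loop with an explicit re-raise.
`ReconnectingTask`, `_run_transport`, and the simulated failure are
assumptions made for the example; this is not the real `MCPServerTask`:

```python
import asyncio


class ReconnectingTask:
    """Hypothetical miniature of a reconnect loop, for illustration only."""

    def __init__(self) -> None:
        self.session = None
        self.attempts = 0
        self._task = None

    async def _run_transport(self) -> None:
        # Assumed placeholder: first attempt fails, second stays connected.
        self.attempts += 1
        if self.attempts == 1:
            raise ConnectionError("simulated transport failure")
        self.session = object()
        await asyncio.sleep(3600)

    async def run(self) -> None:
        while True:
            try:
                await self._run_transport()
            except asyncio.CancelledError:
                # Cancellation is not a connection failure: clean up, re-raise.
                self.session = None
                raise
            except Exception:
                # Ordinary failures drop the session and are retried.
                self.session = None
                await asyncio.sleep(0)

    async def shutdown(self) -> None:
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass  # expected: the task re-raised its cancellation


async def main():
    server = ReconnectingTask()
    server._task = asyncio.create_task(server.run())
    await asyncio.sleep(0.01)  # allow one failure and one reconnect
    connected = server.session is not None
    await server.shutdown()
    return server.attempts, connected, server.session, server._task.cancelled()


attempts, connected, session_after, was_cancelled = asyncio.run(main())
```

In this sketch the `ConnectionError` is retried, while the cancellation
issued by `shutdown()` ends the task instead of starting another
reconnect attempt.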

Closes #9930.
Teknium 2026-05-07 07:04:38 -07:00 committed by GitHub
parent 5a3e5b23d2
commit e0a2b08768
GPG key ID: B5690EEEBB952194
2 changed files with 104 additions and 0 deletions

@@ -1399,6 +1399,18 @@ class MCPServerTask:
# still detect a transient in-flight state — it'll be
# re-set after the fresh session initializes.
continue
except asyncio.CancelledError:
# Task was cancelled (shutdown, gateway restart, explicit
# task.cancel()). Don't treat this as a connection failure.
# CancelledError has inherited from BaseException (not
# Exception) since Python 3.8, so the broad ``except Exception``
# below would not catch it anyway; this clause makes the
# propagation explicit instead of relying on the exception
# hierarchy. Clear the session and re-raise so the task's
# cancellation propagates correctly to asyncio's task machinery
# and ``shutdown()``'s ``await self._task`` completes. See #9930.
self.session = None
raise
except Exception as exc:
self.session = None