fix(mcp): forward OAuth auth and bump sse_read_timeout on SSE transport (#21323)

* fix(mcp): re-raise CancelledError explicitly in MCPServerTask.run

On Python 3.8+, `asyncio.CancelledError` inherits from `BaseException`
(not `Exception`), so the broad `except Exception as exc:` in
`MCPServerTask.run`'s transport loop did NOT catch it. Task cancellation
from gateway restart / explicit `task.cancel()` silently escaped past
the reconnect logic — the MCP server task died without going through
the shutdown/reconnect code paths that check `_shutdown_event`.

Add an explicit `except asyncio.CancelledError: raise` before the broad
catch so cancellation propagation is self-documenting rather than an
accident of exception hierarchy, and future sibling-site work (e.g.
distinguishing shutdown-cancel from transport-cancel) has an obvious
hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception
subclass is also corrected: the old path would have caught it and
treated it as a connection failure worth retrying.
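
A minimal sketch of the pattern described above, not the actual
`MCPServerTask.run` body: `run_with_reconnect`, `hang`, `boom`, and the
`log` list are hypothetical names used only for illustration. It shows
the explicit re-raise letting cancellation propagate while ordinary
failures still reach the broad handler.

```python
import asyncio


async def run_with_reconnect(connect, log):
    """Hypothetical stand-in for a transport loop with a broad catch."""
    try:
        await connect()
    except asyncio.CancelledError:
        # Explicit: cancellation is never treated as a retryable failure.
        log.append("cancelled")
        raise
    except Exception as exc:
        # Ordinary transport errors still land here and can be retried.
        log.append(f"retry after {type(exc).__name__}")


async def demo():
    log = []

    async def hang():
        await asyncio.sleep(3600)  # simulates a long-lived connection

    async def boom():
        raise ConnectionError("down")  # simulates a transport failure

    task = asyncio.create_task(run_with_reconnect(hang, log))
    await asyncio.sleep(0)  # let the task start and block inside hang()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("propagated")

    await run_with_reconnect(boom, log)
    return log


result = asyncio.run(demo())
```

With the explicit clause in place the behavior is the same on every
Python version, regardless of where `CancelledError` sits in the
exception hierarchy.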

Closes #9930.

* fix(mcp): forward OAuth auth and bump sse_read_timeout on SSE transport

Two surgical correctness fixes in the SSE branch of MCPServerTask._run_http,
distilled from @amiller's PR #5981 that couldn't be cherry-picked wholesale
(branch too stale).

1. sse_read_timeout was set to the tool timeout (default 60s). That's the
   wrong dimension — it governs how long sse_client will wait between
   events on the SSE stream, not per-call latency. SSE servers routinely
   hold the stream idle for minutes between events; a 60s read timeout
   drops the connection after the first slow stretch (Router Teamwork,
   Supermemory on Cloudflare Workers idle-disconnect at ~60s). Bump to
   300s to match the Streamable HTTP path's httpx read timeout.

2. OAuth auth was built via get_manager().get_or_build_provider() but
   never forwarded to sse_client. SSE MCP servers behind OAuth 2.1 PKCE
   would silently fail with 401s on every request.
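
The two points above can be sketched as a small standalone helper.
`build_sse_kwargs` and `SSE_READ_TIMEOUT_S` are hypothetical names, not
part of this change or of the mcp package; the keyword names mirror the
ones passed to `sse_client` in the diff.

```python
from typing import Any, Dict, Optional

# Idle budget between SSE events, independent of the per-call tool timeout.
SSE_READ_TIMEOUT_S = 300.0


def build_sse_kwargs(
    url: str,
    headers: Optional[Dict[str, str]],
    connect_timeout: float,
    oauth_auth: Optional[Any] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an SSE client (illustrative only)."""
    kwargs: Dict[str, Any] = {
        "url": url,
        "headers": headers or None,  # empty dict collapses to None
        "timeout": float(connect_timeout),
        "sse_read_timeout": SSE_READ_TIMEOUT_S,
    }
    if oauth_auth is not None:
        # Only forward auth when a provider was actually built.
        kwargs["auth"] = oauth_auth
    return kwargs
```

A caller would then open the stream with something like
`async with sse_client(**build_sse_kwargs(url, headers, connect_timeout, auth))`,
assuming the installed `sse_client` accepts an `auth` keyword as it does
in the diff.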

Keepalive (the other half of #5981) intentionally left for a follow-up —
it's a real improvement but a bigger change, and these two are obvious
corrections to ship now. Credits to @amiller.

Co-authored-by: Andrew Miller <socrates1024@gmail.com>

---------

Co-authored-by: Andrew Miller <socrates1024@gmail.com>
Teknium 2026-05-07 07:08:04 -07:00 committed by GitHub
parent 4ee6c3349a
commit dd2dc2bddf
2 changed files with 229 additions and 6 deletions

@@ -1243,12 +1243,26 @@ class MCPServerTask:
                     "mcp.client.sse.sse_client is not available. "
                     "Upgrade the mcp package to get SSE support."
                 )
-            async with sse_client(
-                url=url,
-                headers=headers or None,
-                timeout=float(connect_timeout),
-                sse_read_timeout=float(config.get("timeout", _DEFAULT_TOOL_TIMEOUT)),
-            ) as (read_stream, write_stream):
+            # sse_read_timeout governs how long sse_client will wait between
+            # events on the SSE stream. Using the tool_timeout (default 60s)
+            # here is wrong: SSE servers commonly hold the stream idle for
+            # minutes between events, so a 60s read timeout drops the
+            # connection after the first slow stretch. 300s matches the
+            # Streamable HTTP code path's httpx read timeout below. Original
+            # observation from @amiller in PR #5981 (Router Teamwork,
+            # Supermemory on Cloudflare Workers idle-disconnect at ~60s).
+            _sse_kwargs: dict = {
+                "url": url,
+                "headers": headers or None,
+                "timeout": float(connect_timeout),
+                "sse_read_timeout": 300.0,
+            }
+            if _oauth_auth is not None:
+                # Pass OAuth auth through to sse_client so SSE MCP servers
+                # behind OAuth 2.1 PKCE work. Previously built but never
+                # forwarded — SSE OAuth would silently fail with 401s.
+                _sse_kwargs["auth"] = _oauth_auth
+            async with sse_client(**_sse_kwargs) as (read_stream, write_stream):
                 async with ClientSession(
                     read_stream, write_stream, **sampling_kwargs
                 ) as session: