mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
2 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
a3a4932405
|
fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025) (#12717)
* [verified] fix(mcp-oauth): bridge httpx auth_flow bidirectional generator HermesMCPOAuthProvider.async_auth_flow wrapped the SDK's auth_flow with 'async for item in super().async_auth_flow(request): yield item', which discards httpx's .asend(response) values and resumes the inner generator with None. This broke every OAuth MCP server on the first HTTP response with 'NoneType' object has no attribute 'status_code' crashing at mcp/client/auth/oauth2.py:505. Replace with a manual bridge that forwards .asend() values into the inner generator, preserving httpx's bidirectional auth_flow contract. Add tests/tools/test_mcp_oauth_bidirectional.py with two regression tests that drive the flow through real .asend() round-trips. These catch the bug at the unit level; prior tests only exercised _initialize() and disk-watching, never the full generator protocol. Verified against BetterStack MCP: Before: 'Connection failed (11564ms): NoneType...' after 3 retries After: 'Connected (2416ms); Tools discovered: 83' Regression from #11383. * [verified] fix(mcp-oauth): seed token_expiry_time + pre-flight AS discovery on cold-load PR #11383's consolidation fixed external-refresh reloading and 401 dedup but left two latent bugs that surfaced on BetterStack and any other OAuth MCP with a split-origin authorization server: 1. HermesTokenStorage persisted only a relative 'expires_in', which is meaningless after a process restart. The MCP SDK's OAuthContext does NOT seed token_expiry_time in _initialize, so is_token_valid() returned True for any reloaded token regardless of age. Expired tokens shipped to servers, and app-level auth failures (e.g. BetterStack's 'No teams found. Please check your authentication.') were invisible to the transport-layer 401 handler. 2. Even once preemptive refresh did fire, the SDK's _refresh_token falls back to {server_url}/token when oauth_metadata isn't cached. For providers whose AS is at a different origin (BetterStack: mcp.betterstack.com for MCP, betterstack.com/oauth/token for the token endpoint), that fallback 404s and drops into full browser re-auth on every process restart. Fix set: - HermesTokenStorage.set_tokens persists an absolute wall-clock expires_at alongside the SDK's OAuthToken JSON (time.time() + TTL at write time). - HermesTokenStorage.get_tokens reconstructs expires_in from max(expires_at - now, 0), clamping expired tokens to zero TTL. Legacy files without expires_at fall back to file-mtime as a best-effort wall-clock proxy, self-healing on the next set_tokens. - HermesMCPOAuthProvider._initialize calls super(), then update_token_expiry on the reloaded tokens so token_expiry_time reflects actual remaining TTL. If tokens are loaded but oauth_metadata is missing, pre-flight PRM + ASM discovery runs via httpx.AsyncClient using the MCP SDK's own URL builders and response handlers (build_protected_resource_metadata_discovery_urls, handle_auth_metadata_response, etc.) so the SDK sees the correct token_endpoint before the first refresh attempt. Pre-flight is skipped when there are no stored tokens to keep fresh-install paths zero-cost. Test coverage (tests/tools/test_mcp_oauth_cold_load_expiry.py): - set_tokens persists absolute expires_at - set_tokens skips expires_at when token has no expires_in - get_tokens round-trips expires_at -> remaining expires_in - expired tokens reload with expires_in=0 - legacy files without expires_at fall back to mtime proxy - _initialize seeds token_expiry_time from stored tokens - _initialize flags expired-on-disk tokens as is_token_valid=False - _initialize pre-flights PRM + ASM discovery with mock transport - _initialize skips pre-flight when no tokens are stored Verified against BetterStack MCP: hermes mcp test betterstack -> Connected (2508ms), 83 tools mcp_betterstack_telemetry_list_teams_tool -> real team data, not 'No teams found. Please check your authentication.' Reference: mcp-oauth-token-diagnosis skill, Fix A. * chore: map hermes@noushq.ai to benbarclay in AUTHOR_MAP Needed for CI attribution check on cherry-picked commits from PR #12025. --------- Co-authored-by: Hermes Agent <hermes@noushq.ai> |
||
|
|
70768665a4
|
fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383)
* feat(mcp-oauth): scaffold MCPOAuthManager
Central manager for per-server MCP OAuth state. Provides
get_or_build_provider (cached), remove (evicts cache + deletes
disk), invalidate_if_disk_changed (mtime watch, core fix for
external-refresh workflow), and handle_401 (dedup'd recovery).
No behavior change yet — existing call sites still use
build_oauth_auth directly. Task 1 of 8 in the MCP OAuth
consolidation (fixes Cthulhu's BetterStack reliability issues).
* feat(mcp-oauth): add HermesMCPOAuthProvider with pre-flow disk watch
Subclasses the MCP SDK's OAuthClientProvider to inject a disk
mtime check before every async_auth_flow, via the central
manager. When a subclass instance is used, external token
refreshes (cron, another CLI instance) are picked up before
the next API call.
Still dead code: the manager's _build_provider still delegates
to build_oauth_auth and returns the plain OAuthClientProvider.
Task 4 wires this subclass in. Task 2 of 8.
* refactor(mcp-oauth): extract build_oauth_auth helpers
Decomposes build_oauth_auth into _configure_callback_port,
_build_client_metadata, _maybe_preregister_client, and
_parse_base_url. Public API preserved. These helpers let
MCPOAuthManager._build_provider reuse the same logic in Task 4
instead of duplicating the construction dance.
Also updates the SDK version hint in the warning from 1.10.0 to
1.26.0 (which is what we actually require for the OAuth types
used here). Task 3 of 8.
* feat(mcp-oauth): manager now builds HermesMCPOAuthProvider directly
_build_provider constructs the disk-watching subclass using the
helpers from Task 3, instead of delegating to the plain
build_oauth_auth factory. Any consumer using the manager now gets
pre-flow disk-freshness checks automatically.
build_oauth_auth is preserved as the public API for backwards
compatibility. The code path is now:
MCPOAuthManager.get_or_build_provider ->
_build_provider ->
_configure_callback_port
_build_client_metadata
_maybe_preregister_client
_parse_base_url
HermesMCPOAuthProvider(...)
Task 4 of 8.
* feat(mcp): wire OAuth manager + add _reconnect_event
MCPServerTask gains _reconnect_event alongside _shutdown_event.
When set, _run_http / _run_stdio exit their async-with blocks
cleanly (no exception), and the outer run() loop re-enters the
transport to rebuild the MCP session with fresh credentials.
This is the recovery path for OAuth failures that the SDK's
in-place httpx.Auth cannot handle (e.g. cron externally consumed
the refresh_token, or server-side session invalidation).
_run_http now asks MCPOAuthManager for the OAuth provider
instead of calling build_oauth_auth directly. Config-time,
runtime, and reconnect paths all share one provider instance
with pre-flow disk-watch active.
shutdown() defensively sets both events so there is no race
between reconnect and shutdown signalling.
Task 5 of 8.
* feat(mcp): detect auth failures in tool handlers, trigger reconnect
All 5 MCP tool handlers (tool call, list_resources, read_resource,
list_prompts, get_prompt) now detect auth failures and route
through MCPOAuthManager.handle_401:
1. If the manager says recovery is viable (disk has fresh tokens,
or SDK can refresh in-place), signal MCPServerTask._reconnect_event
to tear down and rebuild the MCP session with fresh credentials,
then retry the tool call once.
2. If no recovery path exists, return a structured needs_reauth
JSON error so the model stops hallucinating manual refresh
attempts (the 'let me curl the token endpoint' loop Cthulhu
pasted from Discord).
_is_auth_error catches OAuthFlowError, OAuthTokenError,
OAuthNonInteractiveError, and httpx.HTTPStatusError(401). Non-auth
exceptions still surface via the generic error path unchanged.
Task 6 of 8.
* feat(mcp-cli): route add/remove through manager, add 'hermes mcp login'
cmd_mcp_add and cmd_mcp_remove now go through MCPOAuthManager
instead of calling build_oauth_auth / remove_oauth_tokens
directly. This means CLI config-time state and runtime MCP
session state are backed by the same provider cache — removing
a server evicts the live provider, adding a server populates
the same cache the MCP session will read from.
New 'hermes mcp login <name>' command:
- Wipes both the on-disk tokens file and the in-memory
MCPOAuthManager cache
- Triggers a fresh OAuth browser flow via the existing probe
path
- Intended target for the needs_reauth error Task 6 returns
to the model
Task 7 of 8.
* test(mcp-oauth): end-to-end integration tests
Five new tests exercising the full consolidation with real file
I/O and real imports (no transport mocks):
1. external_refresh_picked_up_without_restart — Cthulhu's cron
workflow. External process writes fresh tokens to disk;
on the next auth flow the manager's mtime-watch flips
_initialized and the SDK re-reads from storage.
2. handle_401_deduplicates_concurrent_callers — 10 concurrent
handlers for the same failed token fire exactly ONE recovery
attempt (thundering-herd protection).
3. handle_401_returns_false_when_no_provider — defensive path
for unknown servers.
4. invalidate_if_disk_changed_handles_missing_file — pre-auth
state returns False cleanly.
5. provider_is_reused_across_reconnects — cache stickiness so
reconnects preserve the disk-watch baseline mtime.
Task 8 of 8 — consolidation complete.
|