hermes-agent/hermes_cli/mcp_startup.py
zapabob e55ddc3e33 fix(mcp): suppress interactive OAuth stdin prompts during background discovery (#35927)
When an MCP server requires OAuth, the interactive `hermes` TUI froze on
startup: background MCP discovery hit the OAuth flow, which on an interactive
TTY spawns a daemon thread doing a blocking `sys.stdin.readline()` (the
"paste the redirect URL" fallback in mcp_oauth._wait_for_callback). That
thread competes with the TUI's own stdin reader for the same terminal, so
keystrokes get swallowed and the TUI appears frozen (up to the 300s OAuth
timeout). Reported symptom: "MCP OAuth: authorization required / Open this URL
... the tui is freezing, not respond to typing."

Add a thread-local `suppress_interactive_oauth()` context manager in
tools/mcp_oauth.py; `_is_interactive()` returns False while it's active, so the
stdin paste-thread and prompt are never created. Background discovery
(hermes_cli/mcp_startup.py, tui_gateway/entry.py) now runs discovery inside
that context, so OAuth-requiring servers soft-skip (raise
OAuthNonInteractiveError, already handled) instead of stealing the TUI's stdin.
A real `hermes mcp login` on the main thread is unaffected (thread-local).

Salvaged from #35945 by @zapabob (authorship preserved via cherry-pick;
resolved a conflict against main's new mcp_discovery_timeout / wait_for_mcp_
discovery refactor, keeping both). Verified E2E: with suppression the paste
prompt is NOT printed and no stdin thread spawns (raises OAuthNonInteractive
soft-skip); without it the prompt shows (the freeze). Mutation-verified
(removing the suppress check in _is_interactive fails the regression test).
76 tests pass, ruff clean.

Closes #35927.

SELF-REVIEW FIX: the original #35945 used threading.local(), which does NOT
propagate to the dedicated mcp-event-loop thread where OAuth actually runs
(discover_mcp_tools dispatches the connect via run_coroutine_threadsafe), so
the suppression was a NO-OP in production (the tests passed only by stubbing
out the cross-thread dispatch). Converted to a contextvars.ContextVar, which
asyncio copies onto the scheduled coroutine — empirically verified suppression
now holds on the mcp-event-loop thread through the real _run_on_mcp_loop path.
Added a cross-thread regression test (fails on threading.local, passes on the
ContextVar) so the no-op can't regress.
2026-06-27 04:59:23 +05:30

100 lines
3.5 KiB
Python

"""Shared CLI/TUI-safe helpers for background MCP discovery."""
from __future__ import annotations
import threading
from contextlib import nullcontext
from typing import Optional
_mcp_discovery_lock = threading.Lock()
_mcp_discovery_started = False
_mcp_discovery_thread: Optional[threading.Thread] = None
def _has_configured_mcp_servers() -> bool:
"""Cheap config probe so non-MCP users avoid importing the MCP stack."""
try:
from hermes_cli.config import read_raw_config
mcp_servers = (read_raw_config() or {}).get("mcp_servers")
return isinstance(mcp_servers, dict) and len(mcp_servers) > 0
except Exception:
# Be conservative: if config probing fails, try discovery in the
# background so startup still can't block.
return True
def start_background_mcp_discovery(*, logger, thread_name: str) -> None:
"""Spawn one shared background MCP discovery thread for this process."""
global _mcp_discovery_started, _mcp_discovery_thread
with _mcp_discovery_lock:
if _mcp_discovery_started:
return
_mcp_discovery_started = True
if not _has_configured_mcp_servers():
return
def _discover() -> None:
try:
_discover_mcp_tools_without_interactive_oauth()
except Exception:
logger.debug("Background MCP tool discovery failed", exc_info=True)
thread = threading.Thread(
target=_discover,
name=thread_name,
daemon=True,
)
_mcp_discovery_thread = thread
thread.start()
def _resolve_discovery_timeout(explicit: "float | None") -> float:
"""Resolve the MCP discovery wait bound: explicit arg > config > default.
Reads ``mcp_discovery_timeout`` from config.yaml, defaulting to the value in
``DEFAULT_CONFIG`` (single source of truth) when the key is absent. Kept lazy
and fail-safe — a missing/invalid value or a broken config falls back to a
short safe bound so startup can never hang or crash.
"""
if explicit is not None:
return explicit
try:
from hermes_cli.config import load_config, DEFAULT_CONFIG
default = float(DEFAULT_CONFIG.get("mcp_discovery_timeout", 1.5))
raw = (load_config() or {}).get("mcp_discovery_timeout", default)
val = float(raw)
return val if val > 0 else default
except Exception:
return 1.5
def _discover_mcp_tools_without_interactive_oauth() -> None:
"""Run MCP discovery without letting OAuth read from the user's stdin."""
try:
from tools.mcp_oauth import suppress_interactive_oauth
except Exception:
suppress_interactive_oauth = nullcontext
with suppress_interactive_oauth():
from tools.mcp_tool import discover_mcp_tools
discover_mcp_tools()
def wait_for_mcp_discovery(timeout: "float | None" = None) -> None:
"""Wait for background MCP discovery before the first tool snapshot.
``thread.join(timeout)`` returns the INSTANT discovery completes, so this
only ever blocks for the real connect time of a still-pending server —
users with no MCP servers or fast servers pay ~0s. The bound (from
``mcp_discovery_timeout`` in config) just caps the wait so a dead server
can't freeze startup; servers that miss it are picked up by the automatic
late-binding refresh.
"""
thread = _mcp_discovery_thread
if thread is None or not thread.is_alive():
return
thread.join(timeout=_resolve_discovery_timeout(timeout))