fix(browser_tool): do not cache transient None cloud provider resolution

Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True`
before resolution. If credentials were briefly unavailable on the first
call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned
the entire process to local mode forever, even after credentials
self-healed seconds later.

Root cause: bookkeeping was set up-front, so any code path that fell
through to `return _cached_cloud_provider` (config read failure, no
credentials yet, explicit-provider instantiation failure) committed the
transient `None` to the cache permanently.

Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now
set only when (a) the user explicitly chose `cloud_provider: local`, or
(b) a provider was successfully resolved. All transient `None` paths
return without poisoning the cache, so the next call retries. Explicit
provider instantiation failures now log at warning level with stack
trace so operators can diagnose them.

Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py
covering explicit local, successful resolution, no-credentials-yet,
config read failure, and explicit provider instantiation failure.
Stash-verify confirmed the 3 transient-None tests fail without the fix.
All 320 existing browser tests still green.

Closes #22324
This commit is contained in:
Wesley Simplicio 2026-05-09 08:51:52 -03:00 committed by Teknium
parent 5a0021146b
commit 3170c8d448
2 changed files with 146 additions and 5 deletions

View file

@ -422,7 +422,7 @@ def _get_cloud_provider() -> Optional[CloudBrowserProvider]:
if _cloud_provider_resolved:
return _cached_cloud_provider
_cloud_provider_resolved = True
resolved: Optional[CloudBrowserProvider] = None
try:
from hermes_cli.config import read_raw_config
cfg = read_raw_config()
@ -434,23 +434,39 @@ def _get_cloud_provider() -> Optional[CloudBrowserProvider]:
)
if provider_key == "local":
_cached_cloud_provider = None
_cloud_provider_resolved = True
return None
if provider_key and provider_key in _PROVIDER_REGISTRY:
_cached_cloud_provider = _PROVIDER_REGISTRY[provider_key]()
try:
resolved = _PROVIDER_REGISTRY[provider_key]()
except Exception:
logger.warning(
"Failed to instantiate explicit cloud_provider %r; will retry on next call",
provider_key,
exc_info=True,
)
return None
except Exception as e:
logger.debug("Could not read cloud_provider from config: %s", e)
return None
if _cached_cloud_provider is None:
if resolved is None:
# Prefer Browser Use (managed Nous gateway or direct API key),
# fall back to Browserbase (direct credentials only).
fallback_provider = BrowserUseProvider()
if fallback_provider.is_configured():
_cached_cloud_provider = fallback_provider
resolved = fallback_provider
else:
fallback_provider = BrowserbaseProvider()
if fallback_provider.is_configured():
_cached_cloud_provider = fallback_provider
resolved = fallback_provider
if resolved is None:
# Transient None — credentials may self-heal. Don't poison the cache.
return None
_cached_cloud_provider = resolved
_cloud_provider_resolved = True
return _cached_cloud_provider