Two changes that go together:
1. tools/browser_tool.py — add _ensure_browser_plugins_loaded() and call
it from _get_cloud_provider() before consulting the registry. Normally
model_tools triggers discover_plugins() as an import side-effect, but
_get_cloud_provider() can be reached from contexts that haven't gone
through model_tools (standalone scripts, certain unit-test paths, the
new parity-sweep harness). Without the defensive call, the registry is
empty and _registry_get_browser_provider() returns None — silently
downgrading users to local mode when they explicitly configured a
cloud provider with no credentials yet. The behavior-parity sweep
below caught this as 4 scenario regressions (explicit-X-no-creds for
all 3 providers, and explicit-firecrawl-with-creds).
2. tests/plugins/browser/check_parity_vs_main.py — subprocess harness
that pins one Python invocation to origin/main and one to this PR's
worktree via sys.path.insert(), runs _get_cloud_provider() across a
13-scenario config matrix, and diffs the reduced shape tuple
(is_local, provider_name, is_available). Provider_name pulls from
provider.provider_name() which is the legacy CloudBrowserProvider
API and remains as a backward-compat alias on the new BrowserProvider
ABC, so the comparison is apples-to-apples regardless of class
identity.
Final result: PARITY OK across 13 scenarios. The four observable
config/credential matrices that exercise the dispatcher all match
origin/main bit-for-bit:
- no-config + no-env → local
- explicit local + any env → local
- explicit BB / BU / FC + no creds → provider returned with
is_available()==False (so dispatcher surfaces typed credentials
error; matches main exactly)
- explicit BB / BU / FC + creds → provider returned with
is_available()==True
- no-config + BU creds → Browser Use
- no-config + BB creds → Browserbase
- no-config + both → Browser Use (legacy walk first hit)
- no-config + FC only → local (firecrawl NOT in legacy walk)
- no-config + FC + BB → Browserbase (legacy walk skips firecrawl)
Per the dev skill's "behavior-parity for refactor PRs" rule — without
this subprocess sweep, 31/31 unit tests pass while the production code
path is silently broken for users who type `browser.cloud_provider:
browserbase` and run a single browser command without prior model_tools
import. Caught + fixed before push.
Mirrors tests/plugins/web/test_web_search_provider_plugins.py from PR #25182.
31 tests across 5 classes:
TestBundledPluginsRegister (8 tests)
- Three plugins register (browserbase, browser-use, firecrawl)
- Each plugin's name + display_name accessible
- get_setup_schema() returns picker-shaped dict with post_setup hook
- All three lifecycle methods (create_session, close_session,
emergency_cleanup) overridden on every plugin
TestIsAvailable (4 tests)
- browserbase needs BOTH BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID
- browserbase: api_key alone or project_id alone insufficient
- browser-use satisfied by BROWSER_USE_API_KEY
- firecrawl satisfied by FIRECRAWL_API_KEY
TestRegistryResolution (8 tests) — most valuable, locks down
pre-migration semantics:
- _resolve(None) with no creds returns None (local mode)
- _resolve('local') short-circuits to None
- _resolve('browserbase') returns provider even when unavailable
(so dispatcher surfaces typed credentials error)
- _resolve('firecrawl') same: explicit-config wins
- _resolve('unknown') falls through to auto-detect
- Legacy walk picks browser-use over browserbase
- browserbase-only configuration: browserbase wins
- **Regression**: firecrawl is NEVER auto-selected even when
single-eligible (preserves pre-migration gate; FIRECRAWL_API_KEY
shared with web firecrawl must not silently route to paid cloud
browser)
TestLegacyAbcAliases (6 tests)
- is_configured() delegates to is_available() for all three plugins
- provider_name() returns display_name for all three plugins
TestPickerIntegration (3 tests)
- _plugin_browser_providers() exposes all three plugins as rows
- Each row carries post_setup='agent_browser'
- browser_plugin_name marker matches browser_provider
All tests use real imports — no mocking of provider classes — so the
suite catches drift in the ABC, registry, picker injection, and plugin
glue layer simultaneously.
31/31 passing.