hermes-agent/tools/browser_cdp_tool.py
Teknium 5a1c599412
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)

Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.

Supersedes #12550.

No code changes in this commit.

* feat(browser): add persistent CDP supervisor for dialog + frame detection

Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.

Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
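
A minimal sketch of the sync-to-async bridge described above, using only the standard library. `LoopThread` and its return payload are hypothetical stand-ins for illustration, not the supervisor's actual class or snapshot shape.

```python
import asyncio
import threading


class LoopThread:
    """Owns an asyncio loop on a daemon thread (hypothetical stand-in)."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    async def _snapshot_async(self) -> dict:
        # Runs on the loop thread, so it can read loop-owned state safely.
        await asyncio.sleep(0)
        return {"active": True, "pending_dialogs": []}

    def snapshot(self, timeout: float = 5.0) -> dict:
        # Sync entry point: schedule the coroutine on the loop, block for the result.
        fut = asyncio.run_coroutine_threadsafe(self._snapshot_async(), self._loop)
        return fut.result(timeout=timeout)

    def stop(self) -> None:
        # Ask the loop to stop from outside its thread, then wait for the thread.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5.0)
        self._loop.close()
```

Callers on any thread can use `LoopThread().snapshot()` without touching asyncio themselves; the `timeout` on `fut.result` keeps a wedged loop from blocking the caller forever.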

Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.

Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.

Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
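
One way the two caps could be applied when building a snapshot, as a rough sketch. The `depth` field, the `truncated` flag, and the choice to count the top frame toward the 30-entry cap are assumptions made for illustration.

```python
from typing import Any, Dict, List

MAX_FRAMES = 30       # entry cap from the commit message
MAX_OOPIF_DEPTH = 2   # depth cap from the commit message


def bounded_frame_tree(top: Dict[str, Any], children: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Drop frames nested deeper than the depth cap, then truncate to the entry cap."""
    shallow = [c for c in children if c.get("depth", 1) <= MAX_OOPIF_DEPTH]
    kept = shallow[: MAX_FRAMES - 1]  # assumption: the top frame counts toward the cap
    return {"top": top, "children": kept, "truncated": len(kept) < len(children)}
```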

E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.

No agent-facing tool wiring in this commit (comes next).

* feat(browser): add browser_dialog tool wired to CDP supervisor

Agent-facing response-only tool. Schema:
  action: 'accept' | 'dismiss' (required)
  prompt_text: response for prompt() dialogs (optional)
  dialog_id: disambiguate when multiple dialogs queued (optional)

Handler:
  SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
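
The schema above could be written as a JSON-schema style dict in the same shape as `BROWSER_CDP_SCHEMA`. This is a sketch only: `BROWSER_DIALOG_SCHEMA_SKETCH` and `validate_dialog_args` are hypothetical names, and the real tool's descriptions and validation may differ.

```python
from typing import Any, Dict, Optional

BROWSER_DIALOG_SCHEMA_SKETCH: Dict[str, Any] = {
    "name": "browser_dialog",
    "parameters": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["accept", "dismiss"]},
            "prompt_text": {"type": "string"},  # reply for prompt() dialogs
            "dialog_id": {"type": "string"},    # disambiguates queued dialogs
        },
        "required": ["action"],
    },
}


def validate_dialog_args(args: Dict[str, Any]) -> Optional[str]:
    """Return an error string for bad arguments, or None when they are acceptable."""
    params = BROWSER_DIALOG_SCHEMA_SKETCH["parameters"]
    for key in params["required"]:
        if key not in args:
            return f"'{key}' is required"
    if args["action"] not in params["properties"]["action"]["enum"]:
        return f"invalid action: {args['action']!r}"
    for key in ("prompt_text", "dialog_id"):
        if key in args and not isinstance(args[key], str):
            return f"'{key}' must be a string"
    return None
```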

check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.

Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.

* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot

Supervisor lifecycle:
  * _get_session_info lazy-starts the supervisor after a session row is
    materialized — covers every backend code path (Browserbase, cdp_url
    override, /browser connect, future providers) with one hook.
  * cleanup_browser(task_id) stops the supervisor for that task first
    (before the backend tears down CDP).
  * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
  * /browser connect eagerly starts the supervisor for task 'default'
    so the first snapshot already shows pending_dialogs.
  * /browser disconnect stops the supervisor.

CDP URL resolution for the supervisor:
  1. BROWSER_CDP_URL / browser.cdp_url override.
  2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
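
The precedence above can be sketched as a small pure function. `resolve_supervisor_cdp_url` is a hypothetical helper; the env-before-config ordering inside step 1 follows the `_resolve_cdp_endpoint` docstring in this file.

```python
from typing import Any, Mapping


def resolve_supervisor_cdp_url(
    env: Mapping[str, str],
    config: Mapping[str, Any],
    session_info: Mapping[str, Any],
) -> str:
    """Return the CDP URL the supervisor should use, or '' if none is known."""
    browser_cfg = config.get("browser") or {}
    # 1. Explicit override: env var first, then config.yaml.
    override = (env.get("BROWSER_CDP_URL") or browser_cfg.get("cdp_url") or "").strip()
    if override:
        return override
    # 2. Fallback: per-session URL handed out by a cloud provider.
    return (session_info.get("cdp_url") or "").strip()
```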

browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.

Config defaults:
  * browser.dialog_policy: 'must_respond' (new)
  * browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.

Deadlock fix in supervisor event dispatch:
  * _on_dialog_opening and _on_target_attached used to await CDP calls
    while the reader was still processing an event — but only the reader
    can set the response Future, so the call timed out.
  * Both now fire asyncio.create_task(...) so the reader stays pumping.
  * auto_dismiss/auto_accept now actually close the dialog immediately.
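
The deadlock and its fix can be reproduced without a browser. The sketch below is a simulation under stated assumptions: an `asyncio.Queue` stands in for the WebSocket, the fake "browser" answers every call immediately, and all names are hypothetical.

```python
import asyncio


async def run_demo(inline: bool) -> str:
    """Simulate a CDP reader loop; `inline=True` reproduces the deadlock."""
    loop = asyncio.get_running_loop()
    inbox: asyncio.Queue = asyncio.Queue()  # stands in for the WebSocket
    pending = {}                            # call id -> response Future
    outcome = loop.create_future()
    background = set()                      # keep strong refs to handler tasks

    async def cdp_call(call_id: int) -> str:
        fut = loop.create_future()
        pending[call_id] = fut
        # Fake browser: the reply lands on the inbox, which only the reader drains.
        inbox.put_nowait({"id": call_id, "result": "dialog closed"})
        return await asyncio.wait_for(fut, timeout=0.2)

    async def on_dialog_opening() -> None:
        try:
            outcome.set_result(await cdp_call(1))
        except asyncio.TimeoutError:
            outcome.set_result("timed out")

    async def reader() -> None:
        while True:
            msg = await inbox.get()
            if "id" in msg:
                fut = pending.pop(msg["id"])
                if not fut.done():  # may already be cancelled by a timeout
                    fut.set_result(msg["result"])
            elif inline:
                await on_dialog_opening()  # reader is stuck here, cannot read the reply
            else:
                task = asyncio.create_task(on_dialog_opening())
                background.add(task)
                task.add_done_callback(background.discard)

    reader_task = asyncio.create_task(reader())
    inbox.put_nowait({"method": "Page.javascriptDialogOpening"})
    result = await outcome
    reader_task.cancel()
    try:
        await reader_task
    except asyncio.CancelledError:
        pass
    return result
```

With `inline=True` the handler waits on a Future that only the blocked reader could resolve, so the call times out; with `inline=False` the reader returns to `inbox.get()` and delivers the reply.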

Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
  * supervisor start/snapshot
  * main-frame alert detection + dismiss
  * iframe.contentWindow alert
  * prompt() with prompt_text reply
  * respond with no pending dialog -> clean error
  * auto_dismiss clears on event
  * registry idempotency
  * registry stop -> snapshot reports inactive
  * browser_dialog tool no-supervisor error
  * browser_dialog invalid action
  * browser_dialog end-to-end via tool handler

xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.

* docs(browser): document browser_dialog tool + CDP supervisor

- user-guide/features/browser.md: new browser_dialog section with
  workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
  bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
  toolset row with note on pending_dialogs / frame_tree snapshot fields

Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).

* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility

A Browserbase E2E test revealed three production-critical issues:

1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
   CDP proxy tears down our long-lived WebSocket whenever a short-lived
   client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
   Fixed with a reconnecting _run loop that re-attaches with exponential
   backoff on drops. _page_session_id and _child_sessions are reset on each
   reconnect; pending_dialogs and frames are preserved across reconnects.

2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
   Playwright-based CDP proxy dismisses alert/confirm/prompt before our
   Page.handleJavaScriptDialog call can respond. So pending_dialogs is
   empty by the time the agent reads a snapshot on Browserbase.

   Added a recent_dialogs ring buffer (capacity 20) that retains a
   DialogRecord for every dialog that opened, with a closed_by tag:
     * 'agent'       — agent called browser_dialog
     * 'auto_policy' — local auto_dismiss/auto_accept fired
     * 'watchdog'    — must_respond timeout auto-dismissed (300s default)
     * 'remote'      — browser/backend closed it on us (Browserbase)

   Agents on Browserbase now see the dialog history with closed_by='remote'
   so they at least know a dialog fired, even though they couldn't respond.

3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
   'message' field (CDP spec has only 'result' and 'userInput') but our
   _on_dialog_closed was matching on message. Fixed to match by session_id
   + oldest-first, with a safety assumption that only one dialog is in
   flight per session (the JS thread is blocked while a dialog is up).
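
A compact sketch of the `recent_dialogs` ring buffer from item 2 and the session-based close matching from item 3. `DialogLog` and the `DialogRecord` fields shown here are assumptions for illustration; only the capacity, the `closed_by` values, and the oldest-first rule come from the text above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class DialogRecord:
    dialog_id: str
    session_id: str
    message: str
    closed_by: Optional[str] = None  # 'agent' | 'auto_policy' | 'watchdog' | 'remote'


class DialogLog:
    def __init__(self, capacity: int = 20) -> None:
        self.pending: List[DialogRecord] = []
        # deque(maxlen=...) silently evicts the oldest record once full.
        self.recent: Deque[DialogRecord] = deque(maxlen=capacity)

    def opened(self, record: DialogRecord) -> None:
        self.pending.append(record)
        self.recent.append(record)  # every opened dialog is remembered

    def closed(self, session_id: str, closed_by: str = "remote") -> Optional[DialogRecord]:
        # The close event carries no message, so match on session only.
        # `pending` is in open order, so the first match is the oldest.
        for record in self.pending:
            if record.session_id == session_id:
                self.pending.remove(record)
                record.closed_by = closed_by
                return record
        return None
```

Because the same `DialogRecord` object sits in both containers, tagging `closed_by` on close is visible in `recent` without a second lookup.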

Docs + tests updated:
  * browser.md: new availability matrix showing the three backends and
    which mode (pending / recent / response) each supports
  * developer-guide/browser-supervisor.md: three-field snapshot schema
    with closed_by semantics
  * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
    passing against real Chrome)

E2E verified both backends:
  * Local Chrome via /browser connect: detect + respond full workflow
    (smoke_supervisor.py all 7 scenarios pass)
  * Browserbase: detect via recent_dialogs with closed_by='remote'
    (smoke_supervisor_browserbase_v2.py passes)

Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.

* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)

Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.

The supervisor now calls Page.addScriptToEvaluateOnNewDocument to inject a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.

Flow when a page calls alert('hi'):
  1. window.alert override intercepts, builds XHR GET to
     http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
  2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
  3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
     it as a pending dialog with bridge_request_id set
  4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
  5. Supervisor calls Fetch.fulfillRequest with JSON body:
     {accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
  6. The injected script parses the body, returns the appropriate value
     from the override (undefined for alert, bool for confirm, string|null
     for prompt)
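
Steps 3 and 5 on the supervisor side can be sketched as two pure helpers: one that recognizes a bridge XHR from its URL, and one that builds the `Fetch.fulfillRequest` parameters (CDP expects the response body base64-encoded). The helper names, the 200 status, and the Content-Type header are assumptions; the host and the JSON body shape come from the flow above.

```python
import base64
import json
from typing import Any, Dict, Optional
from urllib.parse import parse_qs, urlsplit

BRIDGE_HOST = "hermes-dialog-bridge.invalid"


def parse_bridge_request(url: str) -> Optional[Dict[str, str]]:
    """Return {'kind', 'message'} for a bridge XHR, or None for any other URL."""
    parts = urlsplit(url)
    if parts.hostname != BRIDGE_HOST:
        return None
    query = parse_qs(parts.query, keep_blank_values=True)
    return {
        "kind": query.get("kind", ["alert"])[0],
        "message": query.get("message", [""])[0],
    }


def build_fulfill_params(
    request_id: str, *, accept: bool, prompt_text: str, dialog_id: str
) -> Dict[str, Any]:
    """Build params for Fetch.fulfillRequest carrying the agent's answer."""
    body = json.dumps(
        {"accept": accept, "prompt_text": prompt_text, "dialog_id": dialog_id}
    )
    return {
        "requestId": request_id,
        "responseCode": 200,
        "responseHeaders": [{"name": "Content-Type", "value": "application/json"}],
        "body": base64.b64encode(body.encode("utf-8")).decode("ascii"),
    }
```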

This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.

Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.

The native-dialog path is kept as a fallback for backends that don't
auto-dismiss, so a page that somehow bypasses our override (e.g. an iframe
that loads after Fetch.enable but before the init script runs) is still
observed via Page.javascriptDialogOpening.

E2E VERIFIED:
  * Local Chrome: 13/13 pytest tests green (12 original + new
    test_bridge_captures_prompt_and_returns_reply_text that asserts
    window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
  * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
    - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
    - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
      → page.prompt_ret === 'AGENT-REPLY' ✓
    - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
    - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓

Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.

* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)

Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).

Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.

Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.

Agent workflow:
  1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
  2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
                 params={'expression': 'document.title', 'returnByValue': True})
  3. Supervisor dispatches the call on the OOPIF's child session
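
From the agent's side, steps 1 and 2 amount to picking a frame out of the snapshot and building the tool arguments. A small sketch, assuming each `frame_tree.children[]` entry carries `frame_id`, `is_oopif`, and a `url` field (the `url` key and both helper names are hypothetical):

```python
from typing import Any, Dict, Optional


def find_oopif_frame_id(frame_tree: Dict[str, Any], url_substring: str) -> Optional[str]:
    """Return the frame_id of the first OOPIF child whose URL contains the substring."""
    for child in frame_tree.get("children") or []:
        if child.get("is_oopif") and url_substring in child.get("url", ""):
            return child.get("frame_id")
    return None


def build_eval_args(frame_id: str, expression: str) -> Dict[str, Any]:
    """Arguments for browser_cdp that evaluate an expression inside the OOPIF."""
    return {
        "method": "Runtime.evaluate",
        "frame_id": frame_id,
        "params": {"expression": expression, "returnByValue": True},
    }
```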

Supervisor state fixes needed along the way:
  * _on_frame_detached now skips reason='swap' (frame migrating processes)
  * _on_frame_detached also skips when the frame is an OOPIF with a live
    child session — Browserbase fires spurious remove events when a
    same-origin iframe gets promoted to OOPIF
  * _on_target_detached clears cdp_session_id but KEEPS the frame record
    so the agent still sees the OOPIF in frame_tree during transient
    session flaps
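
The first two rules reduce to a small predicate over a `Page.frameDetached` event, whose `reason` is either `'remove'` or `'swap'`. `should_drop_frame` is a hypothetical name, and the real handler may consult more state than this sketch does.

```python
def should_drop_frame(reason: str, is_oopif: bool, has_live_child_session: bool) -> bool:
    """Decide whether a frameDetached event should delete the frame record."""
    if reason == "swap":
        # The frame is migrating between processes, not going away.
        return False
    if is_oopif and has_live_child_session:
        # Spurious 'remove' seen when a same-origin iframe is promoted to OOPIF.
        return False
    return True
```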

E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
  browser_cdp(method='Runtime.evaluate',
              params={'expression': 'document.title', 'returnByValue': True},
              frame_id=<OOPIF>)
  → {'success': True, 'result': {'value': 'Example Domain'}}

  The iframe is <iframe src='https://example.com/'> inside a top-level
  data: URL page on a real Browserbase session. The agent Runtime.evaluates
  INSIDE the cross-origin iframe and gets example.com's title back.

Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
  * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
    verifies routing via supervisor, Runtime.evaluate returns 1+1=2
  * test_browser_cdp_frame_id_missing_supervisor — clean error when no
    supervisor attached
  * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
    frame_id

Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.

* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process

When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:

  * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
  * Chrome with --site-per-process so the cross-origin iframe becomes a
    real OOPIF in its own process
  * Navigate, find OOPIF in supervisor.frame_tree, call
    browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
    through the supervisor's child session
  * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
    inner page, retrieved via OOPIF eval)

PASSED on 2026-04-23.

Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.

chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.

Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.

* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count

Pre-merge docs audit revealed two gaps:

1. user-guide/configuration.md browser config example was missing the
   two new dialog_* knobs. Added with a short table explaining
   must_respond / auto_dismiss / auto_accept semantics and a link to
   the feature page for the full workflow.

2. reference/tools-reference.md header said '54 built-in tools' — real
   count on main is 54, this branch adds browser_dialog so it's 55.
   Fixed the header.  (browser count was already correctly bumped
   11 -> 12 in the earlier docs commit.)

No code changes.
2026-04-23 22:23:37 -07:00

563 lines
22 KiB
Python

#!/usr/bin/env python3
"""
Raw Chrome DevTools Protocol (CDP) passthrough tool.
Exposes a single tool, ``browser_cdp``, that sends arbitrary CDP commands to
the browser's DevTools WebSocket endpoint. Works when a CDP URL is
configured — either via ``/browser connect`` (sets ``BROWSER_CDP_URL``) or
``browser.cdp_url`` in ``config.yaml`` — or when a CDP-backed cloud provider
session is active.
This is the escape hatch for browser operations not covered by the main
browser tool surface (``browser_navigate``, ``browser_click``,
``browser_console``, etc.) — handling native dialogs, iframe-scoped
evaluation, cookie/network control, low-level tab management, etc.
Method reference: https://chromedevtools.github.io/devtools-protocol/
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
from typing import Any, Dict, Optional
from tools.registry import registry, tool_error
logger = logging.getLogger(__name__)
CDP_DOCS_URL = "https://chromedevtools.github.io/devtools-protocol/"
# ``websockets`` is a transitive dependency of hermes-agent (via fal_client
# and firecrawl-py) and is already imported by gateway/platforms/feishu.py.
# Wrap the import so a clean error surfaces if the package is ever absent.
try:
    import websockets
    from websockets.exceptions import WebSocketException

    _WS_AVAILABLE = True
except ImportError:
    websockets = None  # type: ignore[assignment]
    WebSocketException = Exception  # type: ignore[assignment,misc]
    _WS_AVAILABLE = False
# ---------------------------------------------------------------------------
# Async-from-sync bridge (matches the pattern in homeassistant_tool.py)
# ---------------------------------------------------------------------------
def _run_async(coro):
    """Run an async coroutine from a sync handler, safe inside or outside a loop."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    if loop and loop.is_running():
        import concurrent.futures

        # A loop is already running on this thread, so run the coroutine on a
        # fresh loop in a worker thread and block for its result.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(asyncio.run, coro)
            return future.result()
    return asyncio.run(coro)
# ---------------------------------------------------------------------------
# Endpoint resolution
# ---------------------------------------------------------------------------
def _resolve_cdp_endpoint() -> str:
    """Return the normalized CDP WebSocket URL, or empty string if unavailable.

    Delegates to ``tools.browser_tool._get_cdp_override`` so precedence stays
    consistent with the rest of the browser tool surface:

    1. ``BROWSER_CDP_URL`` env var (live override from ``/browser connect``)
    2. ``browser.cdp_url`` in ``config.yaml``
    """
    try:
        from tools.browser_tool import _get_cdp_override  # type: ignore[import-not-found]

        return (_get_cdp_override() or "").strip()
    except Exception as exc:  # pragma: no cover - defensive
        logger.debug("browser_cdp: failed to resolve CDP endpoint: %s", exc)
        return ""
# ---------------------------------------------------------------------------
# Core CDP call
# ---------------------------------------------------------------------------
async def _cdp_call(
    ws_url: str,
    method: str,
    params: Dict[str, Any],
    target_id: Optional[str],
    timeout: float,
) -> Dict[str, Any]:
    """Make a single CDP call, optionally attaching to a target first.

    When ``target_id`` is provided, we call ``Target.attachToTarget`` with
    ``flatten=True`` to multiplex a page-level session over the same
    browser-level WebSocket, then send ``method`` with that ``sessionId``.

    When ``target_id`` is None, ``method`` is sent at browser level, which
    works for ``Target.*``, ``Browser.*``, ``Storage.*`` and a few other
    globally-scoped domains.
    """
    assert websockets is not None  # guarded by _WS_AVAILABLE at call-site
    async with websockets.connect(
        ws_url,
        max_size=None,  # CDP responses (e.g. DOM.getDocument) can be large
        open_timeout=timeout,
        close_timeout=5,
        ping_interval=None,  # CDP server doesn't expect pings
    ) as ws:
        next_id = 1
        session_id: Optional[str] = None

        # --- Step 1: attach to target if requested ---
        if target_id:
            attach_id = next_id
            next_id += 1
            await ws.send(
                json.dumps(
                    {
                        "id": attach_id,
                        "method": "Target.attachToTarget",
                        "params": {"targetId": target_id, "flatten": True},
                    }
                )
            )
            deadline = asyncio.get_event_loop().time() + timeout
            while True:
                remaining = deadline - asyncio.get_event_loop().time()
                if remaining <= 0:
                    raise TimeoutError(
                        f"Timed out attaching to target {target_id}"
                    )
                raw = await asyncio.wait_for(ws.recv(), timeout=remaining)
                msg = json.loads(raw)
                if msg.get("id") == attach_id:
                    if "error" in msg:
                        raise RuntimeError(
                            f"Target.attachToTarget failed: {msg['error']}"
                        )
                    session_id = msg.get("result", {}).get("sessionId")
                    if not session_id:
                        raise RuntimeError(
                            "Target.attachToTarget did not return a sessionId"
                        )
                    break
                # Ignore events (messages without "id") while waiting

        # --- Step 2: dispatch the real method ---
        call_id = next_id
        next_id += 1
        req: Dict[str, Any] = {
            "id": call_id,
            "method": method,
            "params": params or {},
        }
        if session_id:
            req["sessionId"] = session_id
        await ws.send(json.dumps(req))
        deadline = asyncio.get_event_loop().time() + timeout
        while True:
            remaining = deadline - asyncio.get_event_loop().time()
            if remaining <= 0:
                raise TimeoutError(
                    f"Timed out waiting for response to {method}"
                )
            raw = await asyncio.wait_for(ws.recv(), timeout=remaining)
            msg = json.loads(raw)
            if msg.get("id") == call_id:
                if "error" in msg:
                    raise RuntimeError(f"CDP error: {msg['error']}")
                return msg.get("result", {})
            # Ignore events / out-of-order responses
# ---------------------------------------------------------------------------
# Public tool function
# ---------------------------------------------------------------------------
def _browser_cdp_via_supervisor(
    task_id: str,
    frame_id: str,
    method: str,
    params: Optional[Dict[str, Any]],
    timeout: float,
) -> str:
    """Route a CDP call through the live supervisor session for an OOPIF frame.

    Looks up the frame in the supervisor's snapshot, extracts its child
    ``cdp_session_id``, and dispatches ``method`` with that sessionId via
    the supervisor's already-connected WebSocket (using
    ``asyncio.run_coroutine_threadsafe`` onto the supervisor loop).
    """
    try:
        from tools.browser_supervisor import SUPERVISOR_REGISTRY  # type: ignore[import-not-found]
    except Exception as exc:  # pragma: no cover - defensive
        return tool_error(
            f"CDP supervisor is not available: {exc}. frame_id routing requires "
            f"a running supervisor attached via /browser connect or an active "
            f"Browserbase session."
        )

    supervisor = SUPERVISOR_REGISTRY.get(task_id)
    if supervisor is None:
        return tool_error(
            f"No CDP supervisor is attached for task={task_id!r}. Call "
            f"browser_navigate or /browser connect first so the supervisor "
            f"can attach. Once attached, browser_snapshot will populate "
            f"frame_tree with frame_ids you can pass here."
        )

    snap = supervisor.snapshot()
    # Search both the top frame and the children for the requested id.
    top = snap.frame_tree.get("top")
    frame_info: Optional[Dict[str, Any]] = None
    if top and top.get("frame_id") == frame_id:
        frame_info = top
    else:
        for child in snap.frame_tree.get("children", []) or []:
            if child.get("frame_id") == frame_id:
                frame_info = child
                break
    if frame_info is None:
        # Check the raw frames dict too (frame_tree is capped at 30 entries)
        with supervisor._state_lock:  # type: ignore[attr-defined]
            raw = supervisor._frames.get(frame_id)  # type: ignore[attr-defined]
            if raw is not None:
                frame_info = raw.to_dict()
    if frame_info is None:
        return tool_error(
            f"frame_id {frame_id!r} not found in supervisor state. "
            f"Call browser_snapshot to see current frame_tree."
        )

    child_sid = frame_info.get("session_id")
    if not child_sid:
        # Not an OOPIF: same-origin iframes don't get their own sessionId, so
        # there is no child session to route through. Return an error that
        # points the agent at contentWindow/contentDocument from the parent.
        return tool_error(
            f"frame_id {frame_id!r} is not an out-of-process iframe (no "
            f"dedicated CDP session). For same-origin iframes, use "
            f"`browser_cdp(method='Runtime.evaluate', params={{'expression': "
            f"\"document.querySelector('iframe').contentDocument.title\"}})` "
            f"at the top-level page instead."
        )

    # Dispatch onto the supervisor's loop.
    import asyncio as _asyncio

    loop = supervisor._loop  # type: ignore[attr-defined]
    if loop is None or not loop.is_running():
        return tool_error(
            "CDP supervisor loop is not running. Try reconnecting with "
            "/browser connect."
        )

    async def _do_cdp():
        return await supervisor._cdp(  # type: ignore[attr-defined]
            method,
            params or {},
            session_id=child_sid,
            timeout=timeout,
        )

    try:
        fut = _asyncio.run_coroutine_threadsafe(_do_cdp(), loop)
        result_msg = fut.result(timeout=timeout + 2)
    except Exception as exc:
        return tool_error(
            f"CDP call via supervisor failed: {type(exc).__name__}: {exc}",
            cdp_docs=CDP_DOCS_URL,
        )

    payload: Dict[str, Any] = {
        "success": True,
        "method": method,
        "frame_id": frame_id,
        "session_id": child_sid,
        "result": result_msg.get("result", {}),
    }
    return json.dumps(payload, ensure_ascii=False)


def browser_cdp(
    method: str,
    params: Optional[Dict[str, Any]] = None,
    target_id: Optional[str] = None,
    frame_id: Optional[str] = None,
    timeout: float = 30.0,
    task_id: Optional[str] = None,
) -> str:
    """Send a raw CDP command. See ``CDP_DOCS_URL`` for method documentation.

    Args:
        method: CDP method name, e.g. ``"Target.getTargets"``.
        params: Method-specific parameters; defaults to ``{}``.
        target_id: Optional target/tab ID for page-level methods. When set,
            we first attach to the target (``flatten=True``) and send
            ``method`` with the resulting ``sessionId``. Uses a fresh
            stateless CDP connection.
        frame_id: Optional cross-origin (OOPIF) iframe ``frame_id`` from
            ``browser_snapshot.frame_tree.children[]``. When set (and the
            frame is an OOPIF with a live session tracked by the CDP
            supervisor), routes the call through the supervisor's existing
            WebSocket, which is how you Runtime.evaluate *inside* an
            iframe on backends where per-call fresh CDP connections would
            hit signed-URL expiry (Browserbase) or expensive reattach.
        timeout: Seconds to wait for the call to complete.
        task_id: Task identifier for supervisor lookup. When ``frame_id``
            is set, this identifies which task's supervisor to use; the
            handler will default to ``"default"`` otherwise.

    Returns:
        JSON string ``{"success": True, "method": ..., "result": {...}}`` on
        success, or ``{"error": "..."}`` on failure.
    """
    # --- Route iframe-scoped calls through the supervisor ---------------
    if frame_id:
        return _browser_cdp_via_supervisor(
            task_id=task_id or "default",
            frame_id=frame_id,
            method=method,
            params=params,
            timeout=timeout,
        )
    del task_id  # stateless path below

    if not method or not isinstance(method, str):
        return tool_error(
            "'method' is required (e.g. 'Target.getTargets')",
            cdp_docs=CDP_DOCS_URL,
        )
    if not _WS_AVAILABLE:
        return tool_error(
            "The 'websockets' Python package is required but not installed. "
            "Install it with: pip install websockets"
        )

    endpoint = _resolve_cdp_endpoint()
    if not endpoint:
        return tool_error(
            "No CDP endpoint is available. Run '/browser connect' to attach "
            "to a running Chrome, or set 'browser.cdp_url' in config.yaml. "
            "The Camofox backend is REST-only and does not expose CDP.",
            cdp_docs=CDP_DOCS_URL,
        )
    if not endpoint.startswith(("ws://", "wss://")):
        return tool_error(
            f"CDP endpoint is not a WebSocket URL: {endpoint!r}. "
            "Expected ws://... or wss://...; the /browser connect "
            "resolver should have rewritten this. Check that Chrome is "
            "actually listening on the debug port."
        )

    call_params: Dict[str, Any] = params or {}
    if not isinstance(call_params, dict):
        return tool_error(
            f"'params' must be an object/dict, got {type(call_params).__name__}"
        )

    # Coerce the timeout to a float and clamp it to [1, 300] seconds.
    try:
        safe_timeout = float(timeout) if timeout else 30.0
    except (TypeError, ValueError):
        safe_timeout = 30.0
    safe_timeout = max(1.0, min(safe_timeout, 300.0))

    try:
        result = _run_async(
            _cdp_call(endpoint, method, call_params, target_id, safe_timeout)
        )
    except asyncio.TimeoutError as exc:
        return tool_error(
            f"CDP call timed out after {safe_timeout}s: {exc}",
            method=method,
        )
    except TimeoutError as exc:
        return tool_error(str(exc), method=method)
    except RuntimeError as exc:
        return tool_error(str(exc), method=method)
    except WebSocketException as exc:
        return tool_error(
            f"WebSocket error talking to CDP at {endpoint}: {exc}. The "
            "browser may have disconnected; try '/browser connect' again.",
            method=method,
        )
    except Exception as exc:  # pragma: no cover - unexpected
        logger.exception("browser_cdp unexpected error")
        return tool_error(
            f"Unexpected error: {type(exc).__name__}: {exc}",
            method=method,
        )

    payload: Dict[str, Any] = {
        "success": True,
        "method": method,
        "result": result,
    }
    if target_id:
        payload["target_id"] = target_id
    return json.dumps(payload, ensure_ascii=False)
# ---------------------------------------------------------------------------
# Registry
# ---------------------------------------------------------------------------
BROWSER_CDP_SCHEMA: Dict[str, Any] = {
"name": "browser_cdp",
"description": (
"Send a raw Chrome DevTools Protocol (CDP) command. Escape hatch for "
"browser operations not covered by browser_navigate, browser_click, "
"browser_console, etc.\n\n"
"**Requires a reachable CDP endpoint.** Available when the user has "
"run '/browser connect' to attach to a running Chrome, or when "
"'browser.cdp_url' is set in config.yaml. Not currently wired up for "
"cloud backends (Browserbase, Browser Use, Firecrawl) — those expose "
"CDP per session but live-session routing is a follow-up. Camofox is "
"REST-only and will never support CDP. If the tool is in your toolset "
"at all, a CDP endpoint is already reachable.\n\n"
f"**CDP method reference:** {CDP_DOCS_URL} — use web_extract on a "
"method's URL (e.g. '/tot/Page/#method-handleJavaScriptDialog') "
"to look up parameters and return shape.\n\n"
"**Common patterns:**\n"
"- List tabs: method='Target.getTargets', params={}\n"
"- Handle a native JS dialog: method='Page.handleJavaScriptDialog', "
"params={'accept': true, 'promptText': ''}, target_id=<tabId>\n"
"- Get all cookies: method='Network.getAllCookies', params={}\n"
"- Eval in a specific tab: method='Runtime.evaluate', "
"params={'expression': '...', 'returnByValue': true}, "
"target_id=<tabId>\n"
"- Set viewport for a tab: method='Emulation.setDeviceMetricsOverride', "
"params={'width': 1280, 'height': 720, 'deviceScaleFactor': 1, "
"'mobile': false}, target_id=<tabId>\n\n"
"**Usage rules:**\n"
"- Browser-level methods (Target.*, Browser.*, Storage.*): omit "
"target_id and frame_id.\n"
"- Page-level methods (Page.*, Runtime.*, DOM.*, Emulation.*, "
"Network.* scoped to a tab): pass target_id from Target.getTargets.\n"
"- **Cross-origin iframe scope** (Runtime.evaluate inside an OOPIF, "
"Page.* targeting a frame target, etc.): pass frame_id from the "
"browser_snapshot frame_tree output. This routes through the CDP "
"supervisor's live connection — the only reliable way on "
"Browserbase where stateless CDP calls hit signed-URL expiry.\n"
"- Each stateless call (without frame_id) is independent — sessions "
"and event subscriptions do not persist between calls. For stateful "
"workflows, prefer the dedicated browser tools or use frame_id "
"routing."
),
"parameters": {
"type": "object",
"properties": {
"method": {
"type": "string",
"description": (
"CDP method name, e.g. 'Target.getTargets', "
"'Runtime.evaluate', 'Page.handleJavaScriptDialog'."
),
},
"params": {
"type": "object",
"description": (
"Method-specific parameters as a JSON object. Omit or "
"pass {} for methods that take no parameters."
),
"additionalProperties": True,
},
"target_id": {
"type": "string",
"description": (
"Optional. Target/tab ID from Target.getTargets result "
"(each entry's 'targetId'). Use for page-level methods "
"at the top-level tab scope. Mutually exclusive with "
"frame_id."
),
},
"frame_id": {
"type": "string",
"description": (
"Optional. Out-of-process iframe (OOPIF) frame_id from "
"browser_snapshot.frame_tree.children[] where "
"is_oopif=true. When set, routes the call through the "
"CDP supervisor's live session for that iframe. "
"Essential for Runtime.evaluate inside cross-origin "
"iframes, especially on Browserbase where fresh "
"per-call CDP connections can't keep up with signed "
"URL rotation. For same-origin iframes, use parent "
"contentWindow/contentDocument from Runtime.evaluate "
"at the top-level page instead."
),
},
"timeout": {
"type": "number",
"description": (
"Timeout in seconds (default 30, max 300)."
),
"default": 30,
},
},
"required": ["method"],
},
}
def _browser_cdp_check() -> bool:
    """Availability check for browser_cdp.

    The tool is only offered when the Python side can actually reach a CDP
    endpoint right now, meaning a static URL is set via ``/browser connect``
    (``BROWSER_CDP_URL``) or ``browser.cdp_url`` in ``config.yaml``.

    Backends that do *not* currently expose CDP to us are gated out so the
    model doesn't see a tool that would reliably fail: Camofox (REST-only),
    the default local agent-browser mode (Playwright hides its internal CDP
    port), and cloud providers whose per-session ``cdp_url`` is not yet
    surfaced. Cloud-provider CDP routing is a follow-up.

    Kept in a thin wrapper so the registration statement stays at module top
    level (the tool-discovery AST scan only picks up top-level
    ``registry.register(...)`` calls).
    """
    try:
        from tools.browser_tool import (  # type: ignore[import-not-found]
            _get_cdp_override,
            check_browser_requirements,
        )
    except ImportError as exc:  # pragma: no cover - defensive
        logger.debug("browser_cdp check: browser_tool import failed: %s", exc)
        return False
    if not check_browser_requirements():
        return False
    return bool(_get_cdp_override())


registry.register(
name="browser_cdp",
toolset="browser-cdp",
schema=BROWSER_CDP_SCHEMA,
handler=lambda args, **kw: browser_cdp(
method=args.get("method", ""),
params=args.get("params"),
target_id=args.get("target_id"),
frame_id=args.get("frame_id"),
timeout=args.get("timeout", 30.0),
task_id=kw.get("task_id"),
),
check_fn=_browser_cdp_check,
emoji="🧪",
)