feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369)

Agents can now send arbitrary CDP commands to the browser. The tool is
gated on a reachable CDP endpoint at session start — it only appears in
the toolset when BROWSER_CDP_URL is set (from '/browser connect') or
'browser.cdp_url' is configured in config.yaml. Backends that don't
currently expose CDP to the Python side (Camofox, default local
agent-browser, cloud providers whose per-session cdp_url is not yet
surfaced) do not see the tool at all.

Tool schema description links to the CDP method reference at
https://chromedevtools.github.io/devtools-protocol/ so the agent can
web_extract specific method docs on demand.

Stateless per call. Browser-level methods (Target.*, Browser.*,
Storage.*) omit target_id. Page-level methods attach to the target
with flatten=true and dispatch the method on the returned sessionId.
Clean errors when the endpoint becomes unreachable mid-session or
the URL isn't a WebSocket.

Tests: 19 unit (mock CDP server + gate checks) + E2E against real
headless Chrome (Target.getTargets, Browser.getVersion,
Runtime.evaluate with target_id, Page.navigate + re-eval, bogus
method, bogus target_id, missing endpoint) + E2E of the check_fn
gate (tool hidden without CDP URL, visible with it, hidden again
after unset).
This commit is contained in:
Teknium 2026-04-19 00:03:10 -07:00 committed by GitHub
parent d66414a844
commit ce410521b3
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 862 additions and 7 deletions

View file

@ -327,6 +327,36 @@ Check the browser console for any JavaScript errors
Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
### `browser_cdp`
Raw Chrome DevTools Protocol passthrough — the escape hatch for browser operations not covered by the other tools. Use for native dialog handling, iframe-scoped evaluation, cookie/network control, or any CDP verb the agent needs.
**Only available when a CDP endpoint is reachable at session start** — meaning `/browser connect` has attached to a running Chrome, or `browser.cdp_url` is set in `config.yaml`. The default local agent-browser mode, Camofox, and cloud providers (Browserbase, Browser Use, Firecrawl) do not currently expose CDP to this tool — cloud providers have per-session CDP URLs but live-session routing is a follow-up.
**CDP method reference:** https://chromedevtools.github.io/devtools-protocol/ — the agent can `web_extract` a specific method's page to look up parameters and return shape.
Common patterns:
```
# List tabs (browser-level, no target_id)
browser_cdp(method="Target.getTargets")
# Handle a native JS dialog on a tab
browser_cdp(method="Page.handleJavaScriptDialog",
params={"accept": true, "promptText": ""},
target_id="<tabId>")
# Evaluate JS in a specific tab
browser_cdp(method="Runtime.evaluate",
params={"expression": "document.title", "returnByValue": true},
target_id="<tabId>")
# Get all cookies
browser_cdp(method="Network.getAllCookies")
```
Browser-level methods (`Target.*`, `Browser.*`, `Storage.*`) omit `target_id`. Page-level methods (`Page.*`, `Runtime.*`, `DOM.*`, `Emulation.*`) require a `target_id` from `Target.getTargets`. Each call is independent — sessions do not persist between calls.
## Practical Examples
### Filling Out a Web Form