feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369)

Agents can now send arbitrary CDP commands to the browser. The tool is
gated on a reachable CDP endpoint at session start — it only appears in
the toolset when BROWSER_CDP_URL is set (from '/browser connect') or
'browser.cdp_url' is configured in config.yaml. Backends that don't
currently expose CDP to the Python side (Camofox, default local
agent-browser, cloud providers whose per-session cdp_url is not yet
surfaced) do not see the tool at all.

Tool schema description links to the CDP method reference at
https://chromedevtools.github.io/devtools-protocol/ so the agent can
web_extract specific method docs on demand.

Stateless per call. Browser-level methods (Target.*, Browser.*,
Storage.*) omit target_id. Page-level methods attach to the target
with flatten=true and dispatch the method on the returned sessionId.
Clean errors when the endpoint becomes unreachable mid-session or
the URL isn't a WebSocket.

Tests: 19 unit (mock CDP server + gate checks) + E2E against real
headless Chrome (Target.getTargets, Browser.getVersion,
Runtime.evaluate with target_id, Page.navigate + re-eval, bogus
method, bogus target_id, missing endpoint) + E2E of the check_fn
gate (tool hidden without CDP URL, visible with it, hidden again
after unset).
This commit is contained in:
Teknium 2026-04-19 00:03:10 -07:00 committed by GitHub
parent d66414a844
commit ce410521b3
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 862 additions and 7 deletions

View file

@ -52,7 +52,7 @@ Or in-session:
| Toolset | Tools | Purpose |
|---------|-------|---------|
| `browser` | `browser_back`, `browser_click`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | Full browser automation. Includes `web_search` as a fallback for quick lookups. |
| `browser` | `browser_back`, `browser_cdp`, `browser_click`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | Full browser automation. Includes `web_search` as a fallback for quick lookups. `browser_cdp` is a raw CDP passthrough gated on a reachable CDP endpoint — it only appears when `/browser connect` is active or `browser.cdp_url` is set. |
| `clarify` | `clarify` | Ask the user a question when the agent needs clarification. |
| `code_execution` | `execute_code` | Run Python scripts that call Hermes tools programmatically. |
| `cronjob` | `cronjob` | Schedule and manage recurring tasks. |