mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-25 00:51:20 +00:00)
feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369)
Agents can now send arbitrary CDP commands to the browser. The tool is gated on a reachable CDP endpoint at session start: it only appears in the toolset when BROWSER_CDP_URL is set (from '/browser connect') or 'browser.cdp_url' is configured in config.yaml.

Backends that don't currently expose CDP to the Python side (Camofox, default local agent-browser, cloud providers whose per-session cdp_url is not yet surfaced) do not see the tool at all.

Tool schema description links to the CDP method reference at https://chromedevtools.github.io/devtools-protocol/ so the agent can web_extract specific method docs on demand.

Stateless per call. Browser-level methods (Target.*, Browser.*, Storage.*) omit target_id. Page-level methods attach to the target with flatten=true and dispatch the method on the returned sessionId. Clean errors when the endpoint becomes unreachable mid-session or the URL isn't a WebSocket.

Tests: 19 unit (mock CDP server + gate checks) + E2E against real headless Chrome (Target.getTargets, Browser.getVersion, Runtime.evaluate with target_id, Page.navigate + re-eval, bogus method, bogus target_id, missing endpoint) + E2E of the check_fn gate (tool hidden without CDP URL, visible with it, hidden again after unset).
parent d66414a844
commit ce410521b3
6 changed files with 862 additions and 7 deletions
@@ -327,6 +327,36 @@ Check the browser console for any JavaScript errors

Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.

### `browser_cdp`

Raw Chrome DevTools Protocol passthrough: the escape hatch for browser operations not covered by the other tools. Use it for native dialog handling, iframe-scoped evaluation, cookie/network control, or any other CDP method the agent needs.

**Only available when a CDP endpoint is reachable at session start**, meaning `/browser connect` has attached to a running Chrome, or `browser.cdp_url` is set in `config.yaml`. The default local agent-browser mode, Camofox, and cloud providers (Browserbase, Browser Use, Firecrawl) do not currently expose CDP to this tool. Cloud providers do have per-session CDP URLs, but live-session routing is a follow-up.
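
The availability rule above can be sketched as a small configuration check. This is only an illustration under stated assumptions: `resolve_cdp_url` and `browser_cdp_available` are hypothetical names, the config is assumed to be an already-parsed `config.yaml` mapping, and the precedence of the environment variable over the config key is a guess. The sketch also checks only that a URL is configured; it does not probe whether the endpoint is actually reachable.

```python
from typing import Any, Mapping, Optional


def resolve_cdp_url(env: Mapping[str, str], config: Mapping[str, Any]) -> Optional[str]:
    """Return the configured CDP URL, or None when neither source provides one.

    Assumption: BROWSER_CDP_URL (set by `/browser connect`) is consulted
    before `browser.cdp_url` from config.yaml.
    """
    from_env = env.get("BROWSER_CDP_URL")
    if from_env:
        return from_env
    browser_section = config.get("browser") or {}
    return browser_section.get("cdp_url") or None


def browser_cdp_available(env: Mapping[str, str], config: Mapping[str, Any]) -> bool:
    """Hypothetical gate: expose the tool only when a CDP URL is configured."""
    return resolve_cdp_url(env, config) is not None
```

Passing `os.environ` as `env` would evaluate the gate against the live process environment, which matches the described behavior of the tool disappearing again once the variable is unset.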

**CDP method reference:** https://chromedevtools.github.io/devtools-protocol/ (the agent can `web_extract` a specific method's page to look up parameters and return shape).

Common patterns:
```
# List tabs (browser-level, no target_id)
browser_cdp(method="Target.getTargets")

# Handle a native JS dialog on a tab
browser_cdp(method="Page.handleJavaScriptDialog",
            params={"accept": true, "promptText": ""},
            target_id="<tabId>")

# Evaluate JS in a specific tab
browser_cdp(method="Runtime.evaluate",
            params={"expression": "document.title", "returnByValue": true},
            target_id="<tabId>")

# Get all cookies
browser_cdp(method="Network.getAllCookies")
```

Browser-level methods (`Target.*`, `Browser.*`, `Storage.*`) omit `target_id`. Page-level methods (`Page.*`, `Runtime.*`, `DOM.*`, `Emulation.*`) require a `target_id` from `Target.getTargets`. Each call is independent: sessions do not persist between calls.
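
The routing rule above can be sketched as a message planner that lists what a single stateless call would put on the wire. This is a minimal illustration, not the tool's actual implementation: `plan_cdp_messages` is a hypothetical name, and the `session_id` argument stands in for the value that a real client would read from the `Target.attachToTarget` response. The only protocol details assumed are that `Target.attachToTarget` accepts `targetId` and `flatten`, and that in flattened mode a command carries a top-level `sessionId`.

```python
from typing import Any, Dict, List, Optional


def plan_cdp_messages(
    method: str,
    params: Optional[Dict[str, Any]] = None,
    target_id: Optional[str] = None,
    session_id: str = "<sessionId>",
) -> List[Dict[str, Any]]:
    """Return the CDP messages for one call, in send order.

    Without target_id the method is sent directly on the browser connection.
    With target_id the call first attaches to the target in flattened mode,
    then dispatches the method on the resulting session.
    """
    payload = params or {}
    if target_id is None:
        # Browser-level call: a single message, no session involved.
        return [{"id": 1, "method": method, "params": payload}]

    attach = {
        "id": 1,
        "method": "Target.attachToTarget",
        "params": {"targetId": target_id, "flatten": True},
    }
    # session_id is a placeholder here; a real client would take it from
    # the attach response before sending this second message.
    command = {"id": 2, "sessionId": session_id, "method": method, "params": payload}
    return [attach, command]
```

Because nothing is cached between calls, a second page-level call would attach again from scratch, which is one way to read "sessions do not persist between calls".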

## Practical Examples

### Filling Out a Web Form