feat(browser): add browser_dialog for native JS dialog handling

Ergonomic wrapper over CDP's Page.handleJavaScriptDialog that accepts
or dismisses alert/confirm/prompt/beforeunload dialogs blocking a page.
Unsticks pages whose JS thread is frozen by an unhandled dialog —
symptom is that browser_snapshot, browser_console, browser_click etc.
start hanging or erroring.

- action='accept'|'dismiss' required; prompt_text optional for prompt()
- target_id auto-resolves when exactly one page tab is open; with
  multiple page tabs, errors with the tab list so the agent picks one
- Shares browser_cdp's check_fn gate — only appears when CDP is
  reachable (/browser connect or browser.cdp_url in config). Hidden
  otherwise so backends that can't use it don't see it.
- Safe as a probe: CDP returns a clean 'No dialog is showing' error
  when nothing's pending, which we pass through verbatim

Dialog detection (knowing a dialog is open without being told) is NOT
included — it requires persistent CDP subscriptions per session, a
larger architectural change. Documented as a follow-up; agents infer
from symptoms and use this tool to recover.

Tests: 11 new unit tests against mock CDP server covering the wrapper
(action validation, auto-resolve with 0/1/multiple page targets,
explicit target_id accept/dismiss flow, prompt_text passthrough, shared
gate with browser_cdp, registry dispatch). E2E probe case against real
headless Chrome passes. Positive-case real-Chrome E2E is blocked by
Chromium's headless auto-dismiss behavior when no persistent listener
is attached — unit tests exercise the exact CDP protocol we send, so
the handling path is protocol-verified; headful real-browser usage
(the actual /browser connect case) keeps dialogs alive via the Chrome
UI.
This commit is contained in:
Teknium 2026-04-19 05:20:51 -07:00
parent 62ce6a38ae
commit 4b8272f549
No known key found for this signature in database
6 changed files with 467 additions and 18 deletions

View file

@ -6,9 +6,9 @@ description: "Authoritative reference for Hermes built-in tools, grouped by tool
# Built-in Tools Reference
This page documents all 53 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.
This page documents all 54 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.
**Quick counts:** 11 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, 5 Feishu tools, and 15 standalone tools across other toolsets.
**Quick counts:** 12 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, 5 Feishu tools, and 15 standalone tools across other toolsets.
:::tip MCP Tools
In addition to built-in tools, Hermes can load tools dynamically from MCP servers. MCP tools appear with a server-name prefix (e.g., `github_create_issue` for the `github` MCP server). See [MCP Integration](/docs/user-guide/features/mcp) for configuration.
@ -20,6 +20,7 @@ In addition to built-in tools, Hermes can load tools dynamically from MCP server
|------|-------------|----------------------|
| `browser_back` | Navigate back to the previous page in browser history. Requires browser_navigate to be called first. | — |
| `browser_cdp` | Send a raw Chrome DevTools Protocol (CDP) command. Escape hatch for browser operations not covered by browser_navigate, browser_click, browser_console, etc. Only available when a CDP endpoint is reachable at session start — via `/browser connect` or `browser.cdp_url` config. See https://chromedevtools.github.io/devtools-protocol/ | — |
| `browser_dialog` | Accept or dismiss a native JS dialog (alert/confirm/prompt/beforeunload) that's blocking a page. Auto-resolves target_id when exactly one page tab is open. Same CDP gate as browser_cdp. Safe as a probe — returns 'No dialog is showing' when nothing's pending. | — |
| `browser_click` | Click on an element identified by its ref ID from the snapshot (e.g., '@e5'). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first. | — |
| `browser_console` | Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requi… | — |
| `browser_get_images` | Get a list of all images on the current page with their URLs and alt text. Useful for finding images to analyze with the vision tool. Requires browser_navigate to be called first. | — |