feat(browser): add browser_dialog for native JS dialog handling

Ergonomic wrapper over CDP's Page.handleJavaScriptDialog that accepts
or dismisses alert/confirm/prompt/beforeunload dialogs blocking a page.
Unsticks pages whose JS thread is frozen by an unhandled dialog —
symptom is that browser_snapshot, browser_console, browser_click etc.
start hanging or erroring.

- action='accept'|'dismiss' required; prompt_text optional for prompt()
- target_id auto-resolves when exactly one page tab is open; with
  multiple page tabs, errors with the tab list so the agent picks one
- Shares browser_cdp's check_fn gate — only appears when CDP is
  reachable (/browser connect or browser.cdp_url in config). Hidden
  otherwise so backends that can't use it don't see it.
- Safe as a probe: CDP returns a clean 'No dialog is showing' error
  when nothing's pending, which we pass through verbatim

Dialog detection (knowing a dialog is open without being told) is NOT
included — it requires persistent CDP subscriptions per session, a
larger architectural change. Documented as a follow-up; agents infer
from symptoms and use this tool to recover.

Tests: 11 new unit tests against mock CDP server covering the wrapper
(action validation, auto-resolve with 0/1/multiple page targets,
explicit target_id accept/dismiss flow, prompt_text passthrough, shared
gate with browser_cdp, registry dispatch). E2E probe case against real
headless Chrome passes. Positive-case real-Chrome E2E is blocked by
Chromium's headless auto-dismiss behavior when no persistent listener
is attached — unit tests exercise the exact CDP protocol we send, so
the handling path is protocol-verified; headful real-browser usage
(the actual /browser connect case) keeps dialogs alive via the Chrome
UI.
This commit is contained in:
Teknium 2026-04-19 05:20:51 -07:00
parent 62ce6a38ae
commit 4b8272f549
No known key found for this signature in database
6 changed files with 467 additions and 18 deletions

View file

@ -357,6 +357,32 @@ browser_cdp(method="Network.getAllCookies")
Browser-level methods (`Target.*`, `Browser.*`, `Storage.*`) omit `target_id`. Page-level methods (`Page.*`, `Runtime.*`, `DOM.*`, `Emulation.*`) require a `target_id` from `Target.getTargets`. Each call is independent — sessions do not persist between calls.
### `browser_dialog`
Accept or dismiss a native JS dialog (`alert`, `confirm`, `prompt`, `beforeunload`) that's blocking a page. Native dialogs freeze the page's JS thread, so `browser_snapshot`, `browser_console`, `browser_click` and related tools will hang or error until the dialog is handled.
**Same CDP gate as `browser_cdp`** — appears in the toolset when `/browser connect` is active or `browser.cdp_url` is set, and disappears otherwise.
```
# Accept (click OK / Yes / Submit)
browser_dialog(action="accept")
# Dismiss (click Cancel / No)
browser_dialog(action="dismiss")
# Fill a prompt() dialog
browser_dialog(action="accept", prompt_text="my answer")
# With multiple tabs open, specify which one
browser_dialog(action="accept", target_id="<tabId>")
```
`target_id` is auto-resolved when exactly one page tab is open. With multiple page tabs, the tool returns an error listing them so the agent can pick one explicitly.
Safe as a probe: CDP cleanly returns `"No dialog is showing"` when nothing's pending, so calling `browser_dialog(action="dismiss")` is a zero-risk way to check for a stuck dialog. If subsequent `browser_snapshot` / `browser_click` calls start hanging on a page that was working before, this is the first thing to try.
**Note on dialog detection:** Hermes does not currently auto-detect that a dialog is open — the agent infers from symptoms (calls hanging/erroring) and uses `browser_dialog` to unstick the page. Persistent dialog-event subscription is a larger architectural change (persistent CDP connections per session) and is a follow-up.
## Practical Examples
### Filling Out a Web Form