Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-01 01:51:44 +00:00)
feat(browser): add browser_dialog for native JS dialog handling
Ergonomic wrapper over CDP's Page.handleJavaScriptDialog that accepts or dismisses alert/confirm/prompt/beforeunload dialogs blocking a page. It unsticks pages whose JS thread is frozen by an unhandled dialog; the symptom is that browser_snapshot, browser_console, browser_click, etc. start hanging or erroring.

- action='accept'|'dismiss' is required; prompt_text is optional, for prompt()
- target_id auto-resolves when exactly one page tab is open; with multiple page tabs, the tool errors with the tab list so the agent can pick one
- Shares browser_cdp's check_fn gate: it only appears when CDP is reachable (/browser connect or browser.cdp_url in config) and is hidden otherwise, so backends that can't use it don't see it.
- Safe as a probe: CDP returns a clean 'No dialog is showing' error when nothing is pending, which we pass through verbatim

Dialog detection (knowing a dialog is open without being told) is NOT included: it requires persistent CDP subscriptions per session, a larger architectural change. It is documented as a follow-up; agents infer a dialog from symptoms and use this tool to recover.

Tests: 11 new unit tests against a mock CDP server cover the wrapper (action validation, auto-resolve with 0/1/multiple page targets, explicit target_id accept/dismiss flow, prompt_text passthrough, shared gate with browser_cdp, registry dispatch). The E2E probe case against real headless Chrome passes. A positive-case real-Chrome E2E is blocked by Chromium's headless auto-dismiss behavior when no persistent listener is attached. The unit tests exercise the exact CDP protocol we send, so the handling path is protocol-verified; headful real-browser usage (the actual /browser connect case) keeps dialogs alive via the Chrome UI.
parent 62ce6a38ae
commit 4b8272f549
6 changed files with 467 additions and 18 deletions
@@ -357,6 +357,32 @@ browser_cdp(method="Network.getAllCookies")
Browser-level methods (`Target.*`, `Browser.*`, `Storage.*`) omit `target_id`. Page-level methods (`Page.*`, `Runtime.*`, `DOM.*`, `Emulation.*`) require a `target_id` from `Target.getTargets`. Each call is independent — sessions do not persist between calls.
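A minimal sketch of that routing rule, assuming method names always take the `Domain.method` form. The helper `needs_target_id` is hypothetical (not part of Hermes) and only knows the domains listed above; for any other domain (for example `Network.*`) it returns `None` rather than guessing.

```python
from typing import Optional

# Domains named in the docs above; anything else is deliberately left unclassified.
BROWSER_LEVEL = {"Target", "Browser", "Storage"}
PAGE_LEVEL = {"Page", "Runtime", "DOM", "Emulation"}


def needs_target_id(method: str) -> Optional[bool]:
    """Hypothetical helper: True if `method` needs a target_id, False if it must
    omit one, None if its domain is not covered by the documented rule."""
    domain, sep, name = method.partition(".")
    if not sep or not domain or not name:
        raise ValueError(f"expected 'Domain.method', got {method!r}")
    if domain in BROWSER_LEVEL:
        return False
    if domain in PAGE_LEVEL:
        return True
    return None


for m in ("Target.getTargets", "Runtime.evaluate", "Network.getAllCookies"):
    print(m, needs_target_id(m))
```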
### `browser_dialog`
Accept or dismiss a native JS dialog (`alert`, `confirm`, `prompt`, `beforeunload`) that's blocking a page. Native dialogs freeze the page's JS thread, so `browser_snapshot`, `browser_console`, `browser_click` and related tools will hang or error until the dialog is handled.

**Same CDP gate as `browser_cdp`**: it appears in the toolset when `/browser connect` is active or `browser.cdp_url` is set, and disappears otherwise.
```
# Accept (click OK / Yes / Submit)
browser_dialog(action="accept")

# Dismiss (click Cancel / No)
browser_dialog(action="dismiss")

# Fill a prompt() dialog
browser_dialog(action="accept", prompt_text="my answer")

# With multiple tabs open, specify which one
browser_dialog(action="accept", target_id="<tabId>")
```
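Per the commit message, each call wraps CDP's `Page.handleJavaScriptDialog`, whose parameters are `accept` (boolean) and an optional `promptText`. The sketch below shows one way the `action` / `prompt_text` arguments could map onto that command. `build_dialog_command` is a hypothetical helper, not Hermes code, and the optional `sessionId` field assumes flattened sessions (`Target.attachToTarget` with `flatten=true`) over the browser websocket.

```python
import json
from typing import Any, Dict, Optional

VALID_ACTIONS = ("accept", "dismiss")


def build_dialog_command(
    action: str,
    prompt_text: Optional[str] = None,
    msg_id: int = 1,
    session_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map a browser_dialog-style call onto a
    Page.handleJavaScriptDialog message. Not the Hermes implementation."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be 'accept' or 'dismiss', got {action!r}")
    params: Dict[str, Any] = {"accept": action == "accept"}
    # Design choice in this sketch: promptText is only forwarded when accepting,
    # since it is the text typed into a prompt() dialog before OK is clicked.
    if prompt_text is not None and action == "accept":
        params["promptText"] = prompt_text
    message: Dict[str, Any] = {
        "id": msg_id,
        "method": "Page.handleJavaScriptDialog",
        "params": params,
    }
    # Assumption: with flattened sessions, page-level commands sent over the
    # browser websocket carry the sessionId returned by Target.attachToTarget.
    if session_id is not None:
        message["sessionId"] = session_id
    return message


print(json.dumps(build_dialog_command("accept", prompt_text="my answer")))
```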
`target_id` is auto-resolved when exactly one page tab is open. With multiple page tabs, the tool returns an error listing them so the agent can pick one explicitly.

Safe as a probe: CDP cleanly returns `"No dialog is showing"` when nothing is pending, so calling `browser_dialog(action="dismiss")` is a low-risk way to check for a stuck dialog (if one is pending, it gets dismissed). If subsequent `browser_snapshot` / `browser_click` calls start hanging on a page that was working before, this is the first thing to try.

**Note on dialog detection:** Hermes does not currently auto-detect that a dialog is open; the agent infers one from symptoms (calls hanging or erroring) and uses `browser_dialog` to unstick the page. Persistent dialog-event subscription requires a larger architectural change (persistent CDP connections per session) and is left as a follow-up.
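The target auto-resolution and probe behavior described above can be sketched as follows. This assumes the tab list comes from `Target.getTargets` (whose `targetInfos` entries carry `targetId`, `type`, `title` and `url`) and that CDP failures arrive as an `error` object with a `message` field. `resolve_page_target` and `is_no_dialog` are hypothetical names, and raising on zero page tabs is an assumption, since the docs only describe the one-tab and multi-tab cases.

```python
from typing import Any, Dict, List, Optional


def resolve_page_target(
    target_infos: List[Dict[str, Any]], target_id: Optional[str] = None
) -> str:
    """Hypothetical helper mirroring the documented auto-resolve rule."""
    if target_id is not None:
        return target_id  # an explicit choice always wins
    pages = [t for t in target_infos if t.get("type") == "page"]
    if len(pages) == 1:
        return pages[0]["targetId"]
    if not pages:
        # Assumption: the docs do not say what happens with zero page tabs.
        raise LookupError("no page tabs are open")
    # Multiple page tabs: list them so the caller can pick one explicitly.
    listing = "\n".join(
        f"  {p['targetId']}  {p.get('title', '')}  {p.get('url', '')}" for p in pages
    )
    raise LookupError(f"{len(pages)} page tabs open; pass target_id, one of:\n{listing}")


def is_no_dialog(response: Dict[str, Any]) -> bool:
    """True when CDP reports nothing was pending, i.e. the probe found no dialog."""
    error = response.get("error") or {}
    return "No dialog is showing" in str(error.get("message", ""))


targets = [
    {"targetId": "A1", "type": "page", "title": "Checkout", "url": "https://shop.example/cart"},
    {"targetId": "W9", "type": "service_worker", "title": "sw", "url": "https://shop.example/sw.js"},
]
print(resolve_page_target(targets))
```

Passing the CDP error message through verbatim, as the commit describes, lets the agent see exactly what CDP reported; a check like `is_no_dialog` would only be a convenience on top of that.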
## Practical Examples
### Filling Out a Web Form