mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
feat(browser): add browser_dialog for native JS dialog handling
Ergonomic wrapper over CDP's Page.handleJavaScriptDialog that accepts or dismisses alert/confirm/prompt/beforeunload dialogs blocking a page. Unsticks pages whose JS thread is frozen by an unhandled dialog; the symptom is that browser_snapshot, browser_console, browser_click etc. start hanging or erroring.

- action='accept'|'dismiss' is required; prompt_text is optional, for prompt()
- target_id auto-resolves when exactly one page tab is open; with multiple page tabs, errors with the tab list so the agent picks one
- Shares browser_cdp's check_fn gate: only appears when CDP is reachable (/browser connect or browser.cdp_url in config). Hidden otherwise so backends that can't use it don't see it.
- Safe as a probe: CDP returns a clean 'No dialog is showing' error when nothing's pending, which we pass through verbatim

Dialog detection (knowing a dialog is open without being told) is NOT included. It requires persistent CDP subscriptions per session, a larger architectural change. Documented as a follow-up; agents infer from symptoms and use this tool to recover.

Tests: 11 new unit tests against a mock CDP server covering the wrapper (action validation, auto-resolve with 0/1/multiple page targets, explicit target_id accept/dismiss flow, prompt_text passthrough, shared gate with browser_cdp, registry dispatch). The E2E probe case against real headless Chrome passes. The positive-case real-Chrome E2E is blocked by Chromium's headless auto-dismiss behavior when no persistent listener is attached. Unit tests exercise the exact CDP protocol we send, so the handling path is protocol-verified; headful real-browser usage (the actual /browser connect case) keeps dialogs alive via the Chrome UI.
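The argument handling described in the commit message can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: the helper names `resolve_target_id` and `build_dialog_params` are hypothetical, and the error wording is invented. It assumes targets shaped like the `targetInfos` entries returned by CDP's `Target.getTargets`, and the `accept` / `promptText` parameters of `Page.handleJavaScriptDialog`.

```python
from typing import Optional

VALID_ACTIONS = ("accept", "dismiss")


def resolve_target_id(targets: list[dict], target_id: Optional[str] = None) -> str:
    """Pick the page target to act on (hypothetical helper).

    An explicit target_id wins. Otherwise auto-resolve only when exactly
    one target of type "page" exists; with several, raise with the tab list.
    """
    if target_id:
        return target_id
    pages = [t for t in targets if t.get("type") == "page"]
    if len(pages) == 1:
        return pages[0]["targetId"]
    if not pages:
        raise ValueError("no page tabs are open")
    listing = ", ".join(f"{t['targetId']} ({t.get('url', '')})" for t in pages)
    raise ValueError(f"multiple page tabs open, pass target_id; tabs: {listing}")


def build_dialog_params(action: str, prompt_text: Optional[str] = None) -> dict:
    """Build the params object for Page.handleJavaScriptDialog (hypothetical helper)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    params: dict = {"accept": action == "accept"}
    # promptText is only meaningful for prompt() dialogs, so send it only when given.
    if prompt_text is not None:
        params["promptText"] = prompt_text
    return params
```

For example, `build_dialog_params("accept", "hello")` yields the params for accepting a `prompt()` dialog with the text `hello`; the actual send over the CDP WebSocket is left out of this sketch.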
parent 62ce6a38ae
commit 4b8272f549
6 changed files with 467 additions and 18 deletions
@@ -52,7 +52,7 @@ Or in-session:
 | Toolset | Tools | Purpose |
 |---------|-------|---------|
-| `browser` | `browser_back`, `browser_cdp`, `browser_click`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | Full browser automation. Includes `web_search` as a fallback for quick lookups. `browser_cdp` is a raw CDP passthrough gated on a reachable CDP endpoint — it only appears when `/browser connect` is active or `browser.cdp_url` is set. |
+| `browser` | `browser_back`, `browser_cdp`, `browser_click`, `browser_console`, `browser_dialog`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | Full browser automation. Includes `web_search` as a fallback for quick lookups. `browser_cdp` and `browser_dialog` share a gate on a reachable CDP endpoint — both only appear when `/browser connect` is active or `browser.cdp_url` is set. |
 | `clarify` | `clarify` | Ask the user a question when the agent needs clarification. |
 | `code_execution` | `execute_code` | Run Python scripts that call Hermes tools programmatically. |
 | `cronjob` | `cronjob` | Schedule and manage recurring tasks. |
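The shared gate mentioned in the changed table row can be pictured with a small sketch. The names here (`make_cdp_gate`, `visible_tools`, the registry shape) are hypothetical and are not taken from the Hermes codebase. As a simplification, the gate only checks that a CDP URL is configured, whereas the real gate is described as checking that the endpoint is reachable.

```python
from typing import Callable, Optional


def make_cdp_gate(get_cdp_url: Callable[[], Optional[str]]) -> Callable[[], bool]:
    """Return a check function that passes only when a CDP URL is available.

    Simplification: a real gate would also probe that the endpoint answers.
    """
    def check() -> bool:
        return bool(get_cdp_url())
    return check


def visible_tools(registry: dict[str, Callable[[], bool]]) -> list[str]:
    """List the tool names whose gate currently passes."""
    return sorted(name for name, check in registry.items() if check())


# Hypothetical wiring: both CDP-backed tools reuse the same gate object,
# so they appear and disappear together.
config = {"cdp_url": None}
cdp_gate = make_cdp_gate(lambda: config["cdp_url"])
registry = {
    "browser_cdp": cdp_gate,
    "browser_dialog": cdp_gate,
    "browser_click": lambda: True,  # ungated tool, always visible
}
```

With `config["cdp_url"]` unset, only `browser_click` is listed; once a URL is set, `browser_cdp` and `browser_dialog` show up together because they share one gate.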