hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

History

Teknium 5a1c599412 feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540 ) * docs: browser CDP supervisor design (for upcoming PR) Design doc ahead of implementation — dialog + iframe detection/interaction via a persistent CDP supervisor. Covers backend capability matrix (verified live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split, non-goals, and test plan. Supersedes #12550. No code changes in this commit. * feat(browser): add persistent CDP supervisor for dialog + frame detection Single persistent CDP WebSocket per Hermes task_id that subscribes to Page/Runtime/Target events and maintains thread-safe state for pending dialogs, frame tree, and console errors. Supervisor lives in its own daemon thread running an asyncio loop; external callers use sync API (snapshot(), respond_to_dialog()) that bridges onto the loop. Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true} and enables Page+Runtime on each so iframe-origin dialogs surface through the same supervisor. Dialog policies: must_respond (default, 300s safety timeout), auto_dismiss, auto_accept. Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot payloads bounded on ad-heavy pages. E2E verified against real Chrome via smoke test — detects + responds to main-frame alerts, iframe-contentWindow alerts, preserves frame tree, graceful no-dialog error path, clean shutdown. No agent-facing tool wiring in this commit (comes next). * feat(browser): add browser_dialog tool wired to CDP supervisor Agent-facing response-only tool. Schema: action: 'accept' \| 'dismiss' (required) prompt_text: response for prompt() dialogs (optional) dialog_id: disambiguate when multiple dialogs queued (optional) Handler: SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...) check_fn shares _browser_cdp_check with browser_cdp so both surface and hide together. When no supervisor is attached (Camofox, default Playwright, or no browser session started yet), tool is hidden; if somehow invoked it returns a clear error pointing the agent to browser_navigate / /browser connect. Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp / hermes-api-server toolsets alongside browser_cdp. * feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot Supervisor lifecycle: * _get_session_info lazy-starts the supervisor after a session row is materialized — covers every backend code path (Browserbase, cdp_url override, /browser connect, future providers) with one hook. * cleanup_browser(task_id) stops the supervisor for that task first (before the backend tears down CDP). * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all(). * /browser connect eagerly starts the supervisor for task 'default' so the first snapshot already shows pending_dialogs. * /browser disconnect stops the supervisor. CDP URL resolution for the supervisor: 1. BROWSER_CDP_URL / browser.cdp_url override. 2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase). browser_snapshot merges supervisor state (pending_dialogs + frame_tree) into its JSON output when a supervisor is active — the agent reads pending_dialogs from the snapshot it already requests, then calls browser_dialog to respond. No extra tool surface. Config defaults: * browser.dialog_policy: 'must_respond' (new) * browser.dialog_timeout_s: 300 (new) No version bump — new keys deep-merge into existing browser section. Deadlock fix in supervisor event dispatch: * _on_dialog_opening and _on_target_attached used to await CDP calls while the reader was still processing an event — but only the reader can set the response Future, so the call timed out. * Both now fire asyncio.create_task(...) so the reader stays pumping. * auto_dismiss/auto_accept now actually close the dialog immediately. Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome): * supervisor start/snapshot * main-frame alert detection + dismiss * iframe.contentWindow alert * prompt() with prompt_text reply * respond with no pending dialog -> clean error * auto_dismiss clears on event * registry idempotency * registry stop -> snapshot reports inactive * browser_dialog tool no-supervisor error * browser_dialog invalid action * browser_dialog end-to-end via tool handler xdist-safe: chrome_cdp fixture uses a per-worker port. Skipped when google-chrome/chromium isn't installed. * docs(browser): document browser_dialog tool + CDP supervisor - user-guide/features/browser.md: new browser_dialog section with workflow, availability gate, and dialog_policy table - reference/tools-reference.md: row for browser_dialog, tool count bumped 53 -> 54, browser tools count 11 -> 12 - reference/toolsets-reference.md: browser_dialog added to browser toolset row with note on pending_dialogs / frame_tree snapshot fields Full design doc lives at developer-guide/browser-supervisor.md (committed earlier). * fix(browser): reconnect loop + recent_dialogs for Browserbase visibility Found via Browserbase E2E test that revealed two production-critical issues: 1. Supervisor WebSocket drops when other clients disconnect. Browserbase's CDP proxy tears down our long-lived WebSocket whenever a short-lived client (e.g. agent-browser CLI's per-command CDP connection) disconnects. Fixed with a reconnecting _run loop that re-attaches with exponential backoff on drops. _page_session_id and _child_sessions are reset on each reconnect; pending_dialogs and frames are preserved across reconnects. 2. Browserbase auto-dismisses dialogs server-side within ~10ms. Their Playwright-based CDP proxy dismisses alert/confirm/prompt before our Page.handleJavaScriptDialog call can respond. So pending_dialogs is empty by the time the agent reads a snapshot on Browserbase. Added a recent_dialogs ring buffer (capacity 20) that retains a DialogRecord for every dialog that opened, with a closed_by tag: * 'agent' — agent called browser_dialog * 'auto_policy' — local auto_dismiss/auto_accept fired * 'watchdog' — must_respond timeout auto-dismissed (300s default) * 'remote' — browser/backend closed it on us (Browserbase) Agents on Browserbase now see the dialog history with closed_by='remote' so they at least know a dialog fired, even though they couldn't respond. 3. Page.javascriptDialogClosed matching bug. The event doesn't include a 'message' field (CDP spec has only 'result' and 'userInput') but our _on_dialog_closed was matching on message. Fixed to match by session_id + oldest-first, with a safety assumption that only one dialog is in flight per session (the JS thread is blocked while a dialog is up). Docs + tests updated: * browser.md: new availability matrix showing the three backends and which mode (pending / recent / response) each supports * developer-guide/browser-supervisor.md: three-field snapshot schema with closed_by semantics * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12 passing against real Chrome) E2E verified both backends: * Local Chrome via /browser connect: detect + respond full workflow (smoke_supervisor.py all 7 scenarios pass) * Browserbase: detect via recent_dialogs with closed_by='remote' (smoke_supervisor_browserbase_v2.py passes) Camofox remains out of scope (REST-only, no CDP) — tracked for upstream PR 3. * feat(browser): XHR bridge for dialog response on Browserbase (FIXED) Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so Page.handleJavaScriptDialog calls lose the race. Solution: bypass native dialogs entirely. The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a JavaScript override for window.alert/confirm/prompt. Those overrides perform a synchronous XMLHttpRequest to a magic host ('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable with a requestStage=Request pattern. Flow when a page calls alert('hi'): 1. window.alert override intercepts, builds XHR GET to http://hermes-dialog-bridge.invalid/?kind=alert&message=hi 2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics) 3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces it as a pending dialog with bridge_request_id set 4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog 5. Supervisor calls Fetch.fulfillRequest with JSON body: {accept: true\|false, prompt_text: '...', dialog_id: 'd-N'} 6. The injected script parses the body, returns the appropriate value from the override (undefined for alert, bool for confirm, string\|null for prompt) This works identically on Browserbase AND local Chrome — no native dialog ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog policies (must_respond / auto_dismiss / auto_accept) all still work. Bridge is installed on every attached session (main page + OOPIF child sessions) so iframe dialogs are captured too. Native-dialog path kept as a fallback for backends that don't auto-dismiss (so a page that somehow bypasses our override — e.g. iframes that load after Fetch.enable but before the init-script runs — still gets observed via Page.javascriptDialogOpening). E2E VERIFIED: * Local Chrome: 13/13 pytest tests green (12 original + new test_bridge_captures_prompt_and_returns_reply_text that asserts window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds) * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS: - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓ - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY' → page.prompt_ret === 'AGENT-REPLY' ✓ - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓ - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓ Docs updated in browser.md and developer-guide/browser-supervisor.md — availability matrix now shows Browserbase at full parity with local Chrome for both detection and response. * feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...) Adds iframe interaction to the CDP supervisor PR (was queued as PR 2). Design: browser_cdp gets an optional frame_id parameter. When set, the tool looks up the frame in the supervisor's frame_tree, grabs its child cdp_session_id (OOPIF session), and dispatches the CDP call through the supervisor's already-connected WebSocket via run_coroutine_threadsafe. Why not stateless: on Browserbase, each fresh browser_cdp WebSocket must re-negotiate against a signed connectUrl. The session info carries a specific URL that can expire while the supervisor's long-lived connection stays valid. Routing via the supervisor sidesteps this. Agent workflow: 1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true 2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>, params={'expression': 'document.title', 'returnByValue': True}) 3. Supervisor dispatches the call on the OOPIF's child session Supervisor state fixes needed along the way: * _on_frame_detached now skips reason='swap' (frame migrating processes) * _on_frame_detached also skips when the frame is an OOPIF with a live child session — Browserbase fires spurious remove events when a same-origin iframe gets promoted to OOPIF * _on_target_detached clears cdp_session_id but KEEPS the frame record so the agent still sees the OOPIF in frame_tree during transient session flaps E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py): browser_cdp(method='Runtime.evaluate', params={'expression': 'document.title', 'returnByValue': True}, frame_id=<OOPIF>) → {'success': True, 'result': {'value': 'Example Domain'}} The iframe is <iframe src='https://example.com/'> inside a top-level data: URL page on a real Browserbase session. The agent Runtime.evaluates INSIDE the cross-origin iframe and gets example.com's title back. Tests (tests/tools/test_browser_supervisor.py — 16 pass total): * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF, verifies routing via supervisor, Runtime.evaluate returns 1+1=2 * test_browser_cdp_frame_id_missing_supervisor — clean error when no supervisor attached * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad frame_id Docs (browser.md and developer-guide/browser-supervisor.md) updated with the iframe workflow, availability matrix now shows OOPIF eval as shipped for local Chrome + Browserbase. * test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process When asked 'did you test the iframe stuff' I had only done a mocked pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/ smoke_local_oopif.py: * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906) * Chrome with --site-per-process so the cross-origin iframe becomes a real OOPIF in its own process * Navigate, find OOPIF in supervisor.frame_tree, call browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes through the supervisor's child session * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the inner page, retrieved via OOPIF eval) PASSED on 2026-04-23. Tried to embed this as a pytest but hit an asyncio version quirk between venv (3.11) and the system python (3.13) — Page.navigate hangs in the pytest harness but works in standalone. Left a self-documenting skip test that points to the smoke script + describes the verification. chrome_cdp fixture now passes --site-per-process so future iframe tests can rely on OOPIF behavior. Result: 16 pass + 1 documented-skip = 17 tests in tests/tools/test_browser_supervisor.py. * docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count Pre-merge docs audit revealed two gaps: 1. user-guide/configuration.md browser config example was missing the two new dialog_* knobs. Added with a short table explaining must_respond / auto_dismiss / auto_accept semantics and a link to the feature page for the full workflow. 2. reference/tools-reference.md header said '54 built-in tools' — real count on main is 54, this branch adds browser_dialog so it's 55. Fixed the header. (browser count was already correctly bumped 11 -> 12 in the earlier docs commit.) No code changes.		2026-04-23 22:23:37 -07:00
..
__init__.py	chore: release v0.11.0 (2026.4.23) (#14791 )	2026-04-23 15:31:59 -07:00
auth.py	fix(auth): refuse to touch real auth.json during pytest; delete sandbox-escaping test (#14729 )	2026-04-23 13:50:21 -07:00
auth_commands.py	fix(auth): unify credential source removal — every source sticks (#13427 )	2026-04-21 01:52:49 -07:00
backup.py	fix(backup): handle files with pre-1980 timestamps	2026-04-20 00:47:40 -07:00
banner.py	refactor: remove dead code — 1,784 lines across 77 files (#9180 )	2026-04-13 16:32:04 -07:00
callbacks.py	fix: ESC cancels secret/sudo prompts, clearer skip messaging (#9902 )	2026-04-14 16:11:37 -07:00
claw.py	Normalize claw workspace paths for Windows	2026-04-22 18:15:27 -07:00
cli_output.py	refactor: remove dead code — 1,784 lines across 77 files (#9180 )	2026-04-13 16:32:04 -07:00
clipboard.py	feat: fix img pasting in new ink plus newline after tools	2026-04-11 13:14:32 -05:00
codex_models.py	feat(codex): add gpt-5.5 and wire live model discovery into picker (#14720 )	2026-04-23 13:32:43 -07:00
colors.py	feat: respect NO_COLOR env var and TERM=dumb (#4079 )	2026-03-30 17:07:21 -07:00
commands.py	feat(gateway): expose plugin slash commands natively on all platforms + decision-capable command hook	2026-04-22 16:23:21 -07:00
completion.py	fix: preserve profile name completion in dynamic shell completion	2026-04-14 10:45:42 -07:00
config.py	feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540 )	2026-04-23 22:23:37 -07:00
copilot_auth.py	fix(copilot): resolve GHE token poisoning when GITHUB_TOKEN is set	2026-04-13 05:12:36 -07:00
cron.py	feat(cron): track delivery failures in job status (#6042 )	2026-04-07 22:49:01 -07:00
curses_ui.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
debug.py	style(debug): add missing blank line between LogSnapshot and helpers	2026-04-22 16:34:05 -05:00
default_soul.py	fix: reset default SOUL.md to baseline identity text (#3159 )	2026-03-26 01:34:27 -07:00
dingtalk_auth.py	test(dingtalk): cover QR device-flow auth + OpenClaw branding disclosure	2026-04-17 05:08:07 -07:00
doctor.py	feat: add Step Plan provider support (salvage #6005 )	2026-04-22 02:59:58 -07:00
dump.py	refactor: remove smart_model_routing feature (#12732 )	2026-04-19 18:12:55 -07:00
env_loader.py	fix(cli): ensure project .env is sanitized before loading	2026-04-22 05:51:44 -07:00
gateway.py	fix(gateway): drain-aware hermes update + faster still-working pings (#14736 )	2026-04-23 14:01:57 -07:00
hooks.py	feat: shell hooks — wire shell scripts as Hermes hook callbacks	2026-04-20 20:53:51 -07:00
logs.py	feat: component-separated logging with session context and filtering (#7991 )	2026-04-11 17:23:36 -07:00
main.py	fix(gateway): drain-aware hermes update + faster still-working pings (#14736 )	2026-04-23 14:01:57 -07:00
mcp_config.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383 )	2026-04-16 21:57:10 -07:00
memory_setup.py	fix(memory): discover user-installed memory providers from $HERMES_HOME/plugins/ (#10529 )	2026-04-15 14:25:40 -07:00
model_normalize.py	fix(copilot): normalize vendor-prefixed and dash-notation model IDs (#6879 ) (#11561 )	2026-04-17 04:19:36 -07:00
model_switch.py	fix: resolve_alias prefers highest version + merges static catalog	2026-04-23 23:18:33 +05:30
models.py	feat(codex): add gpt-5.5 and wire live model discovery into picker (#14720 )	2026-04-23 13:32:43 -07:00
nous_subscription.py	fix(fal): extend whitespace-only FAL_KEY handling to all call sites	2026-04-21 02:04:21 -07:00
pairing.py	fix(pairing): handle null user_name in pairing list display	2026-04-23 02:34:11 -07:00
platforms.py	feat(cron): honor `hermes tools` config for the cron platform (#14798 )	2026-04-23 15:48:50 -07:00
plugins.py	fix(image-gen): force-refresh plugin providers in long-lived sessions	2026-04-23 03:01:18 -07:00
plugins_cmd.py	feat(plugins): make all plugins opt-in by default	2026-04-20 04:46:45 -07:00
profiles.py	fix(profiles): stage profile imports to prevent directory clobbering	2026-04-23 03:02:34 -07:00
providers.py	feat: add Step Plan provider support (salvage #6005 )	2026-04-22 02:59:58 -07:00
runtime_provider.py	fix(kimi-coding): add KIMI_CODING_API_KEY fallback + api_mode detection for /coding endpoint	2026-04-21 19:48:39 -07:00
setup.py	feat: add Xiaomi MiMo v2.5-pro and v2.5 model support (#14635 )	2026-04-23 10:06:25 -07:00
skills_config.py	refactor: remove dead code — 1,784 lines across 77 files (#9180 )	2026-04-13 16:32:04 -07:00
skills_hub.py	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 08:59:33 -05:00
skin_engine.py	fix(skins): don't inherit status_bar_* into light-mode skins	2026-04-22 13:20:02 -07:00
status.py	feat: add Step Plan provider support (salvage #6005 )	2026-04-22 02:59:58 -07:00
timeouts.py	fix(config): add stale timeout settings	2026-04-20 00:52:50 -07:00
tips.py	feat(agent): make API retry count configurable via agent.api_max_retries (#14730 )	2026-04-23 13:59:32 -07:00
tools_config.py	fix(image-gen): persist plugin provider on reconfigure	2026-04-23 01:56:09 -07:00
uninstall.py	feat(uninstall): offer to remove named profiles when uninstalling from default	2026-04-18 19:18:13 -07:00
voice.py	fix(tui): ignore SIGPIPE so stderr back-pressure can't kill the gateway	2026-04-23 16:18:15 -07:00
web_server.py	feat(dashboard): reskin extension points for themes and plugins (#14776 )	2026-04-23 15:31:01 -07:00
webhook.py	feat(webhook): direct delivery mode for zero-LLM push notifications (#12473 )	2026-04-19 05:18:19 -07:00