feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)

* docs: browser CDP supervisor design (for upcoming PR)

Design doc ahead of implementation: dialog + iframe detection/interaction via a persistent CDP supervisor. Covers backend capability matrix (verified live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split, non-goals, and test plan. Supersedes #12550. No code changes in this commit.

* feat(browser): add persistent CDP supervisor for dialog + frame detection

Single persistent CDP WebSocket per Hermes task_id that subscribes to Page/Runtime/Target events and maintains thread-safe state for pending dialogs, frame tree, and console errors. Supervisor lives in its own daemon thread running an asyncio loop; external callers use sync API (snapshot(), respond_to_dialog()) that bridges onto the loop.

Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true} and enables Page+Runtime on each so iframe-origin dialogs surface through the same supervisor.

Dialog policies: must_respond (default, 300s safety timeout), auto_dismiss, auto_accept.

Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot payloads bounded on ad-heavy pages.

E2E verified against real Chrome via smoke test: detects + responds to main-frame alerts, iframe-contentWindow alerts, preserves frame tree, graceful no-dialog error path, clean shutdown.

No agent-facing tool wiring in this commit (comes next).

* feat(browser): add browser_dialog tool wired to CDP supervisor

Agent-facing response-only tool. Schema:
  action: 'accept' | 'dismiss' (required)
  prompt_text: response for prompt() dialogs (optional)
  dialog_id: disambiguate when multiple dialogs queued (optional)

Handler: SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)

check_fn shares _browser_cdp_check with browser_cdp so both surface and hide together.
When no supervisor is attached (Camofox, default Playwright, or no browser session started yet), tool is hidden; if somehow invoked it returns a clear error pointing the agent to browser_navigate / /browser connect.

Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp / hermes-api-server toolsets alongside browser_cdp.

* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot

Supervisor lifecycle:
  * _get_session_info lazy-starts the supervisor after a session row is materialized, which covers every backend code path (Browserbase, cdp_url override, /browser connect, future providers) with one hook.
  * cleanup_browser(task_id) stops the supervisor for that task first (before the backend tears down CDP).
  * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
  * /browser connect eagerly starts the supervisor for task 'default' so the first snapshot already shows pending_dialogs.
  * /browser disconnect stops the supervisor.

CDP URL resolution for the supervisor:
  1. BROWSER_CDP_URL / browser.cdp_url override.
  2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).

browser_snapshot merges supervisor state (pending_dialogs + frame_tree) into its JSON output when a supervisor is active. The agent reads pending_dialogs from the snapshot it already requests, then calls browser_dialog to respond. No extra tool surface.

Config defaults:
  * browser.dialog_policy: 'must_respond' (new)
  * browser.dialog_timeout_s: 300 (new)
No version bump: new keys deep-merge into existing browser section.

Deadlock fix in supervisor event dispatch:
  * _on_dialog_opening and _on_target_attached used to await CDP calls while the reader was still processing an event, but only the reader can set the response Future, so the call timed out.
  * Both now fire asyncio.create_task(...) so the reader stays pumping.
  * auto_dismiss/auto_accept now actually close the dialog immediately.
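
The deadlock fix described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `MiniClient`, its in-memory `inbox` queue, and `demo` are hypothetical stand-ins for the real supervisor and WebSocket, which are not shown in this excerpt. The one property it reproduces is that only the reader can resolve a response Future, so a handler awaited inside the reader times out, while a handler started with `asyncio.create_task` completes.

```python
import asyncio


class MiniClient:
    """Toy stand-in for a CDP connection: only the reader resolves response futures."""

    def __init__(self) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()  # messages "from the browser"
        self.pending: dict = {}                      # call id -> Future
        self.next_id = 0
        self.handled: list = []
        self.tasks: list = []                        # keep references to spawned handlers

    async def call(self, method: str, timeout: float = 0.1) -> dict:
        """Send a command; the reply must travel through the reader to resolve."""
        self.next_id += 1
        cid = self.next_id
        fut = asyncio.get_running_loop().create_future()
        self.pending[cid] = fut
        # The fake browser replies at once, but the reply only lands in the inbox.
        self.inbox.put_nowait({"id": cid, "result": {"method": method}})
        return await asyncio.wait_for(fut, timeout)

    async def on_dialog_opening(self, params: dict) -> None:
        await self.call("Page.handleJavaScriptDialog")
        self.handled.append(params["message"])

    async def reader(self, spawn: bool) -> None:
        while (msg := await self.inbox.get()) is not None:
            if "id" in msg:  # response: wake the waiting caller
                fut = self.pending.pop(msg["id"], None)
                if fut is not None and not fut.done():
                    fut.set_result(msg)
            elif spawn:      # fixed variant: handler runs as its own task
                self.tasks.append(
                    asyncio.create_task(self.on_dialog_opening(msg["params"]))
                )
            else:            # old variant: reader blocks waiting on its own reply
                try:
                    await self.on_dialog_opening(msg["params"])
                except asyncio.TimeoutError:
                    self.handled.append("timeout")


async def demo(spawn: bool) -> list:
    client = MiniClient()
    reader = asyncio.create_task(client.reader(spawn))
    client.inbox.put_nowait(
        {"method": "Page.javascriptDialogOpening", "params": {"message": "hi"}}
    )
    await asyncio.sleep(0.3)       # long enough for the 0.1 s call timeout to elapse
    client.inbox.put_nowait(None)  # sentinel: stop the reader
    await reader
    return client.handled


if __name__ == "__main__":
    print(asyncio.run(demo(spawn=False)))
    print(asyncio.run(demo(spawn=True)))
```

With `spawn=False` the handler's call times out because the reply sits unread in the inbox; with `spawn=True` the reader keeps pumping and the handler finishes.
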
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
  * supervisor start/snapshot
  * main-frame alert detection + dismiss
  * iframe.contentWindow alert
  * prompt() with prompt_text reply
  * respond with no pending dialog -> clean error
  * auto_dismiss clears on event
  * registry idempotency
  * registry stop -> snapshot reports inactive
  * browser_dialog tool no-supervisor error
  * browser_dialog invalid action
  * browser_dialog end-to-end via tool handler

xdist-safe: chrome_cdp fixture uses a per-worker port. Skipped when google-chrome/chromium isn't installed.

* docs(browser): document browser_dialog tool + CDP supervisor

- user-guide/features/browser.md: new browser_dialog section with workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser toolset row with note on pending_dialogs / frame_tree snapshot fields

Full design doc lives at developer-guide/browser-supervisor.md (committed earlier).

* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility

A Browserbase E2E test revealed three production-critical issues:

1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's CDP proxy tears down our long-lived WebSocket whenever a short-lived client (e.g. agent-browser CLI's per-command CDP connection) disconnects. Fixed with a reconnecting _run loop that re-attaches with exponential backoff on drops. _page_session_id and _child_sessions are reset on each reconnect; pending_dialogs and frames are preserved across reconnects.

2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their Playwright-based CDP proxy dismisses alert/confirm/prompt before our Page.handleJavaScriptDialog call can respond. So pending_dialogs is empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a DialogRecord for every dialog that opened, with a closed_by tag:
  * 'agent': agent called browser_dialog
  * 'auto_policy': local auto_dismiss/auto_accept fired
  * 'watchdog': must_respond timeout auto-dismissed (300s default)
  * 'remote': browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote' so they at least know a dialog fired, even though they couldn't respond.

3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a 'message' field (CDP spec has only 'result' and 'userInput') but our _on_dialog_closed was matching on message. Fixed to match by session_id + oldest-first, with a safety assumption that only one dialog is in flight per session (the JS thread is blocked while a dialog is up).

Docs + tests updated:
  * browser.md: new availability matrix showing the three backends and which mode (pending / recent / response) each supports
  * developer-guide/browser-supervisor.md: three-field snapshot schema with closed_by semantics
  * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12 passing against real Chrome)

E2E verified both backends:
  * Local Chrome via /browser connect: detect + respond full workflow (smoke_supervisor.py all 7 scenarios pass)
  * Browserbase: detect via recent_dialogs with closed_by='remote' (smoke_supervisor_browserbase_v2.py passes)

Camofox remains out of scope (REST-only, no CDP); tracked for upstream PR 3.

* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)

Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so Page.handleJavaScriptDialog calls lose the race. Solution: bypass native dialogs entirely.

The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a JavaScript override for window.alert/confirm/prompt. Those overrides perform a synchronous XMLHttpRequest to a magic host ('hermes-dialog-bridge.invalid').
We intercept those XHRs via Fetch.enable with a requestStage=Request pattern.

Flow when a page calls alert('hi'):
  1. window.alert override intercepts, builds XHR GET to http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
  2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
  3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces it as a pending dialog with bridge_request_id set
  4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
  5. Supervisor calls Fetch.fulfillRequest with JSON body: {accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
  6. The injected script parses the body, returns the appropriate value from the override (undefined for alert, bool for confirm, string|null for prompt)

This works identically on Browserbase AND local Chrome: no native dialog ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog policies (must_respond / auto_dismiss / auto_accept) all still work.

Bridge is installed on every attached session (main page + OOPIF child sessions) so iframe dialogs are captured too.

Native-dialog path kept as a fallback for backends that don't auto-dismiss, so a page that somehow bypasses our override (e.g. iframes that load after Fetch.enable but before the init-script runs) still gets observed via Page.javascriptDialogOpening.
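
Steps 3 and 5 of the flow above can be sketched as two pure helpers. This is a minimal illustration, not the supervisor's actual code: `parse_bridge_request` and `build_fulfill_params` are hypothetical names, and the response headers (including the CORS header) are an assumption about what a cross-origin synchronous XHR needs in order to read the body. The parameter names `requestId`, `responseCode`, `responseHeaders`, and the base64-encoded `body` follow the CDP `Fetch.fulfillRequest` command.

```python
import base64
import json
from typing import Optional
from urllib.parse import parse_qs, urlsplit

BRIDGE_HOST = "hermes-dialog-bridge.invalid"  # magic host named in the commit message


def parse_bridge_request(url: str) -> Optional[dict]:
    """Return {'kind', 'message'} for a bridge XHR URL, or None for any other URL."""
    parts = urlsplit(url)
    if parts.hostname != BRIDGE_HOST:
        return None
    query = parse_qs(parts.query, keep_blank_values=True)
    return {
        "kind": query.get("kind", ["alert"])[0],
        "message": query.get("message", [""])[0],
    }


def build_fulfill_params(
    request_id: str, accept: bool, prompt_text: str = "", dialog_id: str = "d-1"
) -> dict:
    """Build params for Fetch.fulfillRequest carrying the agent's reply as JSON."""
    body = json.dumps(
        {"accept": accept, "prompt_text": prompt_text, "dialog_id": dialog_id}
    )
    return {
        "requestId": request_id,
        "responseCode": 200,
        # Assumed headers: the real supervisor may send different ones.
        "responseHeaders": [
            {"name": "Content-Type", "value": "application/json"},
            {"name": "Access-Control-Allow-Origin", "value": "*"},
        ],
        # CDP expects the response body as a base64-encoded string.
        "body": base64.b64encode(body.encode("utf-8")).decode("ascii"),
    }


if __name__ == "__main__":
    print(parse_bridge_request(f"http://{BRIDGE_HOST}/?kind=alert&message=hi"))
    print(build_fulfill_params("req-1", accept=True, prompt_text="AGENT-REPLY"))
```

Keeping both helpers free of I/O means the URL parsing and payload encoding can be unit-tested without a browser.
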
E2E VERIFIED:
  * Local Chrome: 13/13 pytest tests green (12 original + new test_bridge_captures_prompt_and_returns_reply_text that asserts window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
  * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
    - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
    - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY' → page.prompt_ret === 'AGENT-REPLY' ✓
    - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
    - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓

Docs updated in browser.md and developer-guide/browser-supervisor.md; availability matrix now shows Browserbase at full parity with local Chrome for both detection and response.

* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)

Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).

Design: browser_cdp gets an optional frame_id parameter. When set, the tool looks up the frame in the supervisor's frame_tree, grabs its child cdp_session_id (OOPIF session), and dispatches the CDP call through the supervisor's already-connected WebSocket via run_coroutine_threadsafe.

Why not stateless: on Browserbase, each fresh browser_cdp WebSocket must re-negotiate against a signed connectUrl. The session info carries a specific URL that can expire while the supervisor's long-lived connection stays valid. Routing via the supervisor sidesteps this.

Agent workflow:
  1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
  2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>, params={'expression': 'document.title', 'returnByValue': True})
  3. Supervisor dispatches the call on the OOPIF's child session

Supervisor state fixes needed along the way:
  * _on_frame_detached now skips reason='swap' (frame migrating processes)
  * _on_frame_detached also skips when the frame is an OOPIF with a live child session; Browserbase fires spurious remove events when a same-origin iframe gets promoted to OOPIF
  * _on_target_detached clears cdp_session_id but KEEPS the frame record so the agent still sees the OOPIF in frame_tree during transient session flaps

E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
  browser_cdp(method='Runtime.evaluate', params={'expression': 'document.title', 'returnByValue': True}, frame_id=<OOPIF>)
  → {'success': True, 'result': {'value': 'Example Domain'}}

The iframe is <iframe src='https://example.com/'> inside a top-level data: URL page on a real Browserbase session. The agent Runtime.evaluates INSIDE the cross-origin iframe and gets example.com's title back.

Tests (tests/tools/test_browser_supervisor.py, 16 pass total):
  * test_browser_cdp_frame_id_routes_via_supervisor: injects fake OOPIF, verifies routing via supervisor, Runtime.evaluate returns 1+1=2
  * test_browser_cdp_frame_id_missing_supervisor: clean error when no supervisor attached
  * test_browser_cdp_frame_id_not_in_frame_tree: clean error on bad frame_id

Docs (browser.md and developer-guide/browser-supervisor.md) updated with the iframe workflow; availability matrix now shows OOPIF eval as shipped for local Chrome + Browserbase.

* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process

When asked 'did you test the iframe stuff' I had only done a mocked pytest (fake injected OOPIF) plus a Browserbase E2E.
Closed the local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/smoke_local_oopif.py:
  * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
  * Chrome with --site-per-process so the cross-origin iframe becomes a real OOPIF in its own process
  * Navigate, find OOPIF in supervisor.frame_tree, call browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes through the supervisor's child session
  * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the inner page, retrieved via OOPIF eval)

PASSED on 2026-04-23.

Tried to embed this as a pytest but hit an asyncio version quirk between venv (3.11) and the system python (3.13): Page.navigate hangs in the pytest harness but works in standalone. Left a self-documenting skip test that points to the smoke script + describes the verification.

chrome_cdp fixture now passes --site-per-process so future iframe tests can rely on OOPIF behavior.

Result: 16 pass + 1 documented-skip = 17 tests in tests/tools/test_browser_supervisor.py.

* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count

Pre-merge docs audit revealed two gaps:

1. user-guide/configuration.md browser config example was missing the two new dialog_* knobs. Added with a short table explaining must_respond / auto_dismiss / auto_accept semantics and a link to the feature page for the full workflow.

2. reference/tools-reference.md header said '54 built-in tools'; real count on main is 54, this branch adds browser_dialog so it's 55. Fixed the header. (browser count was already correctly bumped 11 -> 12 in the earlier docs commit.)

No code changes.
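
Two ideas recur across the commits above: a daemon thread that owns an asyncio loop with sync callers bridging onto it, and a frame_id lookup before a CDP call is dispatched on a child session. The sketch below shows that pattern in isolation. All names (`LoopThread`, `fake_send`, `route_by_frame`) are hypothetical, and `fake_send` merely echoes its arguments instead of talking to a browser; the real supervisor's internals are not part of this excerpt.

```python
import asyncio
import threading


class LoopThread:
    """Runs an asyncio loop in a daemon thread and exposes a blocking call()."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, name="loop-thread", daemon=True)
        self._thread.start()

    def _run(self) -> None:
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def call(self, coro, timeout: float = 5.0):
        """Schedule `coro` on the loop thread and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def stop(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5.0)
        self._loop.close()


async def fake_send(method: str, params: dict, session_id: str) -> dict:
    """Stand-in for sending a CDP command on an already-open WebSocket."""
    await asyncio.sleep(0)  # yield once, as a real network round trip would
    return {"method": method, "params": params, "sessionId": session_id}


def route_by_frame(bridge: LoopThread, frames: dict, frame_id: str,
                   method: str, params: dict) -> dict:
    """Look up the frame's child session, then dispatch through the loop thread."""
    session_id = frames.get(frame_id)
    if session_id is None:
        return {"success": False, "error": f"frame_id {frame_id!r} not found"}
    result = bridge.call(fake_send(method, params, session_id))
    return {"success": True, "frame_id": frame_id, "result": result}


if __name__ == "__main__":
    bridge = LoopThread()
    frames = {"FRAME-A": "SESSION-1"}  # frame_id -> child session id (made-up values)
    print(route_by_frame(bridge, frames, "FRAME-A", "Runtime.evaluate",
                         {"expression": "document.title", "returnByValue": True}))
    print(route_by_frame(bridge, frames, "FRAME-B", "Runtime.evaluate", {}))
    bridge.stop()
```

Checking the frame table before creating the coroutine keeps the error path synchronous and avoids leaving an un-awaited coroutine behind.
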
2026-04-25 00:51:20 +00:00 · 2026-04-23 22:23:37 -07:00 · 2026-04-23 22:23:37 -07:00 · 5a1c599412
commit 5a1c599412
parent 0f6eabb890
13 changed files with 2665 additions and 24 deletions
--- a/cli.py
+++ b/cli.py
@@ -6685,6 +6685,13 @@ class HermesCLI:
                print(f"   ⚠ Port {_port} is not reachable at {cdp_url}")
            os.environ["BROWSER_CDP_URL"] = cdp_url
+            # Eagerly start the CDP supervisor so pending_dialogs + frame_tree
+            # show up in the next browser_snapshot.  No-op if already started.
+            try:
+                from tools.browser_tool import _ensure_cdp_supervisor  # type: ignore[import-not-found]
+                _ensure_cdp_supervisor("default")
+            except Exception:
+                pass
            print()
            print("🌐 Browser connected to live Chrome via CDP")
            print(f"   Endpoint: {cdp_url}")
@@ -6706,7 +6713,8 @@ class HermesCLI:
            if current:
                os.environ.pop("BROWSER_CDP_URL", None)
                try:
-                    from tools.browser_tool import cleanup_all_browsers
+                    from tools.browser_tool import cleanup_all_browsers, _stop_cdp_supervisor
+                    _stop_cdp_supervisor("default")
                    cleanup_all_browsers()
                except Exception:
                    pass
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -466,6 +466,12 @@ DEFAULT_CONFIG = {
        "record_sessions": False,  # Auto-record browser sessions as WebM videos
        "allow_private_urls": False,  # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
        "cdp_url": "",  # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
+        # CDP supervisor — dialog + frame detection via a persistent WebSocket.
+        # Active only when a CDP-capable backend is attached (Browserbase or
+        # local Chrome via /browser connect). See
+        # website/docs/developer-guide/browser-supervisor.md.
+        "dialog_policy": "must_respond",  # must_respond | auto_dismiss | auto_accept
+        "dialog_timeout_s": 300,  # Safety auto-dismiss after N seconds under must_respond
        "camofox": {
            # When true, Hermes sends a stable profile-scoped userId to Camofox
            # so the server maps it to a persistent Firefox profile automatically.
--- a/tests/tools/test_browser_supervisor.py
+++ b/tests/tools/test_browser_supervisor.py
@@ -0,0 +1,563 @@
 """Integration tests for tools.browser_supervisor.
 Exercises the supervisor end-to-end against a real local Chrome
 (``--remote-debugging-port``).  Skipped when Chrome is not installed
 — these are the tests that actually verify the CDP wire protocol
 works, since mock-CDP unit tests can only prove the happy paths we
 thought to model.
 Run manually:
    scripts/run_tests.sh tests/tools/test_browser_supervisor.py
 Automated: skipped in CI unless ``HERMES_E2E_BROWSER=1`` is set.
 """
 from __future__ import annotations
 import asyncio
 import base64
 import json
 import os
 import shutil
 import subprocess
 import tempfile
 import time
 import pytest
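 # Probe the same binary names as _find_chrome() so a chromium-browser-only
 # install is not skipped at module level.
 pytestmark = pytest.mark.skipif(
    not any(
        shutil.which(name)
        for name in ("google-chrome", "chromium", "chromium-browser")
    ),
    reason="Chrome/Chromium not installed",
 )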
 def _find_chrome() -> str:
    for candidate in ("google-chrome", "chromium", "chromium-browser"):
        path = shutil.which(candidate)
        if path:
            return path
    pytest.skip("no Chrome binary found")
@pytest.fixture
 def chrome_cdp(worker_id):
    """Start a headless Chrome with --remote-debugging-port, yield its WS URL.
    Uses a unique port per xdist worker to avoid cross-worker collisions.
    Always launches with ``--site-per-process`` so cross-origin iframes
    become real OOPIFs (needed by the iframe interaction tests).
    """
    import socket
    # xdist worker_id is "master" in single-process mode or "gw0".."gwN" otherwise.
    if worker_id == "master":
        port_offset = 0
    else:
        # removeprefix drops the literal "gw" prefix; lstrip("gw") strips a character set.
        port_offset = int(worker_id.removeprefix("gw"))
    port = 9225 + port_offset
    profile = tempfile.mkdtemp(prefix="hermes-supervisor-test-")
    proc = subprocess.Popen(
        [
            _find_chrome(),
            f"--remote-debugging-port={port}",
            f"--user-data-dir={profile}",
            "--no-first-run",
            "--no-default-browser-check",
            "--headless=new",
            "--disable-gpu",
            "--site-per-process",  # force OOPIFs for cross-origin iframes
        ],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    ws_url = None
    deadline = time.monotonic() + 15
    while time.monotonic() < deadline:
        try:
            import urllib.request
            with urllib.request.urlopen(
                f"http://127.0.0.1:{port}/json/version", timeout=1
            ) as r:
                info = json.loads(r.read().decode())
                ws_url = info["webSocketDebuggerUrl"]
                break
        except Exception:
            time.sleep(0.25)
    if ws_url is None:
        proc.terminate()
        proc.wait(timeout=5)
        shutil.rmtree(profile, ignore_errors=True)
        pytest.skip("Chrome didn't expose CDP in time")
    yield ws_url, port
    proc.terminate()
    try:
        proc.wait(timeout=3)
    except Exception:
        proc.kill()
    shutil.rmtree(profile, ignore_errors=True)
 def _test_page_url() -> str:
    html = """<!doctype html>
 <html><head><title>Supervisor pytest</title></head><body>
 <h1>Supervisor pytest</h1>
 <iframe id="inner" srcdoc="<body><h2>frame-marker</h2></body>" width="400" height="100"></iframe>
 </body></html>"""
    return "data:text/html;base64," + base64.b64encode(html.encode()).decode()
 def _fire_on_page(cdp_url: str, expression: str) -> None:
    """Navigate the first page target to a data URL and fire `expression`."""
    import asyncio
    import websockets as _ws_mod
    async def run():
        async with _ws_mod.connect(cdp_url, max_size=50 * 1024 * 1024) as ws:
            next_id = [1]
            async def call(method, params=None, session_id=None):
                cid = next_id[0]
                next_id[0] += 1
                p = {"id": cid, "method": method}
                if params:
                    p["params"] = params
                if session_id:
                    p["sessionId"] = session_id
                await ws.send(json.dumps(p))
                async for raw in ws:
                    m = json.loads(raw)
                    if m.get("id") == cid:
                        return m
            targets = (await call("Target.getTargets"))["result"]["targetInfos"]
            page = next(t for t in targets if t.get("type") == "page")
            attach = await call(
                "Target.attachToTarget", {"targetId": page["targetId"], "flatten": True}
            )
            sid = attach["result"]["sessionId"]
            await call("Page.navigate", {"url": _test_page_url()}, session_id=sid)
            await asyncio.sleep(1.5)  # let the page load
            await call(
                "Runtime.evaluate",
                {"expression": expression, "returnByValue": True},
                session_id=sid,
            )
    asyncio.run(run())
@pytest.fixture
 def supervisor_registry():
    """Yield the global registry and tear down any supervisors after the test."""
    from tools.browser_supervisor import SUPERVISOR_REGISTRY
    yield SUPERVISOR_REGISTRY
    SUPERVISOR_REGISTRY.stop_all()
 def _wait_for_dialog(supervisor, timeout: float = 5.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        snap = supervisor.snapshot()
        if snap.pending_dialogs:
            return snap.pending_dialogs
        time.sleep(0.1)
    return ()
 def test_supervisor_start_and_snapshot(chrome_cdp, supervisor_registry):
    """Supervisor attaches, exposes an active snapshot with a top frame."""
    cdp_url, _port = chrome_cdp
    supervisor = supervisor_registry.get_or_start(task_id="pytest-1", cdp_url=cdp_url)
    # Navigate so the frame tree populates.
    _fire_on_page(cdp_url, "/* no dialog */ void 0")
    # Give a moment for frame events to propagate
    time.sleep(1.0)
    snap = supervisor.snapshot()
    assert snap.active is True
    assert snap.task_id == "pytest-1"
    assert snap.pending_dialogs == ()
    # At minimum a top frame should exist after the navigate.
    assert snap.frame_tree.get("top") is not None
 def test_main_frame_alert_detection_and_dismiss(chrome_cdp, supervisor_registry):
    """alert() in the main frame surfaces and can be dismissed via the sync API."""
    cdp_url, _port = chrome_cdp
    supervisor = supervisor_registry.get_or_start(task_id="pytest-2", cdp_url=cdp_url)
    _fire_on_page(cdp_url, "setTimeout(() => alert('PYTEST-MAIN-ALERT'), 50)")
    dialogs = _wait_for_dialog(supervisor)
    assert dialogs, "no dialog detected"
    d = dialogs[0]
    assert d.type == "alert"
    assert "PYTEST-MAIN-ALERT" in d.message
    result = supervisor.respond_to_dialog("dismiss")
    assert result["ok"] is True
    # State cleared after dismiss
    time.sleep(0.3)
    assert supervisor.snapshot().pending_dialogs == ()
 def test_iframe_contentwindow_alert(chrome_cdp, supervisor_registry):
    """alert() fired from inside a same-origin iframe surfaces too."""
    cdp_url, _port = chrome_cdp
    supervisor = supervisor_registry.get_or_start(task_id="pytest-3", cdp_url=cdp_url)
    _fire_on_page(
        cdp_url,
        "setTimeout(() => document.querySelector('#inner').contentWindow.alert('PYTEST-IFRAME'), 50)",
    )
    dialogs = _wait_for_dialog(supervisor)
    assert dialogs, "no iframe dialog detected"
    assert any("PYTEST-IFRAME" in d.message for d in dialogs)
    result = supervisor.respond_to_dialog("accept")
    assert result["ok"] is True
 def test_prompt_dialog_with_response_text(chrome_cdp, supervisor_registry):
    """prompt() gets our prompt_text back inside the page."""
    cdp_url, _port = chrome_cdp
    supervisor = supervisor_registry.get_or_start(task_id="pytest-4", cdp_url=cdp_url)
    # Fire a prompt and stash the answer on window
    _fire_on_page(
        cdp_url,
        "setTimeout(() => { window.__promptResult = prompt('give me a token', 'default-x'); }, 50)",
    )
    dialogs = _wait_for_dialog(supervisor)
    assert dialogs
    d = dialogs[0]
    assert d.type == "prompt"
    assert d.default_prompt == "default-x"
    result = supervisor.respond_to_dialog("accept", prompt_text="PYTEST-PROMPT-REPLY")
    assert result["ok"] is True
 def test_respond_with_no_pending_dialog_errors_cleanly(chrome_cdp, supervisor_registry):
    """Calling respond_to_dialog when nothing is pending returns a clean error, not an exception."""
    cdp_url, _port = chrome_cdp
    supervisor = supervisor_registry.get_or_start(task_id="pytest-5", cdp_url=cdp_url)
    result = supervisor.respond_to_dialog("accept")
    assert result["ok"] is False
    assert "no dialog" in result["error"].lower()
 def test_auto_dismiss_policy(chrome_cdp, supervisor_registry):
    """auto_dismiss policy clears dialogs without the agent responding."""
    from tools.browser_supervisor import DIALOG_POLICY_AUTO_DISMISS
    cdp_url, _port = chrome_cdp
    supervisor = supervisor_registry.get_or_start(
        task_id="pytest-6",
        cdp_url=cdp_url,
        dialog_policy=DIALOG_POLICY_AUTO_DISMISS,
    )
    _fire_on_page(cdp_url, "setTimeout(() => alert('PYTEST-AUTO-DISMISS'), 50)")
    # Give the supervisor a moment to see + auto-dismiss
    time.sleep(2.0)
    snap = supervisor.snapshot()
    # Nothing pending because auto-dismiss cleared it immediately
    assert snap.pending_dialogs == ()
 def test_registry_idempotent_get_or_start(chrome_cdp, supervisor_registry):
    """Calling get_or_start twice with the same (task, url) returns the same instance."""
    cdp_url, _port = chrome_cdp
    a = supervisor_registry.get_or_start(task_id="pytest-idem", cdp_url=cdp_url)
    b = supervisor_registry.get_or_start(task_id="pytest-idem", cdp_url=cdp_url)
    assert a is b
 def test_registry_stop(chrome_cdp, supervisor_registry):
    """stop() tears down the supervisor and snapshot reports inactive."""
    cdp_url, _port = chrome_cdp
    supervisor = supervisor_registry.get_or_start(task_id="pytest-stop", cdp_url=cdp_url)
    assert supervisor.snapshot().active is True
    supervisor_registry.stop("pytest-stop")
    # Post-stop snapshot reports inactive; supervisor obj may still exist
    assert supervisor.snapshot().active is False
 def test_browser_dialog_tool_no_supervisor():
    """browser_dialog returns a clear error when no supervisor is attached."""
    from tools.browser_dialog_tool import browser_dialog
    r = json.loads(browser_dialog(action="accept", task_id="nonexistent-task"))
    assert r["success"] is False
    assert "No CDP supervisor" in r["error"]
 def test_browser_dialog_invalid_action(chrome_cdp, supervisor_registry):
    """browser_dialog rejects actions that aren't accept/dismiss."""
    from tools.browser_dialog_tool import browser_dialog
    cdp_url, _port = chrome_cdp
    supervisor_registry.get_or_start(task_id="pytest-bad-action", cdp_url=cdp_url)
    r = json.loads(browser_dialog(action="eat", task_id="pytest-bad-action"))
    assert r["success"] is False
    assert "accept" in r["error"] and "dismiss" in r["error"]
 def test_recent_dialogs_ring_buffer(chrome_cdp, supervisor_registry):
    """Closed dialogs show up in recent_dialogs with a closed_by tag."""
    from tools.browser_supervisor import DIALOG_POLICY_AUTO_DISMISS
    cdp_url, _port = chrome_cdp
    sv = supervisor_registry.get_or_start(
        task_id="pytest-recent",
        cdp_url=cdp_url,
        dialog_policy=DIALOG_POLICY_AUTO_DISMISS,
    )
    _fire_on_page(cdp_url, "setTimeout(() => alert('PYTEST-RECENT'), 50)")
    # Wait for auto-dismiss to cycle the dialog through
    deadline = time.monotonic() + 5  # monotonic clock, matching the other wait loops in this file
    while time.monotonic() < deadline:
        recent = sv.snapshot().recent_dialogs
        if recent and any("PYTEST-RECENT" in r.message for r in recent):
            break
        time.sleep(0.1)
    recent = sv.snapshot().recent_dialogs
    assert recent, "recent_dialogs should contain the auto-dismissed dialog"
    match = next((r for r in recent if "PYTEST-RECENT" in r.message), None)
    assert match is not None
    assert match.type == "alert"
    assert match.closed_by == "auto_policy"
    assert match.closed_at >= match.opened_at
 def test_browser_dialog_tool_end_to_end(chrome_cdp, supervisor_registry):
    """Full agent-path check: fire an alert, call the tool handler directly."""
    from tools.browser_dialog_tool import browser_dialog
    cdp_url, _port = chrome_cdp
    supervisor = supervisor_registry.get_or_start(task_id="pytest-tool", cdp_url=cdp_url)
    _fire_on_page(cdp_url, "setTimeout(() => alert('PYTEST-TOOL-END2END'), 50)")
    assert _wait_for_dialog(supervisor), "no dialog detected via wait_for_dialog"
    r = json.loads(browser_dialog(action="dismiss", task_id="pytest-tool"))
    assert r["success"] is True
    assert r["action"] == "dismiss"
    assert "PYTEST-TOOL-END2END" in r["dialog"]["message"]
 def test_browser_cdp_frame_id_routes_via_supervisor(chrome_cdp, supervisor_registry, monkeypatch):
    """browser_cdp(frame_id=...) routes Runtime.evaluate through supervisor.
    Mocks the supervisor with a known frame and verifies browser_cdp sends
    the call via the supervisor's loop rather than opening a stateless
    WebSocket. This is the path that makes cross-origin iframe eval work
    on Browserbase.
    """
    cdp_url, _port = chrome_cdp
    sv = supervisor_registry.get_or_start(task_id="frame-id-test", cdp_url=cdp_url)
    assert sv.snapshot().active
    # Inject a fake OOPIF frame pointing at the SUPERVISOR's own page session
    # so we can verify routing. We fake is_oopif=True so the code path
    # treats it as an OOPIF child.
    import tools.browser_supervisor as _bs
    with sv._state_lock:
        fake_frame_id = "FAKE-FRAME-001"
        sv._frames[fake_frame_id] = _bs.FrameInfo(
            frame_id=fake_frame_id,
            url="fake://",
            origin="",
            parent_frame_id=None,
            is_oopif=True,
            cdp_session_id=sv._page_session_id,  # route at page scope
        )
    # Route the tool through the supervisor. Should succeed and return
    # something that clearly came from CDP.
    from tools.browser_cdp_tool import browser_cdp
    result = browser_cdp(
        method="Runtime.evaluate",
        params={"expression": "1 + 1", "returnByValue": True},
        frame_id=fake_frame_id,
        task_id="frame-id-test",
    )
    r = json.loads(result)
    assert r.get("success") is True, f"expected success, got: {r}"
    assert r.get("frame_id") == fake_frame_id
    assert r.get("session_id") == sv._page_session_id
    value = r.get("result", {}).get("result", {}).get("value")
    assert value == 2, f"expected 2, got {value!r}"
 def test_browser_cdp_frame_id_real_oopif_smoke_documented():
    """Document that real-OOPIF E2E was manually verified — see PR #14540.
    A pytest version of this hits an asyncio version-quirk in the venv
    (3.11) that doesn't show up in standalone scripts (3.13 + system
    websockets). The mechanism IS verified end-to-end by two separate
    smoke scripts in /tmp/dialog-iframe-test/:
      * smoke_local_oopif.py   — local Chrome + 2 http servers on
        different hostnames + --site-per-process. Outer page on
        localhost:18905, iframe src=http://127.0.0.1:18906. Calls
        browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) and
        verifies inner page's title comes back from the OOPIF session.
        PASSED on 2026-04-23: iframe document.title = 'INNER-FRAME-XYZ'
      * smoke_bb_iframe_agent_path.py — Browserbase + real cross-origin
        iframe (src=https://example.com/). Same browser_cdp(frame_id=)
        path. PASSED on 2026-04-23: iframe document.title =
        'Example Domain'
    The test_browser_cdp_frame_id_routes_via_supervisor pytest covers
    the supervisor-routing plumbing with a fake injected OOPIF.
    """
    pytest.skip(
        "Real-OOPIF E2E verified manually with smoke_local_oopif.py and "
        "smoke_bb_iframe_agent_path.py — pytest version hits an asyncio "
        "version quirk between venv (3.11) and standalone (3.13). "
        "Smoke logs preserved in /tmp/dialog-iframe-test/."
    )
 def test_browser_cdp_frame_id_missing_supervisor():
    """browser_cdp(frame_id=...) errors cleanly when no supervisor is attached."""
    from tools.browser_cdp_tool import browser_cdp
    result = browser_cdp(
        method="Runtime.evaluate",
        params={"expression": "1"},
        frame_id="any-frame-id",
        task_id="no-such-task",
    )
    r = json.loads(result)
    assert r.get("success") is not True
    assert "supervisor" in (r.get("error") or "").lower()
 def test_browser_cdp_frame_id_not_in_frame_tree(chrome_cdp, supervisor_registry):
    """browser_cdp(frame_id=...) errors when the frame_id isn't known."""
    cdp_url, _port = chrome_cdp
    sv = supervisor_registry.get_or_start(task_id="bad-frame-test", cdp_url=cdp_url)
    assert sv.snapshot().active
    from tools.browser_cdp_tool import browser_cdp
    result = browser_cdp(
        method="Runtime.evaluate",
        params={"expression": "1"},
        frame_id="nonexistent-frame",
        task_id="bad-frame-test",
    )
    r = json.loads(result)
    assert r.get("success") is not True
    assert "not found" in (r.get("error") or "").lower()
 def test_bridge_captures_prompt_and_returns_reply_text(chrome_cdp, supervisor_registry):
    """End-to-end: agent's prompt_text round-trips INTO the page's JS.
    Proves the bridge isn't just catching dialogs — it's properly round-
    tripping our reply back into the page via Fetch.fulfillRequest, so
    ``prompt()`` actually returns the agent-supplied string to the page.
    """
    import base64 as _b64
    cdp_url, _port = chrome_cdp
    sv = supervisor_registry.get_or_start(task_id="pytest-bridge-prompt", cdp_url=cdp_url)
    # Page fires prompt and stashes the return value on window.
    html = """<!doctype html><html><body><script>
      window.__ret = null;
      setTimeout(() => { window.__ret = prompt('PROMPT-MSG', 'default'); }, 50);
    </script></body></html>"""
    url = "data:text/html;base64," + _b64.b64encode(html.encode()).decode()
    import asyncio as _asyncio
    import websockets as _ws_mod
    async def nav_and_read():
        async with _ws_mod.connect(cdp_url, max_size=50 * 1024 * 1024) as ws:
            nid = [1]
            pending: dict = {}
            async def reader_fn():
                try:
                    async for raw in ws:
                        m = json.loads(raw)
                        if "id" in m:
                            fut = pending.pop(m["id"], None)
                            if fut and not fut.done():
                                fut.set_result(m)
                except Exception:
                    pass
            rd = _asyncio.create_task(reader_fn())
            async def call(method, params=None, sid=None):
                c = nid[0]; nid[0] += 1
                p = {"id": c, "method": method}
                if params: p["params"] = params
                if sid: p["sessionId"] = sid
                fut = _asyncio.get_running_loop().create_future()
                pending[c] = fut
                await ws.send(json.dumps(p))
                return await _asyncio.wait_for(fut, timeout=20)
            try:
                t = (await call("Target.getTargets"))["result"]["targetInfos"]
                pg = next(x for x in t if x.get("type") == "page")
                a = await call("Target.attachToTarget", {"targetId": pg["targetId"], "flatten": True})
                sid = a["result"]["sessionId"]
                # Fire navigate but don't await — prompt() blocks the page
                nav_id = nid[0]; nid[0] += 1
                nav_fut = _asyncio.get_running_loop().create_future()
                pending[nav_id] = nav_fut
                await ws.send(json.dumps({"id": nav_id, "method": "Page.navigate", "params": {"url": url}, "sessionId": sid}))
                # Wait for supervisor to see the prompt
                deadline = time.monotonic() + 10
                dialog = None
                while time.monotonic() < deadline:
                    snap = sv.snapshot()
                    if snap.pending_dialogs:
                        dialog = snap.pending_dialogs[0]
                        break
                    await _asyncio.sleep(0.05)
                assert dialog is not None, "no dialog captured"
                assert dialog.bridge_request_id is not None, "expected bridge path"
                assert dialog.type == "prompt"
                # Agent responds
                resp = sv.respond_to_dialog("accept", prompt_text="AGENT-SUPPLIED-REPLY")
                assert resp["ok"] is True
                # Wait for nav to complete + read back
                try:
                    await _asyncio.wait_for(nav_fut, timeout=10)
                except Exception:
                    pass
                await _asyncio.sleep(0.5)
                r = await call(
                    "Runtime.evaluate",
                    {"expression": "window.__ret", "returnByValue": True},
                    sid=sid,
                )
                return r.get("result", {}).get("result", {}).get("value")
            finally:
                rd.cancel()
                try: await rd
                except BaseException: pass
    value = _asyncio.run(nav_and_read())
    assert value == "AGENT-SUPPLIED-REPLY", f"expected AGENT-SUPPLIED-REPLY, got {value!r}"
--- a/tools/browser_cdp_tool.py
+++ b/tools/browser_cdp_tool.py
@ -188,10 +188,116 @@ async def _cdp_call(
 # ---------------------------------------------------------------------------
 def _browser_cdp_via_supervisor(
    task_id: str,
    frame_id: str,
    method: str,
    params: Optional[Dict[str, Any]],
    timeout: float,
 ) -> str:
    """Route a CDP call through the live supervisor session for an OOPIF frame.
    Looks up the frame in the supervisor's snapshot, extracts its child
    ``cdp_session_id``, and dispatches ``method`` with that sessionId via
    the supervisor's already-connected WebSocket (using
    ``asyncio.run_coroutine_threadsafe`` onto the supervisor loop).
    """
    try:
        from tools.browser_supervisor import SUPERVISOR_REGISTRY  # type: ignore[import-not-found]
    except Exception as exc:  # pragma: no cover — defensive
        return tool_error(
            f"CDP supervisor is not available: {exc}. frame_id routing requires "
            f"a running supervisor attached via /browser connect or an active "
            f"Browserbase session."
        )
    supervisor = SUPERVISOR_REGISTRY.get(task_id)
    if supervisor is None:
        return tool_error(
            f"No CDP supervisor is attached for task={task_id!r}. Call "
            f"browser_navigate or /browser connect first so the supervisor "
            f"can attach. Once attached, browser_snapshot will populate "
            f"frame_tree with frame_ids you can pass here."
        )
    snap = supervisor.snapshot()
    # Search both the top frame and the children for the requested id.
    top = snap.frame_tree.get("top")
    frame_info: Optional[Dict[str, Any]] = None
    if top and top.get("frame_id") == frame_id:
        frame_info = top
    else:
        for child in snap.frame_tree.get("children", []) or []:
            if child.get("frame_id") == frame_id:
                frame_info = child
                break
    if frame_info is None:
        # Check the raw frames dict too (frame_tree is capped at 30 entries)
        with supervisor._state_lock:  # type: ignore[attr-defined]
            raw = supervisor._frames.get(frame_id)  # type: ignore[attr-defined]
        if raw is not None:
            frame_info = raw.to_dict()
    if frame_info is None:
        return tool_error(
            f"frame_id {frame_id!r} not found in supervisor state. "
            f"Call browser_snapshot to see current frame_tree."
        )
    child_sid = frame_info.get("session_id")
    if not child_sid:
        # Not an OOPIF, so there is no dedicated CDP session to route to.
        # Same-origin iframes don't get their own sessionId; return an error
        # that points the agent at contentWindow/contentDocument on the parent.
        return tool_error(
            f"frame_id {frame_id!r} is not an out-of-process iframe (no "
            f"dedicated CDP session). For same-origin iframes, use "
            f"`browser_cdp(method='Runtime.evaluate', params={{'expression': "
            f"\"document.querySelector('iframe').contentDocument.title\"}})` "
            f"at the top-level page instead."
        )
    # Dispatch onto the supervisor's loop.
    import asyncio as _asyncio
    loop = supervisor._loop  # type: ignore[attr-defined]
    if loop is None or not loop.is_running():
        return tool_error(
            "CDP supervisor loop is not running. Try reconnecting with "
            "/browser connect."
        )
    async def _do_cdp():
        return await supervisor._cdp(  # type: ignore[attr-defined]
            method,
            params or {},
            session_id=child_sid,
            timeout=timeout,
        )
    try:
        fut = _asyncio.run_coroutine_threadsafe(_do_cdp(), loop)
        result_msg = fut.result(timeout=timeout + 2)
    except Exception as exc:
        return tool_error(
            f"CDP call via supervisor failed: {type(exc).__name__}: {exc}",
            cdp_docs=CDP_DOCS_URL,
        )
    payload: Dict[str, Any] = {
        "success": True,
        "method": method,
        "frame_id": frame_id,
        "session_id": child_sid,
        "result": result_msg.get("result", {}),
    }
    return json.dumps(payload, ensure_ascii=False)
 def browser_cdp(
    method: str,
    params: Optional[Dict[str, Any]] = None,
    target_id: Optional[str] = None,
    frame_id: Optional[str] = None,
    timeout: float = 30.0,
    task_id: Optional[str] = None,
 ) -> str:
@ -202,16 +308,34 @@ def browser_cdp(
        params: Method-specific parameters; defaults to ``{}``.
        target_id: Optional target/tab ID for page-level methods.  When set,
            we first attach to the target (``flatten=True``) and send
-            ``method`` with the resulting ``sessionId``.
+            ``method`` with the resulting ``sessionId``.  Uses a fresh
            stateless CDP connection.
        frame_id: Optional cross-origin (OOPIF) iframe ``frame_id`` from
            ``browser_snapshot.frame_tree.children[]``.  When set (and the
            frame is an OOPIF with a live session tracked by the CDP
            supervisor), routes the call through the supervisor's existing
            WebSocket — which is how you Runtime.evaluate *inside* an
            iframe on backends where per-call fresh CDP connections would
            hit signed-URL expiry (Browserbase) or expensive reattach.
        timeout: Seconds to wait for the call to complete.
-        task_id: Unused (tool is stateless) — accepted for uniformity with
+        task_id: Task identifier for supervisor lookup.  When ``frame_id``
-            other browser tools.
+            is set, this identifies which task's supervisor to use; the
            handler will default to ``"default"`` otherwise.
    Returns:
        JSON string ``{"success": True, "method": ..., "result": {...}}`` on
        success, or ``{"error": "..."}`` on failure.
    """
-    del task_id  # unused — stateless
+    # --- Route iframe-scoped calls through the supervisor ---------------
    if frame_id:
        return _browser_cdp_via_supervisor(
            task_id=task_id or "default",
            frame_id=frame_id,
            method=method,
            params=params,
            timeout=timeout,
        )
    del task_id  # stateless path below
    if not method or not isinstance(method, str):
        return tool_error(
@ -324,12 +448,18 @@ BROWSER_CDP_SCHEMA: Dict[str, Any] = {
        "'mobile': false}, target_id=<tabId>\n\n"
        "**Usage rules:**\n"
        "- Browser-level methods (Target.*, Browser.*, Storage.*): omit "
-        "target_id.\n"
+        "target_id and frame_id.\n"
        "- Page-level methods (Page.*, Runtime.*, DOM.*, Emulation.*, "
        "Network.* scoped to a tab): pass target_id from Target.getTargets.\n"
-        "- Each call is independent — sessions and event subscriptions do "
+        "- **Cross-origin iframe scope** (Runtime.evaluate inside an OOPIF, "
-        "not persist between calls. For stateful workflows, prefer the "
+        "Page.* targeting a frame target, etc.): pass frame_id from the "
-        "dedicated browser tools."
+        "browser_snapshot frame_tree output. This routes through the CDP "
        "supervisor's live connection — the only reliable way on "
        "Browserbase where stateless CDP calls hit signed-URL expiry.\n"
        "- Each stateless call (without frame_id) is independent — sessions "
        "and event subscriptions do not persist between calls. For stateful "
        "workflows, prefer the dedicated browser tools or use frame_id "
        "routing."
    ),
    "parameters": {
        "type": "object",
@ -353,8 +483,24 @@ BROWSER_CDP_SCHEMA: Dict[str, Any] = {
                "type": "string",
                "description": (
                    "Optional. Target/tab ID from Target.getTargets result "
-                    "(each entry's 'targetId'). Required for page-level "
+                    "(each entry's 'targetId'). Use for page-level methods "
-                    "methods; must be omitted for browser-level methods."
+                    "at the top-level tab scope. Mutually exclusive with "
                    "frame_id."
                ),
            },
            "frame_id": {
                "type": "string",
                "description": (
                    "Optional. Out-of-process iframe (OOPIF) frame_id from "
                    "browser_snapshot.frame_tree.children[] where "
                    "is_oopif=true. When set, routes the call through the "
                    "CDP supervisor's live session for that iframe. "
                    "Essential for Runtime.evaluate inside cross-origin "
                    "iframes, especially on Browserbase where fresh "
                    "per-call CDP connections can't keep up with signed "
                    "URL rotation. For same-origin iframes, use parent "
                    "contentWindow/contentDocument from Runtime.evaluate "
                    "at the top-level page instead."
                ),
            },
            "timeout": {
@ -408,6 +554,7 @@ registry.register(
        method=args.get("method", ""),
        params=args.get("params"),
        target_id=args.get("target_id"),
        frame_id=args.get("frame_id"),
        timeout=args.get("timeout", 30.0),
        task_id=kw.get("task_id"),
    ),
--- a/tools/browser_dialog_tool.py
+++ b/tools/browser_dialog_tool.py
@ -0,0 +1,148 @@
 """Agent-facing tool: respond to a native JS dialog captured by the CDP supervisor.
 This tool is response-only — the agent first reads ``pending_dialogs`` from
 ``browser_snapshot`` output, then calls ``browser_dialog(action=...)`` to
 accept or dismiss.
 Gated on the same ``_browser_cdp_check`` as ``browser_cdp`` so it only
 appears when a CDP endpoint is reachable (Browserbase with a
 ``connectUrl``, local Chrome via ``/browser connect``, or
 ``browser.cdp_url`` set in config).
 See ``website/docs/developer-guide/browser-supervisor.md`` for the full
 design.
 """
 from __future__ import annotations
 import json
 import logging
 from typing import Any, Dict, Optional
 from tools.browser_supervisor import SUPERVISOR_REGISTRY
 from tools.registry import registry
 logger = logging.getLogger(__name__)
 BROWSER_DIALOG_SCHEMA: Dict[str, Any] = {
    "name": "browser_dialog",
    "description": (
        "Respond to a native JavaScript dialog (alert / confirm / prompt / "
        "beforeunload) that is currently blocking the page.\n\n"
        "**Workflow:** call ``browser_snapshot`` first — if a dialog is open, "
        "it appears in the ``pending_dialogs`` field with ``id``, ``type``, "
        "and ``message``. Then call this tool with ``action='accept'`` or "
        "``action='dismiss'``.\n\n"
        "**Prompt dialogs:** pass ``prompt_text`` to supply the response "
        "string. Ignored for alert/confirm/beforeunload.\n\n"
        "**Multiple dialogs:** if more than one dialog is queued (rare — "
        "happens when a second dialog fires while the first is still open), "
        "pass ``dialog_id`` from the snapshot to disambiguate.\n\n"
        "**Availability:** only present when a CDP-capable backend is "
        "attached — Browserbase sessions, local Chrome via "
        "``/browser connect``, or ``browser.cdp_url`` in config.yaml. "
        "Not available on Camofox (REST-only) or the default Playwright "
        "local browser (CDP port is hidden)."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": ["accept", "dismiss"],
                "description": (
                    "'accept' clicks OK / returns the prompt text. "
                    "'dismiss' clicks Cancel / returns null from prompt(). "
                    "For ``beforeunload`` dialogs: 'accept' allows the "
                    "navigation, 'dismiss' keeps the page."
                ),
            },
            "prompt_text": {
                "type": "string",
                "description": (
                    "Response string for a ``prompt()`` dialog. Ignored for "
                    "other dialog types. Defaults to empty string."
                ),
            },
            "dialog_id": {
                "type": "string",
                "description": (
                    "Specific dialog to respond to, from "
                    "``browser_snapshot.pending_dialogs[].id``. Required "
                    "only when multiple dialogs are queued."
                ),
            },
        },
        "required": ["action"],
    },
 }
 def browser_dialog(
    action: str,
    prompt_text: Optional[str] = None,
    dialog_id: Optional[str] = None,
    task_id: Optional[str] = None,
 ) -> str:
    """Respond to a pending dialog on the active task's CDP supervisor."""
    effective_task_id = task_id or "default"
    supervisor = SUPERVISOR_REGISTRY.get(effective_task_id)
    if supervisor is None:
        return json.dumps(
            {
                "success": False,
                "error": (
                    "No CDP supervisor is attached to this task. Either the "
                    "browser backend doesn't expose CDP (Camofox, default "
                    "Playwright) or no browser session has been started yet. "
                    "Call browser_navigate or /browser connect first."
                ),
            }
        )
    result = supervisor.respond_to_dialog(
        action=action,
        prompt_text=prompt_text,
        dialog_id=dialog_id,
    )
    if result.get("ok"):
        return json.dumps(
            {
                "success": True,
                "action": action,
                "dialog": result.get("dialog", {}),
            }
        )
    return json.dumps({"success": False, "error": result.get("error", "unknown error")})
 def _browser_dialog_check() -> bool:
    """Gate: same as ``browser_cdp`` — only offered when CDP is reachable.
    Kept identical so the two tools appear and disappear together. The
    supervisor itself is started lazily by ``browser_navigate`` /
    ``/browser connect`` / Browserbase session creation, so a reachable
    CDP URL is enough to commit to showing the tool.
    """
    try:
        from tools.browser_cdp_tool import _browser_cdp_check  # type: ignore[import-not-found]
    except Exception as exc:  # pragma: no cover — defensive
        logger.debug("browser_dialog check: browser_cdp_tool import failed: %s", exc)
        return False
    return _browser_cdp_check()
 registry.register(
    name="browser_dialog",
    toolset="browser-cdp",
    schema=BROWSER_DIALOG_SCHEMA,
    handler=lambda args, **kw: browser_dialog(
        action=args.get("action", ""),
        prompt_text=args.get("prompt_text"),
        dialog_id=args.get("dialog_id"),
        task_id=kw.get("task_id"),
    ),
    check_fn=_browser_dialog_check,
    emoji="💬",
 )
--- a/tools/browser_supervisor.py
+++ b/tools/browser_supervisor.py
--- a/tools/browser_tool.py
+++ b/tools/browser_tool.py
@ -63,7 +63,7 @@ import tempfile
 import threading
 import time
 import requests
-from typing import Dict, Any, Optional, List
+from typing import Dict, Any, Optional, List, Tuple
 from pathlib import Path
 from agent.auxiliary_client import call_llm
 from hermes_constants import get_hermes_home
@ -287,6 +287,100 @@ def _get_cdp_override() -> str:
    return ""
 def _get_dialog_policy_config() -> Tuple[str, float]:
    """Read ``browser.dialog_policy`` + ``browser.dialog_timeout_s`` from config.
    Returns a ``(policy, timeout_s)`` tuple, falling back to the supervisor's
    defaults when keys are absent or invalid.
    """
    # Defer imports so browser_tool can be imported in minimal environments.
    from tools.browser_supervisor import (
        DEFAULT_DIALOG_POLICY,
        DEFAULT_DIALOG_TIMEOUT_S,
        _VALID_POLICIES,
    )
    try:
        from hermes_cli.config import read_raw_config
        cfg = read_raw_config()
        browser_cfg = cfg.get("browser", {}) if isinstance(cfg, dict) else {}
        if not isinstance(browser_cfg, dict):
            return DEFAULT_DIALOG_POLICY, DEFAULT_DIALOG_TIMEOUT_S
        policy = str(browser_cfg.get("dialog_policy") or DEFAULT_DIALOG_POLICY)
        if policy not in _VALID_POLICIES:
            logger.debug("Invalid browser.dialog_policy=%r; using default", policy)
            policy = DEFAULT_DIALOG_POLICY
        timeout_raw = browser_cfg.get("dialog_timeout_s")
        try:
            timeout_s = float(timeout_raw) if timeout_raw is not None else DEFAULT_DIALOG_TIMEOUT_S
            if timeout_s <= 0:
                timeout_s = DEFAULT_DIALOG_TIMEOUT_S
        except (TypeError, ValueError):
            timeout_s = DEFAULT_DIALOG_TIMEOUT_S
        return policy, timeout_s
    except Exception:
        return DEFAULT_DIALOG_POLICY, DEFAULT_DIALOG_TIMEOUT_S
 def _ensure_cdp_supervisor(task_id: str) -> None:
    """Start a CDP supervisor for ``task_id`` if an endpoint is reachable.
    Idempotent — delegates to ``SupervisorRegistry.get_or_start`` which skips
    when a supervisor for this ``(task_id, cdp_url)`` already exists and
    tears down + restarts on URL change. Safe to call on every
    ``browser_navigate`` / ``/browser connect`` without worrying about
    double-attach.
    Resolves the CDP URL in this order:
      1. ``BROWSER_CDP_URL`` / ``browser.cdp_url`` — covers ``/browser connect``
         and config-set overrides.
      2. ``_active_sessions[task_id]["cdp_url"]`` — covers Browserbase + any
         other cloud provider whose ``create_session`` returns a raw CDP URL.
    Swallows all errors — failing to attach the supervisor must not break
    the browser session itself.  The agent simply won't see
    ``pending_dialogs`` / ``frame_tree`` fields in snapshots.
    """
    cdp_url = _get_cdp_override()
    if not cdp_url:
        # Fallback: active session may carry a per-session CDP URL from a
        # cloud provider (Browserbase sets this).
        with _cleanup_lock:
            session_info = _active_sessions.get(task_id, {})
        maybe = str(session_info.get("cdp_url") or "")
        if maybe:
            cdp_url = _resolve_cdp_override(maybe)
    if not cdp_url:
        return
    try:
        from tools.browser_supervisor import SUPERVISOR_REGISTRY  # type: ignore[import-not-found]
        policy, timeout_s = _get_dialog_policy_config()
        SUPERVISOR_REGISTRY.get_or_start(
            task_id=task_id,
            cdp_url=cdp_url,
            dialog_policy=policy,
            dialog_timeout_s=timeout_s,
        )
    except Exception as exc:
        logger.debug(
            "CDP supervisor attach for task=%s failed (non-fatal): %s",
            task_id,
            exc,
        )
 def _stop_cdp_supervisor(task_id: str) -> None:
    """Stop the CDP supervisor for ``task_id`` if one exists. No-op otherwise."""
    try:
        from tools.browser_supervisor import SUPERVISOR_REGISTRY  # type: ignore[import-not-found]
        SUPERVISOR_REGISTRY.stop(task_id)
    except Exception as exc:
        logger.debug("CDP supervisor stop for task=%s failed (non-fatal): %s", task_id, exc)
 # ============================================================================
 # Cloud Provider Registry
 # ============================================================================
@ -996,6 +1090,11 @@ def _get_session_info(task_id: Optional[str] = None) -> Dict[str, str]:
            return _active_sessions[task_id]
        _active_sessions[task_id] = session_info
    # Lazy-start the CDP supervisor now that the session exists (if the
    # backend surfaces a CDP URL via override or session_info["cdp_url"]).
    # Idempotent; swallows errors. See _ensure_cdp_supervisor for details.
    _ensure_cdp_supervisor(task_id)
    return session_info
@ -1579,6 +1678,19 @@ def browser_snapshot(
            "element_count": len(refs) if refs else 0
        }
        # Merge supervisor state (pending dialogs + frame tree) when a CDP
        # supervisor is attached to this task. No-op otherwise. See
        # website/docs/developer-guide/browser-supervisor.md.
        try:
            from tools.browser_supervisor import SUPERVISOR_REGISTRY  # type: ignore[import-not-found]
            _supervisor = SUPERVISOR_REGISTRY.get(effective_task_id)
            if _supervisor is not None:
                _sv_snap = _supervisor.snapshot()
                if _sv_snap.active:
                    response.update(_sv_snap.to_dict())
        except Exception as _sv_exc:
            logger.debug("supervisor snapshot merge failed: %s", _sv_exc)
        return json.dumps(response, ensure_ascii=False)
    else:
        return json.dumps({
@ -2249,6 +2361,10 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
    if task_id is None:
        task_id = "default"
    # Stop the CDP supervisor for this task FIRST so we close our WebSocket
    # before the backend tears down the underlying CDP endpoint.
    _stop_cdp_supervisor(task_id)
    # Also clean up Camofox session if running in Camofox mode.
    # Skip full close when managed persistence is enabled — the browser
    # profile (and its session cookies) must survive across agent tasks.
@ -2329,6 +2445,13 @@ def cleanup_all_browsers() -> None:
    for task_id in task_ids:
        cleanup_browser(task_id)
    # Tear down CDP supervisors for all tasks so background threads exit.
    try:
        from tools.browser_supervisor import SUPERVISOR_REGISTRY  # type: ignore[import-not-found]
        SUPERVISOR_REGISTRY.stop_all()
    except Exception:
        pass
    # Reset cached lookups so they are re-evaluated on next use.
    global _cached_agent_browser, _agent_browser_resolved
    global _cached_command_timeout, _command_timeout_resolved
--- a/toolsets.py
+++ b/toolsets.py
@ -43,7 +43,7 @@ _HERMES_CORE_TOOLS = [
    "browser_navigate", "browser_snapshot", "browser_click",
    "browser_type", "browser_scroll", "browser_back",
    "browser_press", "browser_get_images",
-    "browser_vision", "browser_console", "browser_cdp",
+    "browser_vision", "browser_console", "browser_cdp", "browser_dialog",
    # Text-to-speech
    "text_to_speech",
    # Planning & memory
@ -115,7 +115,8 @@ TOOLSETS = {
            "browser_navigate", "browser_snapshot", "browser_click",
            "browser_type", "browser_scroll", "browser_back",
            "browser_press", "browser_get_images",
-            "browser_vision", "browser_console", "browser_cdp", "web_search"
+            "browser_vision", "browser_console", "browser_cdp",
            "browser_dialog", "web_search"
        ],
        "includes": []
    },
@ -249,7 +250,7 @@ TOOLSETS = {
            "browser_navigate", "browser_snapshot", "browser_click",
            "browser_type", "browser_scroll", "browser_back",
            "browser_press", "browser_get_images",
-            "browser_vision", "browser_console", "browser_cdp",
+            "browser_vision", "browser_console", "browser_cdp", "browser_dialog",
            "todo", "memory",
            "session_search",
            "execute_code", "delegate_task",
@ -274,7 +275,7 @@ TOOLSETS = {
            "browser_navigate", "browser_snapshot", "browser_click",
            "browser_type", "browser_scroll", "browser_back",
            "browser_press", "browser_get_images",
-            "browser_vision", "browser_console", "browser_cdp",
+            "browser_vision", "browser_console", "browser_cdp", "browser_dialog",
            # Planning & memory
            "todo", "memory",
            # Session history search
--- a/website/docs/developer-guide/browser-supervisor.md
+++ b/website/docs/developer-guide/browser-supervisor.md
@ -0,0 +1,223 @@
 # Browser CDP Supervisor — Design
 **Status:** Shipped (PR 14540)
 **Last updated:** 2026-04-23
 **Author:** @teknium1
 ## Problem
 Native JS dialogs (`alert`/`confirm`/`prompt`/`beforeunload`) and iframes are
 the two biggest gaps in our browser tooling:
 1. **Dialogs block the JS thread.** Any operation on the page stalls until the
   dialog is handled. Before this work, the agent had no way to know a dialog
   was open — subsequent tool calls would hang or throw opaque errors.
 2. **Iframes are invisible.** The agent could see iframe nodes in the DOM
   snapshot but could not click, type, or eval inside them — especially
   cross-origin (OOPIF) iframes that live in separate Chromium processes.
 [PR #12550](https://github.com/NousResearch/hermes-agent/pull/12550) proposed a
 stateless `browser_dialog` wrapper. That doesn't solve detection — it's a
 cleaner CDP call for when the agent already knows (via symptoms) that a dialog
 is open. Closed as superseded.
 ## Backend capability matrix (verified live 2026-04-23)
 Verified with throwaway probe scripts against a data-URL page that fires
 alerts in the main frame and in a same-origin srcdoc iframe, plus a
 cross-origin `https://example.com` iframe:
 | Backend | Dialog detect | Dialog respond | Frame tree | OOPIF `Runtime.evaluate` via `browser_cdp(frame_id=...)` |
 |---|---|---|---|---|
 | Local Chrome (`--remote-debugging-port`) / `/browser connect` | ✓ | ✓ full workflow | ✓ | ✓ |
 | Browserbase | ✓ (via bridge) | ✓ full workflow (via bridge) | ✓ | ✓ (`document.title = "Example Domain"` verified on real cross-origin iframe) |
 | Camofox | ✗ no CDP (REST-only) | ✗ | partial via DOM snapshot | ✗ |
 **How Browserbase respond works.** Browserbase's CDP proxy uses Playwright
 internally and auto-dismisses native dialogs within ~10ms, so
 `Page.handleJavaScriptDialog` can't keep up. To work around this, the
 supervisor injects a bridge script via
 `Page.addScriptToEvaluateOnNewDocument` that overrides
 `window.alert`/`confirm`/`prompt` with a synchronous XHR to a magic host
 (`hermes-dialog-bridge.invalid`). `Fetch.enable` intercepts those XHRs
 before they touch the network — the dialog becomes a `Fetch.requestPaused`
 event the supervisor captures, and `respond_to_dialog` fulfills via
 `Fetch.fulfillRequest` with a JSON body the injected script decodes.
 Net result: from the page's perspective, `prompt()` still returns the
 agent-supplied string. From the agent's perspective, it's the same
 `browser_dialog(action=...)` API either way. Tested end-to-end against
 real Browserbase sessions — 4/4 (alert/prompt/confirm-accept/confirm-dismiss)
 pass including value round-tripping back into page JS.
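
As a rough illustration of the bridge described above, the sketch below builds the parameters for `Fetch.enable` and `Fetch.fulfillRequest`. The helper names and the JSON body keys (`accepted`, `value`) are hypothetical; only the magic host and the CDP method names come from this document. CDP expects the `body` field of `Fetch.fulfillRequest` to be base64-encoded.

```python
import base64
import json
from typing import Optional

BRIDGE_HOST = "hermes-dialog-bridge.invalid"  # magic host named in this doc


def fetch_enable_params() -> dict:
    """Params for Fetch.enable so that only bridge XHRs are paused."""
    return {"patterns": [{"urlPattern": f"*://{BRIDGE_HOST}/*"}]}


def fulfill_params(request_id: str, accepted: bool,
                   prompt_text: Optional[str] = None) -> dict:
    """Params for Fetch.fulfillRequest answering one paused bridge XHR."""
    # Hypothetical body schema; the real injected script may expect other keys.
    payload = {"accepted": accepted, "value": prompt_text}
    body = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {
        "requestId": request_id,
        "responseCode": 200,
        "responseHeaders": [
            {"name": "Content-Type", "value": "application/json"},
        ],
        "body": body,  # base64-encoded, as the Fetch domain requires
    }
```

The injected page script would decode this JSON and return `value` from `prompt()`, which is how the agent-supplied string reaches page JS.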
 Camofox stays unsupported in this PR; a follow-up upstream issue is planned at
 `jo-inc/camofox-browser` requesting a dialog polling endpoint.
 ## Architecture
 ### CDPSupervisor
 One `asyncio.Task` running in a background daemon thread per Hermes `task_id`.
 Holds a persistent WebSocket to the backend's CDP endpoint. Maintains:
 - **Dialog queue** — `List[PendingDialog]` with `{id, type, message, default_prompt, session_id, opened_at}`
 - **Frame tree** — `Dict[frame_id, FrameInfo]` with parent relationships, URL, origin, whether cross-origin child session
 - **Session map** — `Dict[session_id, SessionInfo]` so interaction tools can route to the right attached session for OOPIF operations
 - **Recent console errors** — ring buffer of the last 50 (for PR 2 diagnostics)
 Subscribes on attach:
 - `Page.enable` — `javascriptDialogOpening`, `frameAttached`, `frameNavigated`, `frameDetached`
 - `Runtime.enable` — `executionContextCreated`, `consoleAPICalled`, `exceptionThrown`
 - `Target.setAutoAttach {autoAttach: true, flatten: true}` — surfaces child OOPIF targets; supervisor enables `Page`+`Runtime` on each
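
The subscription sequence above could be serialized roughly as follows. This is a sketch, not the supervisor's actual code: the helper names are hypothetical, and `waitForDebuggerOnStart: False` is an assumption (CDP requires the field, but this document does not state its value). In flat mode, commands for a child OOPIF target carry its `sessionId` at the top level of the message.

```python
import itertools
import json
from typing import List, Optional

_next_id = itertools.count(1)


def cdp_message(method: str, params: Optional[dict] = None,
                session_id: Optional[str] = None) -> str:
    """Serialize one CDP command; sessionId routes it to a child target."""
    message = {"id": next(_next_id), "method": method, "params": params or {}}
    if session_id is not None:
        message["sessionId"] = session_id
    return json.dumps(message)


def enable_commands(session_id: Optional[str] = None) -> List[str]:
    """Page + Runtime enable, for the root page or one attached child session."""
    return [
        cdp_message("Page.enable", session_id=session_id),
        cdp_message("Runtime.enable", session_id=session_id),
    ]


def root_commands() -> List[str]:
    """Commands sent once on the root connection after attaching."""
    auto_attach = {
        "autoAttach": True,
        "waitForDebuggerOnStart": False,  # assumption: do not pause new targets
        "flatten": True,
    }
    return enable_commands() + [cdp_message("Target.setAutoAttach", auto_attach)]
```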
 Thread-safe state access via a snapshot lock; tool handlers (sync) read the
 frozen snapshot without awaiting.
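
A minimal sketch of the snapshot-lock idea, assuming a plain `threading.Lock` and a copy-on-read snapshot. `SupervisorState` is a hypothetical name, and the `PendingDialog` fields mirror the list above; the real class may differ.

```python
import threading
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PendingDialog:
    id: str
    type: str
    message: str
    default_prompt: str = ""
    session_id: str = ""
    opened_at: float = 0.0


class SupervisorState:
    """State written by the supervisor loop and read by sync tool handlers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._dialogs: List[PendingDialog] = []

    def add_dialog(self, dialog: PendingDialog) -> None:
        with self._lock:
            self._dialogs.append(dialog)

    def snapshot(self) -> dict:
        # Copy under the lock so callers never see or mutate live state.
        with self._lock:
            return {"pending_dialogs": [asdict(d) for d in self._dialogs]}
```

Because `snapshot()` returns plain copies, a tool handler can hold or mutate the result without blocking the event loop thread.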
 ### Lifecycle
 - **Start:** `SupervisorRegistry.get_or_start(task_id, cdp_url)` — called by
  `browser_navigate`, Browserbase session create, `/browser connect`. Idempotent.
 - **Stop:** session teardown or `/browser disconnect`. Cancels the asyncio
  task, closes the WebSocket, discards state.
 - **Rebind:** if the CDP URL changes (user reconnects to a new Chrome), stop
  the old supervisor and start fresh — never reuse state across endpoints.
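
The start, stop, and rebind rules above can be sketched with a stand-in supervisor. `FakeSupervisor` is hypothetical and only records whether it is running; the registry method names follow this document, but the bodies are an illustration rather than the shipped implementation.

```python
import threading
from typing import Dict


class FakeSupervisor:
    """Stand-in for CDPSupervisor; only tracks its endpoint and run state."""

    def __init__(self, cdp_url: str) -> None:
        self.cdp_url = cdp_url
        self.running = True

    def stop(self) -> None:
        self.running = False


class SupervisorRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_task: Dict[str, FakeSupervisor] = {}

    def get_or_start(self, task_id: str, cdp_url: str) -> FakeSupervisor:
        with self._lock:
            current = self._by_task.get(task_id)
            if current is not None and current.cdp_url == cdp_url:
                return current  # idempotent: same task, same endpoint
            if current is not None:
                current.stop()  # rebind: never reuse state across endpoints
            fresh = FakeSupervisor(cdp_url)
            self._by_task[task_id] = fresh
            return fresh

    def stop(self, task_id: str) -> None:
        with self._lock:
            supervisor = self._by_task.pop(task_id, None)
        if supervisor is not None:
            supervisor.stop()
```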
 ### Dialog policy
 Configurable via `config.yaml` under `browser.dialog_policy`:
 - **`must_respond`** (default) — capture, surface in `browser_snapshot`, wait
  for explicit `browser_dialog(action=...)` call. After a 300s safety timeout
  with no response, auto-dismiss and log. Prevents a buggy agent from stalling
  forever.
 - `auto_dismiss` — record and dismiss immediately; agent sees it after the
  fact via `recent_dialogs` inside `browser_snapshot`.
 - `auto_accept` — record and accept (useful for `beforeunload` where the user
  wants to navigate away cleanly).
 Policy is per-task; no per-dialog overrides in v1.
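
One way to read the three policies is as a pure decision function. The sketch below is an assumption about structure, not the supervisor's code; the `closed_by` tags (`auto_policy`, `watchdog`) are the ones listed for `recent_dialogs` in this document, and `decide` is a hypothetical name.

```python
from typing import Optional, Tuple


def decide(policy: str, age_s: float,
           timeout_s: float = 300.0) -> Optional[Tuple[str, str]]:
    """Return (action, closed_by), or None to keep waiting for the agent."""
    if policy == "auto_dismiss":
        return ("dismiss", "auto_policy")
    if policy == "auto_accept":
        return ("accept", "auto_policy")
    if policy == "must_respond":
        # Safety timeout: dismiss once the dialog has waited too long.
        if age_s >= timeout_s:
            return ("dismiss", "watchdog")
        return None
    raise ValueError(f"unknown dialog policy: {policy!r}")
```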
 ## Agent surface (PR 1)
 ### One new tool
 ```
 browser_dialog(action, prompt_text=None, dialog_id=None)
 ```
 - `action="accept"` / `"dismiss"` (required) → responds to the dialog named by `dialog_id`, or to the sole pending dialog
 - `prompt_text=...` → text to supply to a `prompt()` dialog
 - `dialog_id=...` → disambiguate when multiple dialogs queued (rare)
 Tool is response-only. Agent reads pending dialogs from `browser_snapshot`
 output before calling.
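
A sketch of how a handler might resolve the target dialog and map the tool arguments onto `Page.handleJavaScriptDialog`. The helper names are hypothetical, and raising an error when several dialogs are queued without a `dialog_id` is an assumption; this document does not specify that case.

```python
from typing import List, Optional


def pick_dialog(pending: List[dict], dialog_id: Optional[str] = None) -> dict:
    """Choose the dialog to answer from the pending queue."""
    if not pending:
        raise LookupError("no pending dialog")
    if dialog_id is not None:
        for dialog in pending:
            if dialog["id"] == dialog_id:
                return dialog
        raise LookupError(f"no pending dialog with id {dialog_id!r}")
    if len(pending) > 1:
        # Assumption: ambiguity is an error rather than "answer the oldest".
        raise LookupError("multiple dialogs pending; pass dialog_id")
    return pending[0]


def handle_params(action: str, prompt_text: Optional[str] = None) -> dict:
    """Params for Page.handleJavaScriptDialog."""
    if action not in ("accept", "dismiss"):
        raise ValueError("action must be 'accept' or 'dismiss'")
    params = {"accept": action == "accept"}
    if prompt_text is not None:
        params["promptText"] = prompt_text
    return params
```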
 ### `browser_snapshot` extension
 Adds three optional fields to the existing snapshot output when a supervisor
 is attached:
 ```json
 {
  "pending_dialogs": [
    {"id": "d-1", "type": "alert", "message": "Hello", "opened_at": 1650000000.0}
  ],
  "recent_dialogs": [
    {"id": "d-1", "type": "alert", "message": "...", "opened_at": 1650000000.0,
     "closed_at": 1650000000.1, "closed_by": "remote"}
  ],
  "frame_tree": {
    "top": {"frame_id": "FRAME_A", "url": "https://example.com/", "origin": "https://example.com"},
    "children": [
      {"frame_id": "FRAME_B", "url": "about:srcdoc", "is_oopif": false},
      {"frame_id": "FRAME_C", "url": "https://ads.example.net/", "is_oopif": true, "session_id": "SID_C"}
    ],
    "truncated": false
  }
 }
 ```
 - **`pending_dialogs`**: dialogs currently blocking the page's JS thread.
  The agent must call `browser_dialog(action=...)` to respond. On
  Browserbase, entries arrive through the injected XHR bridge, because the
  CDP proxy auto-dismisses native dialogs within ~10ms.
 - **`recent_dialogs`**: ring buffer of up to 20 recently-closed dialogs with
  a `closed_by` tag — `"agent"` (we responded), `"auto_policy"` (local
  auto_dismiss/auto_accept), `"watchdog"` (must_respond timeout hit), or
  `"remote"` (browser/backend closed it on us, e.g. Browserbase). This is
  how agents on Browserbase still get visibility into what happened.
 - **`frame_tree`**: frame structure including cross-origin (OOPIF) children.
  Capped at 30 entries + OOPIF depth 2 to bound snapshot size on ad-heavy
  pages. `truncated: true` surfaces when limits were hit; agents needing
  the full tree can use `browser_cdp` with `Page.getFrameTree`.
 No new tool schema surface for any of these — the agent reads the snapshot
 it already requests.
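
The capping behavior for `frame_tree` might look like the breadth-first walk below. This is a simplified sketch under stated assumptions: the input is a nested dict with a `children` list per frame, the depth limit is applied to every child frame rather than only to OOPIFs, and the 30-entry cap counts child entries. `cap_frame_tree` is a hypothetical name.

```python
from collections import deque


def cap_frame_tree(root: dict, max_entries: int = 30, max_depth: int = 2) -> dict:
    """Flatten child frames breadth-first, bounded by entry count and depth."""
    kept = []
    truncated = False
    queue = deque((child, 1) for child in root.get("children", []))
    while queue:
        frame, depth = queue.popleft()
        if depth > max_depth or len(kept) >= max_entries:
            truncated = True  # something was dropped; tell the agent
            continue
        kept.append({k: v for k, v in frame.items() if k != "children"})
        queue.extend((c, depth + 1) for c in frame.get("children", []))
    top = {k: v for k, v in root.items() if k != "children"}
    return {"top": top, "children": kept, "truncated": truncated}
```

Breadth-first order means shallow frames survive the cap before deeply nested ad frames do.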
 ### Availability gating
 Both surfaces gate on `_browser_cdp_check` (supervisor can only run when a CDP
 endpoint is reachable). On Camofox / no-backend sessions, the dialog tool is
 hidden and snapshot omits the new fields — no schema bloat.
 ## Cross-origin iframe interaction
 Extending the dialog-detect work, `browser_cdp(frame_id=...)` routes CDP
 calls (notably `Runtime.evaluate`) through the supervisor's already-connected
 WebSocket using the OOPIF's child `sessionId`. Agents pick frame_ids out of
 `browser_snapshot.frame_tree.children[]` where `is_oopif=true` and pass them
 to `browser_cdp`. For same-origin iframes (no dedicated CDP session), the
 agent uses `contentWindow`/`contentDocument` from a top-level
 `Runtime.evaluate` instead — supervisor surfaces an error pointing at that
 fallback when `frame_id` belongs to a non-OOPIF.
 On Browserbase, this is the ONLY reliable path for iframe interaction —
 stateless CDP connections (opened per `browser_cdp` call) hit signed-URL
 expiry, while the supervisor's long-lived connection keeps a valid session.
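
The routing rule above can be sketched as a lookup from `frame_id` to the child `session_id`, using the `frame_tree` shape shown earlier in this document. `resolve_session` is a hypothetical helper, and the error wording is illustrative.

```python
def resolve_session(frame_tree: dict, frame_id: str) -> str:
    """Return the CDP session_id for an OOPIF frame, or raise with guidance."""
    for child in frame_tree.get("children", []):
        if child.get("frame_id") != frame_id:
            continue
        if child.get("is_oopif") and child.get("session_id"):
            return child["session_id"]
        # Same-origin frames have no dedicated session; point at the fallback.
        raise ValueError(
            f"frame {frame_id} is not an OOPIF; use contentWindow/"
            "contentDocument from a top-level Runtime.evaluate"
        )
    raise KeyError(f"unknown frame_id: {frame_id}")


# Example tree copied from the snapshot shape in this document.
EXAMPLE_TREE = {
    "top": {"frame_id": "FRAME_A", "url": "https://example.com/"},
    "children": [
        {"frame_id": "FRAME_B", "url": "about:srcdoc", "is_oopif": False},
        {"frame_id": "FRAME_C", "url": "https://ads.example.net/",
         "is_oopif": True, "session_id": "SID_C"},
    ],
    "truncated": False,
}
```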
 ## Camofox (follow-up)
 Issue planned against `jo-inc/camofox-browser` adding:
 - Playwright `page.on('dialog', handler)` per session
 - `GET /tabs/:tabId/dialogs` polling endpoint
 - `POST /tabs/:tabId/dialogs/:id` to accept/dismiss
 - Frame-tree introspection endpoint
 ## Files touched (PR 1)
 ### New
 - `tools/browser_supervisor.py` — `CDPSupervisor`, `SupervisorRegistry`, `PendingDialog`, `FrameInfo`
 - `tools/browser_dialog_tool.py` — `browser_dialog` tool handler
 - `tests/tools/test_browser_supervisor.py` — mock CDP WebSocket server + lifecycle/state tests
 - `website/docs/developer-guide/browser-supervisor.md` — this file
 ### Modified
 - `toolsets.py` — register `browser_dialog` in `browser`, `hermes-acp`, `hermes-api-server`, core toolsets (gated on CDP reachability)
 - `tools/browser_tool.py`
  - `browser_navigate` start-hook: if CDP URL resolvable, `SupervisorRegistry.get_or_start(task_id, cdp_url)`
  - `browser_snapshot` (at ~line 1536): merge supervisor state into return payload
  - `/browser connect` handler: restart supervisor with new endpoint
  - Session teardown hooks in `_cleanup_browser_session`
 - `hermes_cli/config.py` — add `browser.dialog_policy` and `browser.dialog_timeout_s` to `DEFAULT_CONFIG`
 - Docs: `website/docs/user-guide/features/browser.md`, `website/docs/reference/tools-reference.md`, `website/docs/reference/toolsets-reference.md`
 ## Non-goals
 - Detection/interaction for Camofox (upstream gap; tracked separately)
 - Streaming dialog/frame events live to the user (would require gateway hooks)
 - Persisting dialog history across sessions (in-memory only)
 - Per-iframe dialog policies (agent can express this via `dialog_id`)
 - Replacing `browser_cdp` — it stays as the escape hatch for the long tail (cookies, viewport, network throttling)
 ## Testing
 Unit tests use an asyncio mock CDP server that speaks enough of the protocol
 to exercise all state transitions: attach, enable, navigate, dialog fire,
 dialog dismiss, frame attach/detach, child target attach, session teardown.
 Real-backend E2E (Browserbase + local Chrome) is manual; probe scripts from
 the 2026-04-23 investigation are kept in-repo under
 `scripts/browser_supervisor_e2e.py` so anyone can re-verify on new backend
 versions.
--- a/website/docs/reference/tools-reference.md
+++ b/website/docs/reference/tools-reference.md
@@ -6,9 +6,9 @@ description: "Authoritative reference for Hermes built-in tools, grouped by tool
 # Built-in Tools Reference
-This page documents all 53 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.
+This page documents all 55 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.
-**Quick counts:** 11 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, 5 Feishu tools, and 15 standalone tools across other toolsets.
+**Quick counts:** 12 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, 5 Feishu tools, and 15 standalone tools across other toolsets.
 :::tip MCP Tools
 In addition to built-in tools, Hermes can load tools dynamically from MCP servers. MCP tools appear with a server-name prefix (e.g., `github_create_issue` for the `github` MCP server). See [MCP Integration](/docs/user-guide/features/mcp) for configuration.
@@ -20,6 +20,7 @@ In addition to built-in tools, Hermes can load tools dynamically from MCP server
 |------|-------------|----------------------|
 | `browser_back` | Navigate back to the previous page in browser history. Requires browser_navigate to be called first. | — |
 | `browser_cdp` | Send a raw Chrome DevTools Protocol (CDP) command. Escape hatch for browser operations not covered by browser_navigate, browser_click, browser_console, etc. Only available when a CDP endpoint is reachable at session start — via `/browser connect` or `browser.cdp_url` config. See https://chromedevtools.github.io/devtools-protocol/ | — |
 | `browser_dialog` | Respond to a native JavaScript dialog (alert / confirm / prompt / beforeunload). Call `browser_snapshot` first — pending dialogs appear in its `pending_dialogs` field. Then call `browser_dialog(action='accept'|'dismiss')`. Same availability as `browser_cdp` (`/browser connect`, `browser.cdp_url`, or a Browserbase session). | — |
 | `browser_click` | Click on an element identified by its ref ID from the snapshot (e.g., '@e5'). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first. | — |
 | `browser_console` | Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requi… | — |
 | `browser_get_images` | Get a list of all images on the current page with their URLs and alt text. Useful for finding images to analyze with the vision tool. Requires browser_navigate to be called first. | — |
--- a/website/docs/reference/toolsets-reference.md
+++ b/website/docs/reference/toolsets-reference.md
@@ -52,7 +52,7 @@ Or in-session:
 | Toolset | Tools | Purpose |
 |---------|-------|---------|
-| `browser` | `browser_back`, `browser_cdp`, `browser_click`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | Full browser automation. Includes `web_search` as a fallback for quick lookups. `browser_cdp` is a raw CDP passthrough gated on a reachable CDP endpoint — it only appears when `/browser connect` is active or `browser.cdp_url` is set. |
+| `browser` | `browser_back`, `browser_cdp`, `browser_click`, `browser_console`, `browser_dialog`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | Full browser automation. Includes `web_search` as a fallback for quick lookups. `browser_cdp` and `browser_dialog` are gated on a reachable CDP endpoint — they only appear when `/browser connect` is active, `browser.cdp_url` is set, or a Browserbase session is active. `browser_dialog` works together with the `pending_dialogs` and `frame_tree` fields that `browser_snapshot` adds when a CDP supervisor is attached. |
 | `clarify` | `clarify` | Ask the user a question when the agent needs clarification. |
 | `code_execution` | `execute_code` | Run Python scripts that call Hermes tools programmatically. |
 | `cronjob` | `cronjob` | Schedule and manage recurring tasks. |
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -1240,10 +1240,26 @@ browser:
  inactivity_timeout: 120        # Seconds before auto-closing idle sessions
  command_timeout: 30             # Timeout in seconds for browser commands (screenshot, navigate, etc.)
  record_sessions: false         # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
  # Optional CDP override — when set, Hermes attaches directly to your own
  # Chrome (via /browser connect) rather than starting a headless browser.
  cdp_url: ""
  # Dialog supervisor — controls how native JS dialogs (alert / confirm / prompt)
  # are handled when a CDP backend is attached (Browserbase, local Chrome via
  # /browser connect). Ignored on Camofox and default local agent-browser mode.
  dialog_policy: must_respond    # must_respond | auto_dismiss | auto_accept
  dialog_timeout_s: 300          # Safety auto-dismiss under must_respond (seconds)
  camofox:
    managed_persistence: false   # When true, Camofox sessions persist cookies/logins across restarts
 ```
 **Dialog policies:**
 - `must_respond` (default) — capture the dialog, surface it in `browser_snapshot.pending_dialogs`, and wait for the agent to call `browser_dialog(action=...)`. After `dialog_timeout_s` seconds with no response, the dialog is auto-dismissed to prevent the page's JS thread from stalling forever.
 - `auto_dismiss` — capture, dismiss immediately. The agent still sees the dialog record in `browser_snapshot.recent_dialogs` with `closed_by="auto_policy"` after the fact.
 - `auto_accept` — capture, accept immediately. Useful for pages with aggressive `beforeunload` prompts.
 See the [browser feature page](./features/browser.md#browser_dialog) for the full dialog workflow.
 The browser toolset supports multiple providers. See the [Browser feature page](/docs/user-guide/features/browser) for details on Browserbase, Browser Use, and local Chrome CDP setup.
 ## Timezone
--- a/website/docs/user-guide/features/browser.md
+++ b/website/docs/user-guide/features/browser.md
@@ -355,7 +355,50 @@ browser_cdp(method="Runtime.evaluate",
 browser_cdp(method="Network.getAllCookies")
 ```
-Browser-level methods (`Target.*`, `Browser.*`, `Storage.*`) omit `target_id`. Page-level methods (`Page.*`, `Runtime.*`, `DOM.*`, `Emulation.*`) require a `target_id` from `Target.getTargets`. Each call is independent — sessions do not persist between calls.
+Browser-level methods (`Target.*`, `Browser.*`, `Storage.*`) omit `target_id`. Page-level methods (`Page.*`, `Runtime.*`, `DOM.*`, `Emulation.*`) require a `target_id` from `Target.getTargets`. Each stateless call is independent — sessions do not persist between calls.
 **Cross-origin iframes:** pass `frame_id` (from `browser_snapshot.frame_tree.children[]` where `is_oopif=true`) to route the CDP call through the supervisor's live session for that iframe. This is how `Runtime.evaluate` inside a cross-origin iframe works on Browserbase, where stateless CDP connections would hit signed-URL expiry. Example:
 ```
 browser_cdp(
  method="Runtime.evaluate",
  params={"expression": "document.title", "returnByValue": True},
  frame_id="<frame_id from browser_snapshot>",
 )
 ```
 Same-origin iframes don't need `frame_id` — use `document.querySelector('iframe').contentDocument` from a top-level `Runtime.evaluate` instead.
 ### `browser_dialog`
 Responds to a native JS dialog (`alert` / `confirm` / `prompt` / `beforeunload`). Before this tool existed, dialogs would silently block the page's JavaScript thread and subsequent `browser_*` calls would hang or throw; now the agent sees pending dialogs in `browser_snapshot` output and responds explicitly.
 **Workflow:**
 1. Call `browser_snapshot`. If a dialog is blocking the page, it shows up as `pending_dialogs: [{"id": "d-1", "type": "alert", "message": "..."}]`.
 2. Call `browser_dialog(action="accept")` or `browser_dialog(action="dismiss")`. For `prompt()` dialogs, pass `prompt_text="..."` to supply the response.
 3. Re-snapshot — `pending_dialogs` is empty; the page's JS thread has resumed.
 **Detection happens automatically** via a persistent CDP supervisor — one WebSocket per task that subscribes to Page/Runtime/Target events. The supervisor also populates a `frame_tree` field in the snapshot so the agent can see the iframe structure of the current page, including cross-origin (OOPIF) iframes.
 **Availability matrix:**
 | Backend | Detection via `pending_dialogs` | Response (`browser_dialog` tool) |
 |---|---|---|
 | Local Chrome via `/browser connect` or `browser.cdp_url` | ✓ | ✓ full workflow |
 | Browserbase | ✓ | ✓ full workflow (via injected XHR bridge) |
 | Camofox / default local agent-browser | ✗ | ✗ (no CDP endpoint) |
 **How it works on Browserbase.** Browserbase's CDP proxy auto-dismisses real native dialogs server-side within ~10ms, so we can't use `Page.handleJavaScriptDialog`. The supervisor injects a small script via `Page.addScriptToEvaluateOnNewDocument` that overrides `window.alert`/`confirm`/`prompt` with a synchronous XHR. We intercept those XHRs via `Fetch.enable` — the page's JS thread stays blocked on the XHR until we call `Fetch.fulfillRequest` with the agent's response. `prompt()` return values round-trip back into page JS unchanged.
 **Dialog policy** is configured in `config.yaml` under `browser.dialog_policy`:
 | Policy | Behavior |
 |--------|----------|
 | `must_respond` (default) | Capture, surface in snapshot, wait for explicit `browser_dialog()` call. Safety auto-dismiss after `browser.dialog_timeout_s` (default 300s) so a buggy agent can't stall forever. |
 | `auto_dismiss` | Capture, dismiss immediately. Agent still sees the dialog in `browser_snapshot.recent_dialogs` (with `closed_by="auto_policy"`) but doesn't have to act. |
 | `auto_accept` | Capture, accept immediately. Useful when navigating pages with aggressive `beforeunload` prompts. |
 **Frame tree** inside `browser_snapshot.frame_tree` is capped to 30 frames and OOPIF depth 2 to keep payloads bounded on ad-heavy pages. A `truncated: true` flag surfaces when limits were hit; agents needing the full tree can use `browser_cdp` with `Page.getFrameTree`.
 ## Practical Examples