hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-09 08:21:50 +00:00

Teknium 5a1c599412 feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)

* docs: browser CDP supervisor design (for upcoming PR)

Design doc ahead of implementation — dialog + iframe detection/interaction via a persistent CDP supervisor. Covers backend capability matrix (verified live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split, non-goals, and test plan.

Supersedes #12550.

No code changes in this commit.

* feat(browser): add persistent CDP supervisor for dialog + frame detection

Single persistent CDP WebSocket per Hermes task_id that subscribes to Page/Runtime/Target events and maintains thread-safe state for pending dialogs, frame tree, and console errors.

Supervisor lives in its own daemon thread running an asyncio loop; external callers use sync API (snapshot(), respond_to_dialog()) that bridges onto the loop.

Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true} and enables Page+Runtime on each so iframe-origin dialogs surface through the same supervisor.

Dialog policies: must_respond (default, 300s safety timeout), auto_dismiss, auto_accept.

Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot payloads bounded on ad-heavy pages.

E2E verified against real Chrome via smoke test — detects + responds to main-frame alerts, iframe-contentWindow alerts, preserves frame tree, graceful no-dialog error path, clean shutdown.

No agent-facing tool wiring in this commit (comes next).

* feat(browser): add browser_dialog tool wired to CDP supervisor

Agent-facing response-only tool. Schema:
  action: 'accept' | 'dismiss' (required)
  prompt_text: response for prompt() dialogs (optional)
  dialog_id: disambiguate when multiple dialogs queued (optional)

Handler: SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)

check_fn shares _browser_cdp_check with browser_cdp so both surface and hide together.
When no supervisor is attached (Camofox, default Playwright, or no browser session started yet), tool is hidden; if somehow invoked it returns a clear error pointing the agent to browser_navigate / /browser connect.

Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp / hermes-api-server toolsets alongside browser_cdp.

* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot

Supervisor lifecycle:
  * _get_session_info lazy-starts the supervisor after a session row is materialized — covers every backend code path (Browserbase, cdp_url override, /browser connect, future providers) with one hook.
  * cleanup_browser(task_id) stops the supervisor for that task first (before the backend tears down CDP).
  * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
  * /browser connect eagerly starts the supervisor for task 'default' so the first snapshot already shows pending_dialogs.
  * /browser disconnect stops the supervisor.

CDP URL resolution for the supervisor:
  1. BROWSER_CDP_URL / browser.cdp_url override.
  2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).

browser_snapshot merges supervisor state (pending_dialogs + frame_tree) into its JSON output when a supervisor is active — the agent reads pending_dialogs from the snapshot it already requests, then calls browser_dialog to respond. No extra tool surface.

Config defaults:
  * browser.dialog_policy: 'must_respond' (new)
  * browser.dialog_timeout_s: 300 (new)

No version bump — new keys deep-merge into existing browser section.

Deadlock fix in supervisor event dispatch:
  * _on_dialog_opening and _on_target_attached used to await CDP calls while the reader was still processing an event — but only the reader can set the response Future, so the call timed out.
  * Both now fire asyncio.create_task(...) so the reader stays pumping.
  * auto_dismiss/auto_accept now actually close the dialog immediately.
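
The deadlock described in the dispatch fix can be reproduced in miniature. The sketch below is not the supervisor's code: `FakeCDP`, its queue-based transport, and the 0.2 s timeout are hypothetical stand-ins chosen for illustration. It only shows why a handler that awaits a reply from inside the reader loop times out, while a handler started with `asyncio.create_task` completes.

```python
import asyncio


class FakeCDP:
    """Hypothetical stand-in for a CDP connection with a single reader loop."""

    def __init__(self):
        self.inbox = asyncio.Queue()  # messages arriving "from the browser"
        self.pending = {}             # request id -> Future awaiting its reply
        self.next_id = 0
        self.handled = []             # outcome of each dialog handler
        self.tasks = set()            # strong refs so spawned tasks are not collected

    async def call(self, method, timeout=0.2):
        """Send a request and wait until the reader resolves its Future."""
        self.next_id += 1
        req_id = self.next_id
        fut = asyncio.get_running_loop().create_future()
        self.pending[req_id] = fut
        # The fake browser replies at once, but only the reader can deliver it.
        self.inbox.put_nowait({"id": req_id, "result": {"ok": method}})
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            self.pending.pop(req_id, None)

    async def on_dialog(self):
        try:
            self.handled.append(await self.call("Page.handleJavaScriptDialog"))
        except asyncio.TimeoutError:
            self.handled.append("timeout")

    async def reader(self, spawn_task):
        while True:
            msg = await self.inbox.get()
            if msg is None:  # shutdown sentinel
                return
            if "id" in msg:  # a reply: wake whoever is waiting on it
                fut = self.pending.pop(msg["id"], None)
                if fut is not None and not fut.done():
                    fut.set_result(msg["result"])
            elif spawn_task:  # fixed behaviour: handler runs beside the reader
                task = asyncio.create_task(self.on_dialog())
                self.tasks.add(task)
                task.add_done_callback(self.tasks.discard)
            else:  # old behaviour: the reader is stuck inside the handler
                await self.on_dialog()


async def run(spawn_task):
    cdp = FakeCDP()
    reader = asyncio.create_task(cdp.reader(spawn_task))
    cdp.inbox.put_nowait({"method": "Page.javascriptDialogOpening"})
    await asyncio.sleep(0.5)  # let the handler finish or time out
    cdp.inbox.put_nowait(None)
    await reader
    return cdp.handled


if __name__ == "__main__":
    print(asyncio.run(run(spawn_task=False)))  # ['timeout']
    print(asyncio.run(run(spawn_task=True)))   # [{'ok': 'Page.handleJavaScriptDialog'}]
```

In the blocking variant the reply is already queued, but the reader cannot dequeue it until the handler returns, so the handler can only finish by timing out.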

Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
  * supervisor start/snapshot
  * main-frame alert detection + dismiss
  * iframe.contentWindow alert
  * prompt() with prompt_text reply
  * respond with no pending dialog -> clean error
  * auto_dismiss clears on event
  * registry idempotency
  * registry stop -> snapshot reports inactive
  * browser_dialog tool no-supervisor error
  * browser_dialog invalid action
  * browser_dialog end-to-end via tool handler

xdist-safe: chrome_cdp fixture uses a per-worker port. Skipped when google-chrome/chromium isn't installed.

* docs(browser): document browser_dialog tool + CDP supervisor

- user-guide/features/browser.md: new browser_dialog section with workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser toolset row with note on pending_dialogs / frame_tree snapshot fields

Full design doc lives at developer-guide/browser-supervisor.md (committed earlier).

* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility

Found via Browserbase E2E test that revealed two production-critical issues:

1. Supervisor WebSocket drops when other clients disconnect. Browserbase's CDP proxy tears down our long-lived WebSocket whenever a short-lived client (e.g. agent-browser CLI's per-command CDP connection) disconnects. Fixed with a reconnecting _run loop that re-attaches with exponential backoff on drops. _page_session_id and _child_sessions are reset on each reconnect; pending_dialogs and frames are preserved across reconnects.

2. Browserbase auto-dismisses dialogs server-side within ~10ms. Their Playwright-based CDP proxy dismisses alert/confirm/prompt before our Page.handleJavaScriptDialog call can respond. So pending_dialogs is empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a DialogRecord for every dialog that opened, with a closed_by tag:
  * 'agent' — agent called browser_dialog
  * 'auto_policy' — local auto_dismiss/auto_accept fired
  * 'watchdog' — must_respond timeout auto-dismissed (300s default)
  * 'remote' — browser/backend closed it on us (Browserbase)

Agents on Browserbase now see the dialog history with closed_by='remote' so they at least know a dialog fired, even though they couldn't respond.

3. Page.javascriptDialogClosed matching bug. The event doesn't include a 'message' field (CDP spec has only 'result' and 'userInput') but our _on_dialog_closed was matching on message. Fixed to match by session_id + oldest-first, with a safety assumption that only one dialog is in flight per session (the JS thread is blocked while a dialog is up).

Docs + tests updated:
  * browser.md: new availability matrix showing the three backends and which mode (pending / recent / response) each supports
  * developer-guide/browser-supervisor.md: three-field snapshot schema with closed_by semantics
  * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12 passing against real Chrome)

E2E verified both backends:
  * Local Chrome via /browser connect: detect + respond full workflow (smoke_supervisor.py all 7 scenarios pass)
  * Browserbase: detect via recent_dialogs with closed_by='remote' (smoke_supervisor_browserbase_v2.py passes)

Camofox remains out of scope (REST-only, no CDP) — tracked for upstream PR 3.

* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)

Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so Page.handleJavaScriptDialog calls lose the race. Solution: bypass native dialogs entirely.

The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a JavaScript override for window.alert/confirm/prompt. Those overrides perform a synchronous XMLHttpRequest to a magic host ('hermes-dialog-bridge.invalid').
We intercept those XHRs via Fetch.enable with a requestStage=Request pattern.

Flow when a page calls alert('hi'):
  1. window.alert override intercepts, builds XHR GET to http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
  2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
  3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces it as a pending dialog with bridge_request_id set
  4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
  5. Supervisor calls Fetch.fulfillRequest with JSON body: {accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
  6. The injected script parses the body, returns the appropriate value from the override (undefined for alert, bool for confirm, string|null for prompt)

This works identically on Browserbase AND local Chrome — no native dialog ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog policies (must_respond / auto_dismiss / auto_accept) all still work.

Bridge is installed on every attached session (main page + OOPIF child sessions) so iframe dialogs are captured too.

Native-dialog path kept as a fallback for backends that don't auto-dismiss (so a page that somehow bypasses our override — e.g. iframes that load after Fetch.enable but before the init-script runs — still gets observed via Page.javascriptDialogOpening).
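
The supervisor side of the bridge flow above can be sketched as three small pure functions. This is an illustration under assumptions, not the code from the commit: the function names and the response headers (including the CORS header) are hypothetical, and `override_return_value` only models in Python what the injected JavaScript would hand back to the page. The `Fetch.fulfillRequest` parameter names (`requestId`, `responseCode`, `responseHeaders`, `body` as base64) follow the CDP Fetch domain.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

BRIDGE_HOST = "hermes-dialog-bridge.invalid"  # magic host named in the commit message


def parse_bridge_request(url):
    """Return (kind, message) for a bridge URL, or None for ordinary traffic."""
    parts = urlsplit(url)
    if parts.hostname != BRIDGE_HOST:
        return None
    query = parse_qs(parts.query, keep_blank_values=True)
    return query.get("kind", [""])[0], query.get("message", [""])[0]


def build_fulfill_params(request_id, dialog_id, accept, prompt_text=None):
    """Build Fetch.fulfillRequest params that carry the agent's decision."""
    body = json.dumps(
        {"accept": accept, "prompt_text": prompt_text, "dialog_id": dialog_id}
    )
    return {
        "requestId": request_id,
        "responseCode": 200,
        "responseHeaders": [
            {"name": "Content-Type", "value": "application/json"},
            # Assumption: a cross-origin XHR usually needs a CORS header
            # before the page script may read the response.
            {"name": "Access-Control-Allow-Origin", "value": "*"},
        ],
        # CDP expects the response body to be base64-encoded.
        "body": base64.b64encode(body.encode("utf-8")).decode("ascii"),
    }


def override_return_value(kind, reply):
    """Model the value the JS override returns (None stands for undefined/null)."""
    if kind == "alert":
        return None
    if kind == "confirm":
        return bool(reply["accept"])
    if kind == "prompt":
        return reply["prompt_text"] if reply["accept"] else None
    raise ValueError(f"unknown dialog kind: {kind}")


if __name__ == "__main__":
    url = "http://hermes-dialog-bridge.invalid/?kind=prompt&message=Name%3F"
    kind, message = parse_bridge_request(url)
    params = build_fulfill_params("req-1", "d-1", accept=True, prompt_text="AGENT-REPLY")
    reply = json.loads(base64.b64decode(params["body"]))
    print(kind, message, override_return_value(kind, reply))
```

Keeping these steps free of I/O makes the request parsing and the reply encoding testable without a browser; only the WebSocket send of the resulting params needs a live session.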

E2E VERIFIED:
  * Local Chrome: 13/13 pytest tests green (12 original + new test_bridge_captures_prompt_and_returns_reply_text that asserts window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
  * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
    - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
    - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY' → page.prompt_ret === 'AGENT-REPLY' ✓
    - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
    - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓

Docs updated in browser.md and developer-guide/browser-supervisor.md — availability matrix now shows Browserbase at full parity with local Chrome for both detection and response.

* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)

Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).

Design: browser_cdp gets an optional frame_id parameter. When set, the tool looks up the frame in the supervisor's frame_tree, grabs its child cdp_session_id (OOPIF session), and dispatches the CDP call through the supervisor's already-connected WebSocket via run_coroutine_threadsafe.

Why not stateless: on Browserbase, each fresh browser_cdp WebSocket must re-negotiate against a signed connectUrl. The session info carries a specific URL that can expire while the supervisor's long-lived connection stays valid. Routing via the supervisor sidesteps this.

Agent workflow:
  1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
  2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>, params={'expression': 'document.title', 'returnByValue': True})
  3. Supervisor dispatches the call on the OOPIF's child session

Supervisor state fixes needed along the way:
  * _on_frame_detached now skips reason='swap' (frame migrating processes)
  * _on_frame_detached also skips when the frame is an OOPIF with a live child session — Browserbase fires spurious remove events when a same-origin iframe gets promoted to OOPIF
  * _on_target_detached clears cdp_session_id but KEEPS the frame record so the agent still sees the OOPIF in frame_tree during transient session flaps

E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
  browser_cdp(method='Runtime.evaluate', params={'expression': 'document.title', 'returnByValue': True}, frame_id=<OOPIF>)
  → {'success': True, 'result': {'value': 'Example Domain'}}

The iframe is <iframe src='https://example.com/'> inside a top-level data: URL page on a real Browserbase session. The agent Runtime.evaluates INSIDE the cross-origin iframe and gets example.com's title back.

Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
  * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF, verifies routing via supervisor, Runtime.evaluate returns 1+1=2
  * test_browser_cdp_frame_id_missing_supervisor — clean error when no supervisor attached
  * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad frame_id

Docs (browser.md and developer-guide/browser-supervisor.md) updated with the iframe workflow, availability matrix now shows OOPIF eval as shipped for local Chrome + Browserbase.

* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process

When asked 'did you test the iframe stuff' I had only done a mocked pytest (fake injected OOPIF) plus a Browserbase E2E.
Closed the local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/smoke_local_oopif.py:
  * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
  * Chrome with --site-per-process so the cross-origin iframe becomes a real OOPIF in its own process
  * Navigate, find OOPIF in supervisor.frame_tree, call browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes through the supervisor's child session
  * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the inner page, retrieved via OOPIF eval)

PASSED on 2026-04-23.

Tried to embed this as a pytest but hit an asyncio version quirk between venv (3.11) and the system python (3.13) — Page.navigate hangs in the pytest harness but works in standalone. Left a self-documenting skip test that points to the smoke script + describes the verification.

chrome_cdp fixture now passes --site-per-process so future iframe tests can rely on OOPIF behavior.

Result: 16 pass + 1 documented-skip = 17 tests in tests/tools/test_browser_supervisor.py.

* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count

Pre-merge docs audit revealed two gaps:

1. user-guide/configuration.md browser config example was missing the two new dialog_* knobs. Added with a short table explaining must_respond / auto_dismiss / auto_accept semantics and a link to the feature page for the full workflow.

2. reference/tools-reference.md header said '54 built-in tools' — real count on main is 54, this branch adds browser_dialog so it's 55. Fixed the header. (browser count was already correctly bumped 11 -> 12 in the earlier docs commit.)

No code changes.

2026-04-23 22:23:37 -07:00
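
The commit message describes a supervisor that runs an asyncio loop in its own daemon thread while callers use a synchronous API bridged with run_coroutine_threadsafe. A minimal sketch of that pattern follows, assuming nothing about the real implementation: `ToySupervisor`, `simulate_dialog`, and the result dictionaries are hypothetical names invented here, and no CDP connection is involved.

```python
import asyncio
import threading


class ToySupervisor:
    """Hypothetical supervisor: state lives on a loop thread, callers stay sync."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()
        self._pending = {}  # dialog_id -> message; only touched on the loop thread

    def _call(self, coro, timeout=5.0):
        # Schedule the coroutine on the loop thread and block for its result.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result(timeout)

    async def _open(self, dialog_id, message):
        self._pending[dialog_id] = message

    async def _snapshot(self):
        return {
            "pending_dialogs": [
                {"id": d, "message": m} for d, m in self._pending.items()
            ]
        }

    async def _respond(self, dialog_id, action):
        if action not in ("accept", "dismiss"):
            return {"success": False, "error": f"invalid action: {action}"}
        if dialog_id not in self._pending:
            return {"success": False, "error": "no pending dialog"}
        del self._pending[dialog_id]
        return {"success": True, "action": action}

    # Synchronous API, safe to call from any non-loop thread.
    def simulate_dialog(self, dialog_id, message):
        self._call(self._open(dialog_id, message))

    def snapshot(self):
        return self._call(self._snapshot())

    def respond_to_dialog(self, dialog_id, action):
        return self._call(self._respond(dialog_id, action))

    def stop(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5.0)
        self._loop.close()


if __name__ == "__main__":
    sup = ToySupervisor()
    sup.simulate_dialog("d-1", "hi")
    print(sup.snapshot())
    print(sup.respond_to_dialog("d-1", "accept"))
    sup.stop()
```

Because every read and write of the state runs as a coroutine on the single loop thread, the sketch needs no locks; the cost is that each synchronous call blocks its caller until the loop has serviced it.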
..
__init__.py	test: reorganize test structure and add missing unit tests	2026-02-26 03:20:08 +03:00
test_accretion_caps.py	fix(tools): bound _read_tracker sub-containers + prune _completion_consumed (#11839 )	2026-04-17 15:53:57 -07:00
test_ansi_strip.py	fix: strip ANSI at the source — clean terminal output before it reaches the model	2026-03-23 07:43:12 -07:00
test_approval.py	test: cover absolute paths in project env/config approval regex	2026-04-23 14:05:36 -07:00
test_approval_heartbeat.py	fix(approval): heartbeat activity during gateway approval wait (#11245 )	2026-04-16 14:48:50 -07:00
test_base_environment.py	feat(environments): unified spawn-per-call execution layer	2026-04-08 17:23:15 -07:00
test_browser_camofox.py	fix(camofox): honor auxiliary vision temperature (forward auxiliary.vision.temperature in camofox screenshot analysis; add regression tests for configured and default behavior)	2026-04-20 00:32:09 -07:00
test_browser_camofox_persistence.py	docs: remove nonexistent CAMOFOX_PROFILE_DIR env var references (#10976 )	2026-04-16 04:07:11 -07:00
test_browser_camofox_state.py	test: stop testing mutable data — convert change-detectors to invariants (#13363 )	2026-04-20 23:20:33 -07:00
test_browser_cdp_override.py	Support browser CDP URL from config	2026-04-17 16:05:04 -07:00
test_browser_cdp_tool.py	feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369 )	2026-04-19 00:03:10 -07:00
test_browser_cleanup.py	fix(doctor): only check the active memory provider, not all providers unconditionally (#6285 )	2026-04-08 13:44:58 -07:00
test_browser_cloud_fallback.py	fix(browser): runtime fallback to local Chromium when cloud provider fails	2026-04-16 04:19:34 -07:00
test_browser_console.py	fix(browser): honor auxiliary.vision.temperature for screenshot analysis (mirror the vision tool's config bridge in browser_vision)	2026-04-20 00:32:09 -07:00
test_browser_content_none_guard.py	fix(browser): guard LLM response content against None in snapshot and vision (#3642 )	2026-03-28 17:25:04 -07:00
test_browser_hardening.py	fix(browser): hardening — dead code, caching, scroll perf, security, thread safety	2026-04-10 13:05:44 -07:00
test_browser_homebrew_paths.py	fix(browser): add termux PATH fallbacks	2026-04-14 16:55:55 -07:00
test_browser_orphan_reaper.py	fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843 )	2026-04-17 18:46:30 -07:00
test_browser_secret_exfil.py	fix: rewrite test mock secrets and add redaction fixture	2026-04-01 12:03:56 -07:00
test_browser_ssrf_local.py	fix(browser): skip SSRF check for local backends (Camofox, headless Chromium) (#4292 )	2026-03-31 10:40:13 -07:00
test_browser_supervisor.py	feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)	2026-04-23 22:23:37 -07:00
test_budget_config.py	test(tools): add unit tests for budget_config module	2026-04-11 02:58:48 -07:00
test_checkpoint_manager.py	fix(checkpoints): isolate shadow git repo from user's global config (#11261 )	2026-04-16 16:06:49 -07:00
test_clarify_tool.py	test(tools): add unit tests for clarify_tool.py	2026-02-27 03:29:26 -05:00
test_clipboard.py	feat: fix img pasting in new ink plus newline after tools	2026-04-11 13:14:32 -05:00
test_code_execution.py	fix: follow-up for salvaged PR #10854	2026-04-16 06:42:45 -07:00
test_code_execution_modes.py	feat(execute_code): add project/strict execution modes, default to project (#11971 )	2026-04-18 01:46:25 -07:00
test_command_guards.py	fix: remove 115 verified dead code symbols across 46 production files	2026-04-10 03:44:43 -07:00
test_config_null_guard.py	fix: guard config.get() against YAML null values to prevent AttributeError (#3377 )	2026-03-27 04:03:00 -07:00
test_credential_files.py	fix: remove 115 verified dead code symbols across 46 production files	2026-04-10 03:44:43 -07:00
test_cron_approval_mode.py	feat: configurable approval mode for cron jobs (approvals.cron_mode)	2026-04-18 19:24:35 -07:00
test_cron_prompt_injection.py	fix: cron prompt injection scanner bypass for multi-word variants	2026-02-26 13:55:54 +03:00
test_cronjob_tools.py	feat(skills): consolidate find-nearby into maps as a single location skill	2026-04-19 05:19:22 -07:00
test_daytona_environment.py	fix: update tests for unified spawn-per-call execution model	2026-04-08 17:23:15 -07:00
test_debug_helpers.py	fix(tests): isolate HERMES_HOME in tests and adjust log directory for debug session	2026-03-02 04:34:21 -08:00
test_delegate.py	fix(delegate): remove model-facing max_iterations override; config is authoritative (#14732 )	2026-04-23 13:56:26 -07:00
test_delegate_toolset_scope.py	fix(security): restrict subagent toolsets to parent's enabled set (#3269 )	2026-03-26 14:50:26 -07:00
test_discord_tool.py	fix(ci): unblock test suite + cut ~2s of dead Z.AI probes from every AIAgent	2026-04-19 19:18:19 -07:00
test_docker_environment.py	fix(docker): add SETUID/SETGID caps so gosu drop in entrypoint succeeds	2026-04-22 18:13:14 -07:00
test_docker_find.py	feat: entry-level Podman support — find_docker() + rootless entrypoint (#10066 )	2026-04-14 21:20:37 -07:00
test_env_passthrough.py	fix(env_passthrough): reject Hermes provider credentials from skill passthrough (#13523 )	2026-04-21 06:14:25 -07:00
test_feishu_tools.py	feat: add Feishu document comment intelligent reply with 3-tier access control	2026-04-17 19:04:11 -07:00
test_file_operations.py	tools: normalize file tool pagination bounds	2026-04-22 06:11:41 -07:00
test_file_operations_edge_cases.py	tools: normalize file tool pagination bounds	2026-04-22 06:11:41 -07:00
test_file_ops_cwd_tracking.py	fix(file-ops): follow terminal env's live cwd in _exec instead of init-time cached cwd (#11912 )	2026-04-17 19:26:40 -07:00
test_file_read_guards.py	refactor: remove dead code — 1,784 lines across 77 files (#9180 )	2026-04-13 16:32:04 -07:00
test_file_staleness.py	fix(file_tools): resolve bookkeeping paths against live terminal cwd	2026-04-23 15:11:52 -07:00
test_file_state_registry.py	feat(delegate): cross-agent file state coordination for concurrent subagents (#13718 )	2026-04-21 16:41:26 -07:00
test_file_sync.py	test(file_sync): add tests for bulk_upload_fn callback	2026-04-10 21:14:32 -07:00
test_file_sync_back.py	fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards	2026-04-16 19:39:21 -07:00
test_file_sync_perf.py	test: add reproducible perf benchmark for file sync overhead	2026-04-10 03:01:46 -07:00
test_file_tools.py	tools: normalize file tool pagination bounds	2026-04-22 06:11:41 -07:00
test_file_tools_container_config.py	fix(docker): pass docker_mount_cwd_to_workspace and docker_forward_env to container_config in file_tools	2026-04-20 00:58:16 -07:00
test_file_tools_live.py	feat(environments): unified spawn-per-call execution layer	2026-04-08 17:23:15 -07:00
test_file_write_safety.py	fix(file_tools): block /private/etc writes on macOS symlink bypass	2026-04-13 05:15:05 -07:00
test_force_dangerous_override.py	fix(skills): honor policy table for dangerous verdicts	2026-03-14 11:27:02 -07:00
test_fuzzy_match.py	fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage	2026-04-21 02:03:46 -07:00
test_hidden_dir_filter.py	fix: use Path.parts for hidden directory filter in skill listing	2026-03-04 18:34:16 +03:00
test_homeassistant_tool.py	fix: clean up description escaping, add string-data tests	2026-04-13 04:45:07 -07:00
test_image_generation.py	feat(image-gen): add GPT Image 2 to FAL catalog (#13677 )	2026-04-21 13:35:31 -07:00
test_image_generation_env.py	Normalize FAL_KEY env handling (ignore whitespace-only values)	2026-04-21 02:04:21 -07:00
test_image_generation_plugin_dispatch.py	fix(image-gen): force-refresh plugin providers in long-lived sessions	2026-04-23 03:01:18 -07:00
test_interrupt.py	fix: resolve remaining 4 CI test failures (#9543 )	2026-04-14 02:18:38 -07:00
test_llm_content_none_guard.py	fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389 ) (#3449 )	2026-03-27 15:28:19 -07:00
test_local_background_child_hang.py	fix(environments): use incremental UTF-8 decoder in select-based drain	2026-04-19 11:27:50 -07:00
test_local_env_blocklist.py	fix(providers): complete NVIDIA NIM parity with other providers	2026-04-17 13:47:46 -07:00
test_local_interrupt_cleanup.py	fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907 )	2026-04-17 20:39:25 -07:00
test_local_shell_init.py	fix(terminal): auto-source ~/.profile and ~/.bash_profile so n/nvm PATH survives (#14534 )	2026-04-23 05:15:37 -07:00
test_local_tempdir.py	fix(termux): honor temp dirs for local temp artifacts	2026-04-09 16:24:53 -07:00
test_managed_browserbase_and_modal.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_managed_media_gateways.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_managed_modal_environment.py	fix: add activity heartbeats to prevent false gateway inactivity timeouts (#10501 )	2026-04-15 13:29:05 -07:00
test_managed_server_tool_support.py	fix(tests): fix several failing/flaky tests on main (#6777 )	2026-04-09 13:17:06 -07:00
test_managed_tool_gateway.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_mcp_circuit_breaker.py	test(mcp): add failing tests for circuit-breaker recovery	2026-04-21 05:19:03 -07:00
test_mcp_dynamic_discovery.py	fix(mcp): make server aliases explicit	2026-04-14 17:19:20 -07:00
test_mcp_oauth.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383 )	2026-04-16 21:57:10 -07:00
test_mcp_oauth_bidirectional.py	fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025 ) (#12717 )	2026-04-19 16:31:07 -07:00
test_mcp_oauth_cold_load_expiry.py	fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025 ) (#12717 )	2026-04-19 16:31:07 -07:00
test_mcp_oauth_integration.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383 )	2026-04-16 21:57:10 -07:00
test_mcp_oauth_manager.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383 )	2026-04-16 21:57:10 -07:00
test_mcp_probe.py	fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness	2026-04-04 10:18:57 -07:00
test_mcp_reconnect_signal.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383 )	2026-04-16 21:57:10 -07:00
test_mcp_stability.py	fix(mcp): per-process PID isolation prevents cross-session crash on restart	2026-04-23 15:11:47 -07:00
test_mcp_structured_content.py	fix(mcp): combine content and structuredContent when both present (#7118 )	2026-04-10 03:44:35 -07:00
test_mcp_tool.py	fix(mcp): seed protocol header before HTTP initialize	2026-04-23 22:01:24 -07:00
test_mcp_tool_401_handling.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383 )	2026-04-16 21:57:10 -07:00
test_mcp_tool_issue_948.py	fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness	2026-04-04 10:18:57 -07:00
test_memory_tool.py	refactor: remove dead code — 1,784 lines across 77 files (#9180 )	2026-04-13 16:32:04 -07:00
test_memory_tool_import_fallback.py	fix(tools): keep memory tool available when fcntl is unavailable	2026-04-14 10:18:05 -07:00
test_mixture_of_agents_tool.py	chore(release): map devorun author + convert MoA defaults test to invariant	2026-04-23 15:14:11 -07:00
test_modal_bulk_upload.py	perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive (#8014 )	2026-04-12 06:18:05 +05:30
test_modal_sandbox_fixes.py	fix: update tests for unified spawn-per-call execution model	2026-04-08 17:23:15 -07:00
test_modal_snapshot_isolation.py	fix(tests): update mocks for file sync changes	2026-04-10 03:01:46 -07:00
test_notify_on_complete.py	fix: suppress duplicate completion notifications when agent already consumed output via wait/poll/log (#8228 )	2026-04-12 00:36:22 -07:00
test_osv_check.py	feat: OSV malware check for MCP extension packages (#5305 )	2026-04-05 12:46:07 -07:00
test_parse_env_var.py	guard terminal_tool import-time env parsing	2026-04-22 14:45:50 -07:00
test_patch_parser.py	fix(patch): harden V4A patch parser and fuzzy match — 9 correctness bugs	2026-04-10 16:47:44 -07:00
test_process_registry.py	fix(gateway): propagate user identity through process watcher pipeline	2026-04-11 13:46:16 -07:00
test_read_loop_detection.py	refactor: remove dead code — 1,784 lines across 77 files (#9180 )	2026-04-13 16:32:04 -07:00
test_registry.py	fix(tests): unstick CI — sweep stale tests from recent merges (#12670 )	2026-04-19 12:39:58 -07:00
test_resolve_path.py	fix(file_tools): resolve bookkeeping paths against live terminal cwd	2026-04-23 15:11:52 -07:00
test_rl_training_tool.py	fix: call _stop_training_run on early-return failure paths	2026-03-10 17:09:51 -07:00
test_search_hidden_dirs.py	fix: exclude hidden directories from find/grep search backends (#1558 )	2026-03-17 02:02:57 -07:00
test_send_message_missing_platforms.py	fix(send_message): deliver Matrix media via adapter	2026-04-15 17:37:43 -07:00
test_send_message_tool.py	fix(send_message): accept E.164 phone numbers for signal/sms/whatsapp (#12936 )	2026-04-20 03:02:44 -07:00
test_session_search.py	fix(aux): add session_search extra_body and concurrency controls	2026-04-20 00:47:39 -07:00
test_signal_media.py	feat(send_message): add media delivery support for Signal	2026-04-20 13:24:15 -07:00
test_singularity_preflight.py	fix(tests): use case-insensitive regex in singularity preflight tests	2026-03-16 19:01:39 +03:00
test_skill_env_passthrough.py	fix: remove 115 verified dead code symbols across 46 production files	2026-04-10 03:44:43 -07:00
test_skill_improvements.py	feat(skills): size limits for agent writes + fuzzy matching for patch (#4414 )	2026-04-01 04:19:19 -07:00
test_skill_manager_tool.py	feat(skills-guard): gate agent-created scanner on config.skills.guard_agent_created (default off)	2026-04-23 06:20:47 -07:00
test_skill_size_limits.py	feat(skills): size limits for agent writes + fuzzy matching for patch (#4414 )	2026-04-01 04:19:19 -07:00
test_skill_view_path_check.py	refactor: use Path.is_relative_to() for skill_view boundary check	2026-03-04 05:30:43 -08:00
test_skill_view_traversal.py	fix(security): block path traversal in skill_view file_path (fixes #220 )	2026-03-02 02:00:09 -08:00
test_skills_guard.py	feat(skills-guard): gate agent-created scanner on config.skills.guard_agent_created (default off)	2026-04-23 06:20:47 -07:00
test_skills_hub.py	fix: update 6 test files broken by dead code removal	2026-04-10 03:44:43 -07:00
test_skills_hub_clawhub.py	fix: improve clawhub skill search matching	2026-03-14 23:15:04 -07:00
test_skills_sync.py	feat(skills_sync): surface collision with reset-hint	2026-04-23 05:09:08 -07:00
test_skills_tool.py	fix(skills): follow symlinked category dirs consistently	2026-04-23 14:05:47 -07:00
test_ssh_bulk_upload.py	perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive (#8014 )	2026-04-12 06:18:05 +05:30
test_ssh_environment.py	fix(tools): keep SSH ControlMaster socket path under macOS 104-byte limit	2026-04-20 03:07:32 -07:00
test_symlink_prefix_confusion.py	fix: use is_relative_to() for symlink boundary check in skills_guard	2026-03-04 17:23:23 +03:00
test_sync_back_backends.py	fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards	2026-04-16 19:39:21 -07:00
test_terminal_compound_background.py	fix(terminal): rewrite `A && B &` to `A && { B & }` to prevent subshell leak	2026-04-19 16:53:11 -07:00
test_terminal_exit_semantics.py	feat: add exit code context for common CLI tools in terminal results (#5144 )	2026-04-04 16:57:24 -07:00
test_terminal_foreground_timeout_cap.py	terminal: steer long-lived server commands to background mode	2026-04-19 16:47:20 -07:00
test_terminal_none_command_guard.py	fix(terminal): guard invalid command values	2026-04-08 21:37:51 -07:00
test_terminal_output_transform_hook.py	test: stop testing mutable data — convert change-detectors to invariants (#13363 )	2026-04-20 23:20:33 -07:00
test_terminal_requirements.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_terminal_timeout_output.py	fix(terminal): preserve partial output when command times out (#3868 )	2026-03-29 21:51:44 -07:00
test_terminal_tool.py	fix terminal workdir validation for Windows paths	2026-04-15 15:06:51 -07:00
test_terminal_tool_pty_fallback.py	feat: add tested Termux install path and EOF-aware gh auth	2026-04-09 16:24:53 -07:00
test_terminal_tool_requirements.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_threaded_process_handle.py	feat(environments): unified spawn-per-call execution layer	2026-04-08 17:23:15 -07:00
test_tirith_security.py	fix: send_animation metadata, MarkdownV2 inline code splitting, tirith cosign-free install (#1626 )	2026-03-16 23:39:41 -07:00
test_todo_tool.py	fix(tools): enforce ID uniqueness in TODO store during replace operations	2026-04-11 16:22:50 -07:00
test_tool_backend_helpers.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_tool_call_parsers.py	refactor(tests): re-architect tests + fix CI failures (#5946 )	2026-04-07 17:19:07 -07:00
test_tool_output_limits.py	feat(skills): add design-md skill for Google's DESIGN.md spec (#14876 )	2026-04-23 21:51:19 -07:00
test_tool_result_storage.py	fix(tools): neutralize shell injection in _write_to_sandbox via path quoting (#7940 )	2026-04-11 14:26:11 -07:00
test_transcription.py	fix(stt): map cloud-only model names to valid local size for faster-whisper (#2544 )	2026-04-20 05:18:48 -07:00
test_transcription_tools.py	review(stt-xai): address cetej's nits	2026-04-23 01:57:33 -07:00
test_tts_gemini.py	feat(tts): add Google Gemini TTS provider (#11229 )	2026-04-16 14:23:16 -07:00
test_tts_kittentts.py	feat(tts): complete KittenTTS integration (tools/setup/docs/tests)	2026-04-21 01:28:32 -07:00
test_tts_max_text_length.py	fix(tts): use per-provider input-character caps instead of global 4000 (#13743 )	2026-04-21 17:49:39 -07:00
test_tts_mistral.py	test: remove 8 flaky tests that fail under parallel xdist scheduling (#12784 )	2026-04-19 19:38:02 -07:00
test_tts_speed.py	test(tts): add speed config tests for Edge, OpenAI, and MiniMax	2026-04-12 16:46:18 -07:00
test_url_safety.py	feat(security): add global toggle to allow private/internal URL resolution	2026-04-22 14:38:59 -07:00
test_vision_tools.py	test: cover vision config temperature wiring (add regression tests for auxiliary.vision.temperature and timeout; add bugkill3r to AUTHOR_MAP for the salvaged commit)	2026-04-20 00:32:09 -07:00
test_voice_cli_integration.py	feat(voice): add cli beep toggle	2026-04-21 00:29:29 -07:00
test_voice_mode.py	fix(termux): tighten voice setup and mobile chat UX	2026-04-09 16:24:53 -07:00
test_watch_patterns.py	fix(gateway): route synthetic background events by session	2026-04-15 11:16:01 -07:00
test_web_tools_config.py	test: remove 169 change-detector tests across 21 files (#11472 )	2026-04-17 01:05:09 -07:00
test_web_tools_tavily.py	fix(tests): fix several failing/flaky tests on main (#6777 )	2026-04-09 13:17:06 -07:00
test_website_policy.py	fix: resolve 7 failing CI tests (#3936 )	2026-03-30 08:10:14 -07:00
test_windows_compat.py	fix: guard POSIX-only process functions for Windows compatibility	2026-03-01 01:54:27 +03:00
test_write_deny.py	fix: resolve symlink bypass in write deny list on macOS	2026-02-26 13:30:55 +03:00
test_yolo_mode.py	fix(gateway): scope /yolo to the active session	2026-04-10 03:38:44 -07:00
test_zombie_process_cleanup.py	fix(tests): fix 78 CI test failures and remove dead test (#9036 )	2026-04-13 10:50:24 -07:00