hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-03 12:23:08 +00:00

Teknium 9d42c2c286 feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)

* feat(video_gen): unified video_generate tool with pluggable provider backends

  One core video_generate tool, every backend a plugin. Mirrors the image_gen + memory_provider + context_engine architecture: ABC, registry, plugin-context registration hook, and per-plugin model catalogs surfaced through hermes tools.

  Surface (one schema, every backend):
  - operation: generate / edit / extend
  - modalities: text-to-video (prompt only), image-to-video (prompt + image_url), video edit (prompt + video_url), video extend (video_url)
  - reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model override
  - Providers ignore unknown kwargs and declare what they support via VideoGenProvider.capabilities() — backend-specific quirks stay in the backend, the agent learns one tool

  Backends shipped:
  - plugins/video_gen/xai/ — Grok-Imagine, full generate/edit/extend + image-to-video + reference images (salvaged from PR #10600 by @Jaaneek, reshaped into the plugin interface)
  - plugins/video_gen/fal/ — Veo 3.1 (t2v + i2v), Kling O3 i2v, Pixverse v6 i2v with model-aware payload building that drops keys a model doesn't declare

  Wiring:
  - agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation, success_response / error_response, save_b64_video / save_bytes_video, $HERMES_HOME/cache/videos/
  - agent/video_gen_registry.py — thread-safe register/get/list + get_active_provider() reading video_gen.provider from config.yaml
  - hermes_cli/plugins.py — PluginContext.register_video_gen_provider()
  - hermes_cli/tools_config.py — Video Generation category in hermes tools, plugin-only providers list, model picker per plugin, config write to video_gen.{provider,model}
  - toolsets.py — new video_gen toolset
  - tests: 31 new tests covering ABC, registry, tool dispatch, both plugins
  - docs: developer-guide/video-gen-provider-plugin.md (parallel to the image-gen guide), sidebar + toolsets-reference + plugin guides updated

  Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse), #10458 (provider categories), #10786 (xAI media+search bundle), #2984 (FAL duplicate), #19086 (Google Veo standalone — easy port to plugin interface).

  Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): dynamic schema reflects active backend's capabilities

  Address the 'capability variance' question — instead of one tool with a static schema that lies about what every backend supports, the video_generate tool now rebuilds its description at get_definitions() time based on the configured video_gen.provider and video_gen.model.

  The agent sees backend-specific guidance up-front:
  - 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is REQUIRED; text-only prompts will be rejected'
  - 'fal-ai/veo3.1' (t2v): no image_url restriction shown
  - xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7 reference_image_urls'
  - Backends without edit/extend: 'not supported on this backend — surface that they need to switch backends via hermes tools'

  This is the same pattern PR #22694 used for delegate_task self-capping — documented in the dynamic-tool-schemas skill.

  Cache invalidation is free: get_tool_definitions() already memoizes on config.yaml mtime, so a mid-session backend swap rebuilds the schema automatically.

  Tested:
  - Empirical FAL OpenAPI schema check confirms image-to-video models require image_url (FAL returns HTTP 422 otherwise) — client-side rejection in FALVideoGenProvider.generate() now prevents the wasted round-trip
  - Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches
  - 6 new tests cover the builder (no config / image-only / full-surface / text-only / unknown provider / registry wiring), all passing
  - 37/37 in the slice, 134/134 in the broader regression set

* test(video_gen/xai): full surface integration tests + cleaner schema

  Verified end-to-end that the xAI plugin handles every documented mode from PR #10600's surface: text-to-video, image-to-video, reference-images-to-video, video edit, video extend (with and without prompt). All five modes route to the correct xAI endpoint (/videos/generations, /videos/edits, /videos/extensions) with the right payload shape (image / reference_images / video keys), and all five client-side rejections fire before the network: edit-without-prompt, extend-without-video_url, image+refs conflict, >7 references, and duration/aspect_ratio clamping.

  15 new integration tests grouped into four classes (endpoint routing, modalities, validation, clamping). httpx is stubbed via a small fake AsyncClient that records POSTs so the tests assert the actual payload the plugin would send to xAI — not just the success/error envelope.

  Also cleaned up a description redundancy: when a model's operations match the backend's overall set, we no longer print the duplicate 'operations supported by this model' line. xAI's description now reads:

    Active backend: xAI . model: grok-imagine-video
    - operations supported by this backend: edit, extend, generate
    - modalities supported by this backend: image, reference_images, text
    - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16
    - resolution choices: 480p, 720p
    - duration range: 1-15s
    - reference_image_urls: up to 7 images

  Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing

  Two design changes per Teknium:

  1) Drop edit/extend from the tool surface entirely. Only text-to-video and image-to-video remain. The agent sees a clean tool with two modalities; backend-specific quirks like xAI's edit/extend endpoints stay out of the unified schema.

  2) FAL: pick a model FAMILY once, the plugin routes between the family's text-to-video and image-to-video endpoints based on whether image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND 'fal-ai/veo3.1/image-to-video' as separate options — they pick 'veo3.1', and the plugin handles the rest.

  Catalog rewritten as families:

    veo3.1             fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video
    pixverse-v6        fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video
    kling-o3-standard  fal-ai/kling-video/o3/standard/text-to-video / fal-ai/kling-video/o3/standard/image-to-video

  xAI uses a single endpoint (/videos/generations) for both modes, routed by the presence of the 'image' field in the payload — no edit/extend exposure.

  Schema changes:
  - VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params: prompt (required), image_url, reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model.
  - VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS, DEFAULT_OPERATION. capabilities() drops 'operations' key.
  - success_response: add 'modality' field ('text' | 'image') so the agent and logs can see which endpoint was actually hit.

  Dynamic schema builder simplified — no operations bullet, no 'switch backends if you need edit/extend' guidance. When the active backend supports both modalities (the common case), description reads:

    Active backend: FAL . model: pixverse-v6
    - supports both text-to-video (omit image_url) and image-to-video (pass image_url) - routes automatically
    - aspect_ratio choices: 16:9, 9:16, 1:1
    - resolution choices: 360p, 540p, 720p, 1080p
    - duration range: 1-15s
    - audio: pass audio=true to enable native audio (pricing tier)
    - negative_prompt: supported

  Tests: 51 in the video_gen slice, 216 across the broader image+video sweep, all passing. New FAL routing tests prove pixverse-v6 + no image hits text-to-video endpoint, pixverse-v6 + image_url hits image-to-video endpoint, same for veo3.1 and kling-o3-standard.

  Docs updated: developer-guide page rewrites the 'model families' pattern as a first-class section so external plugin authors know the convention. toolsets-reference and toolsets.py descriptions match the new surface.

  Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers

  Catalog now covers everything Teknium specced from FAL:

  Cheap tier:
    ltx-2.3       fal-ai/ltx-2.3-22b/text-to-video / image-to-video
    pixverse-v6   fal-ai/pixverse/v6/text-to-video / image-to-video

  Premium tier:
    veo3.1        fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video
    seedance-2.0  bytedance/seedance-2.0/text-to-video / image-to-video
    kling-v3-4k   fal-ai/kling-video/v3/4k/text-to-video / image-to-video
    happy-horse   fal-ai/happy-horse/text-to-video / image-to-video

  DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane defaults, both modalities) — better first-run UX for users who haven't explicitly picked a model.

  New family-entry knob: image_param_key. Kling v3 4K's image-to-video endpoint expects start_image_url instead of image_url; declaring image_param_key='start_image_url' on the family lets _build_payload remap correctly. Other families default to plain image_url.

  Per-family capability flags reflect each model's docs:
  - LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution enum exposed by FAL — let endpoint apply defaults)
  - Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported, negative prompts NOT supported per docs
  - Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative
  - Veo 3.1: unchanged, 16:9/9:16, 4/6/8s

  Tests: +5 covering the new families (full catalog, Kling 4K start_image_url remap, Seedance routing, LTX payload minimality, Happy Horse minimality). 56/56 in the slice green.

  Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes already has a direct xAI plugin that talks to xAI's own API; routing the same model through FAL's wrapper would duplicate the surface without adding capabilities. Users on FAL who want Grok-Imagine should use the xAI plugin directly; flag if you want both routes available.

* test(video_gen): tool-surface routing matrix — every model x modality

  End-to-end matrix test driven through _handle_video_generate() — the actual function the agent's video_generate tool call lands in. Writes config.yaml, invokes the registered handler with a raw args dict, then asserts the outbound HTTP/SDK call hit the right endpoint with the right payload shape.

  Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new families as they're added (add a family to FAL_FAMILIES and you get both modalities tested for free).

  Coverage:
  - All 6 FAL families x {text-only, text+image} = 12 cases
  - xAI x {text-only, text+image} = 2 cases
  - tool-level model= arg overrides config = 2 cases

  For each case, verifies:
  - result['success'] is True
  - result['modality'] matches input shape ('text' if no image_url, 'image' otherwise)
  - outbound endpoint URL matches the family's text_endpoint or image_endpoint
  - text-only payloads carry no image-shaped keys
  - text+image payloads carry the family's image key (image_url for most, start_image_url for kling-v3-4k, wrapped 'image' object for xAI)

  All 16 cases passing. Confirms the tool surface routes every (provider, model, modality) combination correctly with zero leakage.

* feat(video_gen): keep video_gen out of first-run setup, surface in status

  Two changes:

  1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in the first-run toolset checklist. Video gen is niche, paid, and slow — most users don't want it nagging them during initial setup. Anyone who wants it opts in via 'hermes tools' -> Video Generation, which already routes to the provider+model picker.

  2. The 'hermes setup' status panel learns about video_gen — but only shows the row when a plugin reports available. Users without FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of those keys see 'Video Generation (FAL) ✓' as confirmation it's wired.

  Verified live:
  - Fresh install (no creds): zero video_gen mentions in wizard.
  - With FAL_KEY: status row appears with active backend name.
  - 160/160 in the setup + tools_config + video_gen test slice.

  Rationale: image_gen is on by default because it's a featured creative tool used in casual chat (telegrams, etc). Video gen is heavier — long wait, paid per-second pricing. Default-off matches user intent better.

---------

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

2026-05-13 16:39:41 -07:00
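
The family-based routing and the image_param_key remap described in the commit message above can be illustrated with a minimal sketch. The dictionary layout and the route() helper below are assumptions made for illustration only; the actual FAL_FAMILIES structure and _build_payload in the repository may look different. Only the endpoint strings and the start_image_url key are taken from the commit message.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical catalog layout; the real FAL_FAMILIES entries may carry more fields.
FAL_FAMILIES: Dict[str, Dict[str, str]] = {
    "pixverse-v6": {
        "text_endpoint": "fal-ai/pixverse/v6/text-to-video",
        "image_endpoint": "fal-ai/pixverse/v6/image-to-video",
    },
    "kling-v3-4k": {
        "text_endpoint": "fal-ai/kling-video/v3/4k/text-to-video",
        "image_endpoint": "fal-ai/kling-video/v3/4k/image-to-video",
        # This family expects the image under a different payload key.
        "image_param_key": "start_image_url",
    },
}


def route(
    family: str, prompt: str, image_url: Optional[str] = None
) -> Tuple[str, Dict[str, Any], str]:
    """Hypothetical helper: return (endpoint, payload, modality) for one request."""
    entry = FAL_FAMILIES[family]
    payload: Dict[str, Any] = {"prompt": prompt}
    if image_url is None:
        # No image supplied: use the family's text-to-video endpoint.
        return entry["text_endpoint"], payload, "text"
    # Image supplied: use the image-to-video endpoint and remap the key if declared.
    key = entry.get("image_param_key", "image_url")
    payload[key] = image_url
    return entry["image_endpoint"], payload, "image"
```

Under these assumptions, route("kling-v3-4k", "a cat", "https://example.com/cat.png") selects the image-to-video endpoint and places the URL under start_image_url, while route("pixverse-v6", "a cat") selects the text-to-video endpoint with a prompt-only payload.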
__init__.py
test_accretion_caps.py	fix(ci): recover 38 failing tests on main (#17642)	2026-04-29 20:05:32 -07:00
test_ansi_strip.py	fix: strip ANSI at the source — clean terminal output before it reaches the model	2026-03-23 07:43:12 -07:00
test_approval.py	fix(approval): catch sudo with stdin/askpass/shell privilege flags	2026-05-11 06:56:30 -07:00
test_approval_heartbeat.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_approval_plugin_hooks.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_base_environment.py	fix(env): pass -- to cd for hyphen-prefixed workdirs	2026-05-04 04:45:03 -07:00
test_browser_camofox.py	fix(tests): resolve 17 persistent CI test failures (#15084)	2026-04-24 03:46:46 -07:00
test_browser_camofox_persistence.py	feat(browser): support externally managed Camofox sessions	2026-05-12 15:14:49 -07:00
test_browser_camofox_state.py	feat(browser): support externally managed Camofox sessions	2026-05-12 15:14:49 -07:00
test_browser_cdp_override.py	Support browser CDP URL from config	2026-04-17 16:05:04 -07:00
test_browser_cdp_tool.py	fix(tests): resolve 17 persistent CI test failures (#15084)	2026-04-24 03:46:46 -07:00
test_browser_chromium_check.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_browser_cleanup.py	fix(doctor): only check the active memory provider, not all providers unconditionally (#6285)	2026-04-08 13:44:58 -07:00
test_browser_cloud_fallback.py	fix(browser): runtime fallback to local Chromium when cloud provider fails	2026-04-16 04:19:34 -07:00
test_browser_cloud_provider_cache.py	fix(browser_tool): fall through to autodetect on config read failure	2026-05-09 13:35:39 -07:00
test_browser_console.py	fix(browser): honor auxiliary.vision.temperature for screenshot analysis - mirror the vision tool's config bridge in browser_vision	2026-04-20 00:32:09 -07:00
test_browser_content_none_guard.py	fix(browser): guard LLM response content against None in snapshot and vision (#3642)	2026-03-28 17:25:04 -07:00
test_browser_eval_supervisor_path.py	perf(browser): route browser_console eval through supervisor's persistent CDP WS (180x faster) (#23226)	2026-05-10 07:37:55 -07:00
test_browser_hardening.py	fix(browser): hardening — dead code, caching, scroll perf, security, thread safety	2026-04-10 13:05:44 -07:00
test_browser_homebrew_paths.py	feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags	2026-05-08 14:27:40 -07:00
test_browser_hybrid_routing.py	feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)	2026-04-26 09:57:58 -07:00
test_browser_lightpanda.py	fix(browser): tighten Lightpanda fallback edge cases	2026-05-06 03:41:21 -07:00
test_browser_orphan_reaper.py	test: migrate stale os.kill monkeypatches to gateway.status._pid_exists	2026-05-08 14:27:40 -07:00
test_browser_secret_exfil.py	fix: rewrite test mock secrets and add redaction fixture	2026-04-01 12:03:56 -07:00
test_browser_ssrf_local.py	fix(browser): enforce cloud-metadata SSRF floor in hybrid routing (#16234) (#21228)	2026-05-07 05:38:05 -07:00
test_browser_supervisor.py	perf(browser): route browser_console eval through supervisor's persistent CDP WS (180x faster) (#23226)	2026-05-10 07:37:55 -07:00
test_browser_supervisor_healthcheck.py	test(browser_supervisor): cover cache-hit healthcheck on dead thread/loop	2026-04-30 20:33:33 -07:00
test_budget_config.py	test(tools): add unit tests for budget_config module	2026-04-11 02:58:48 -07:00
test_checkpoint_manager.py	fix(checkpoint): guard _touch_project against non-dict project metadata	2026-05-09 17:53:13 -07:00
test_clarify_gateway.py	feat(gateway): wire clarify tool with inline keyboard buttons on Telegram (#24199)	2026-05-12 16:33:33 -07:00
test_clarify_tool.py
test_clipboard.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_code_execution.py	fix(windows): enable execute_code — stale AF_UNIX gate was blocking the tool	2026-05-08 14:27:40 -07:00
test_code_execution_modes.py	tests: skip POSIX-venv-layout tests on Windows	2026-05-08 14:27:40 -07:00
test_code_execution_windows_env.py	execute_code: set PYTHONIOENCODING=utf-8 + PYTHONUTF8=1 in child env	2026-05-08 14:27:40 -07:00
test_command_guards.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_computer_use.py	feat(computer-use): cua-driver backend, universal any-model schema	2026-05-08 11:07:38 -07:00
test_config_null_guard.py	fix: guard config.get() against YAML null values to prevent AttributeError (#3377)	2026-03-27 04:03:00 -07:00
test_credential_files.py	fix: remove 115 verified dead code symbols across 46 production files	2026-04-10 03:44:43 -07:00
test_credential_pool_env_fallback.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_cron_approval_mode.py	fix(approval): cron jobs must not be treated as gateway context	2026-05-08 07:30:14 -07:00
test_cron_prompt_injection.py	fix: cron prompt injection scanner bypass for multi-word variants	2026-02-26 13:55:54 +03:00
test_cronjob_tools.py	fix(cron): allow quoted URL in github auth-header allowlist	2026-05-09 11:11:45 -07:00
test_daytona_environment.py	fix(daytona): migrate legacy-sandbox lookup to cursor-based list() (#24587)	2026-05-12 16:31:46 -07:00
test_debug_helpers.py
test_delegate.py	fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680)	2026-05-09 13:37:30 -07:00
test_delegate_composite_toolsets.py	fix(delegate): expand composite toolsets before intersection in delegate_task	2026-05-07 06:41:42 -07:00
test_delegate_subagent_timeout_diagnostic.py	feat(delegate): diagnostic dump when a subagent times out with 0 API calls (#15105)	2026-04-24 04:58:32 -07:00
test_delegate_toolset_scope.py
test_discord_tool.py	feat: add Discord message deletion action	2026-05-07 05:11:09 -07:00
test_docker_environment.py	feat(docker): run container as host user to avoid root-owned bind mounts	2026-04-29 16:16:43 +10:00
test_docker_find.py	feat: entry-level Podman support — find_docker() + rootless entrypoint (#10066)	2026-04-14 21:20:37 -07:00
test_dockerfile_node_modules_perms.py	fix(docker): chown runtime node_modules trees to hermes user (#18800)	2026-05-07 06:17:49 -07:00
test_dockerfile_pid1_reaping.py	test(docker): align Dockerfile contract tests with simplified TUI flow	2026-05-07 04:53:10 -07:00
test_env_passthrough.py	fix(env_passthrough): reject Hermes provider credentials from skill passthrough (#13523)	2026-04-21 06:14:25 -07:00
test_feishu_tools.py	feat: add Feishu document comment intelligent reply with 3-tier access control	2026-04-17 19:04:11 -07:00
test_file_operations.py	fix(file-ops): allow file search in hidden roots	2026-05-04 12:37:09 -07:00
test_file_operations_edge_cases.py	feat(file_tools): post-write delta lint on write_file + patch, add JSON/YAML/TOML/Python in-process linters (#20191)	2026-05-05 04:54:17 -07:00
test_file_ops_cwd_tracking.py	fix(file-ops): follow terminal env's live cwd in _exec instead of init-time cached cwd (#11912)	2026-04-17 19:26:40 -07:00
test_file_read_guards.py	fix(file-tools): escalate to BLOCKED on repeated read_file dedup stubs (#16382)	2026-04-27 00:17:26 -07:00
test_file_staleness.py	fix(file_tools): resolve bookkeeping paths against live terminal cwd	2026-04-23 15:11:52 -07:00
test_file_state_registry.py	feat(delegate): cross-agent file state coordination for concurrent subagents (#13718)	2026-04-21 16:41:26 -07:00
test_file_sync.py	test(file_sync): add tests for bulk_upload_fn callback	2026-04-10 21:14:32 -07:00
test_file_sync_back.py	fix: move pytest.importorskip below pytest import in skip-guarded tests	2026-05-09 11:12:03 -07:00
test_file_sync_perf.py	test: add reproducible perf benchmark for file sync overhead	2026-04-10 03:01:46 -07:00
test_file_tools.py	test(patch-tool): collapse 9 schema-shape tests into 2 invariants	2026-05-08 16:59:24 -07:00
test_file_tools_container_config.py	fix(docker): pass docker_mount_cwd_to_workspace and docker_forward_env to container_config in file_tools	2026-04-20 00:58:16 -07:00
test_file_tools_live.py	feat(environments): unified spawn-per-call execution layer	2026-04-08 17:23:15 -07:00
test_file_write_safety.py	fix(file_tools): block /private/etc writes on macOS symlink bypass	2026-04-13 05:15:05 -07:00
test_force_dangerous_override.py	fix(skills): honor policy table for dangerous verdicts	2026-03-14 11:27:02 -07:00
test_fuzzy_match.py	fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage	2026-04-21 02:03:46 -07:00
test_hardline_blocklist.py	fix(terminal): block sudo -S password guessing when SUDO_PASSWORD is not set	2026-05-11 06:56:30 -07:00
test_heartbeat_stale_thresholds.py	test: add unit tests for heartbeat stale threshold increase	2026-05-04 05:08:51 -07:00
test_hidden_dir_filter.py
test_homeassistant_tool.py	fix: clean up description escaping, add string-data tests	2026-04-13 04:45:07 -07:00
test_image_generation.py	feat(image-gen): add GPT Image 2 to FAL catalog (#13677)	2026-04-21 13:35:31 -07:00
test_image_generation_env.py	Normalize FAL_KEY env handling (ignore whitespace-only values)	2026-04-21 02:04:21 -07:00
test_image_generation_plugin_dispatch.py	fix(image-gen): force-refresh plugin providers in long-lived sessions	2026-04-23 03:01:18 -07:00
test_init_session_cwd_respect.py	fix(cli): respect terminal.cwd config in local terminal backend	2026-04-28 22:16:08 -07:00
test_interrupt.py	fix: resolve remaining 4 CI test failures (#9543)	2026-04-14 02:18:38 -07:00
test_kanban_tools.py	fix(tools): clarify kanban_complete phantom-card retry guidance	2026-05-10 16:14:43 -07:00
test_lazy_deps.py	feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)	2026-05-12 01:02:25 -07:00
test_llm_content_none_guard.py	fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449)	2026-03-27 15:28:19 -07:00
test_local_background_child_hang.py	fix(environments): use incremental UTF-8 decoder in select-based drain	2026-04-19 11:27:50 -07:00
test_local_env_blocklist.py	feat: add Vercel Sandbox backend	2026-04-29 07:22:33 -07:00
test_local_env_cwd_recovery.py	fix(local): test root as ancestor candidate; use real pipe for fake stdout	2026-05-04 15:31:47 -07:00
test_local_interrupt_cleanup.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_local_shell_init.py	fix(terminal): auto-source ~/.profile and ~/.bash_profile so n/nvm PATH survives (#14534)	2026-04-23 05:15:37 -07:00
test_local_tempdir.py
test_managed_browserbase_and_modal.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_managed_media_gateways.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_managed_modal_environment.py	fix: add activity heartbeats to prevent false gateway inactivity timeouts (#10501)	2026-04-15 13:29:05 -07:00
test_managed_server_tool_support.py
test_managed_tool_gateway.py	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
test_mcp_cancelled_error_propagation.py	fix(mcp): re-raise CancelledError explicitly in MCPServerTask.run (#21318)	2026-05-07 07:04:38 -07:00
test_mcp_circuit_breaker.py	test(mcp): add failing tests for circuit-breaker recovery	2026-04-21 05:19:03 -07:00
test_mcp_dynamic_discovery.py	fix(ci): recover 38 failing tests on main (#17642)	2026-04-29 20:05:32 -07:00
test_mcp_empty_error_message.py	fix(mcp): include exception type in error messages when str(exc) is empty	2026-05-07 06:33:57 -07:00
test_mcp_image_content.py	fix(mcp): surface image tool results as MEDIA tags instead of dropping them (#21328)	2026-05-07 07:14:16 -07:00
test_mcp_oauth.py	fix(security): close TOCTOU window when saving MCP OAuth credentials	2026-05-07 04:56:13 -07:00
test_mcp_oauth_bidirectional.py	fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025) (#12717)	2026-04-19 16:31:07 -07:00
test_mcp_oauth_cold_load_expiry.py	fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025) (#12717)	2026-04-19 16:31:07 -07:00
test_mcp_oauth_integration.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383)	2026-04-16 21:57:10 -07:00
test_mcp_oauth_manager.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383)	2026-04-16 21:57:10 -07:00
test_mcp_oauth_metadata.py	fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226)	2026-05-07 05:35:33 -07:00
test_mcp_probe.py	fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness	2026-04-04 10:18:57 -07:00
test_mcp_reconnect_signal.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383)	2026-04-16 21:57:10 -07:00
test_mcp_sse_transport.py	fix(mcp): forward OAuth auth and bump sse_read_timeout on SSE transport (#21323)	2026-05-07 07:08:04 -07:00
test_mcp_stability.py	test: migrate stale os.kill monkeypatches to gateway.status._pid_exists	2026-05-08 14:27:40 -07:00
test_mcp_structured_content.py	fix(ci): recover 38 failing tests on main (#17642)	2026-04-29 20:05:32 -07:00
test_mcp_tool.py	fix(mcp): report configured timeout in MCP call errors	2026-05-07 06:28:11 -07:00
test_mcp_tool_401_handling.py	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383)	2026-04-16 21:57:10 -07:00
test_mcp_tool_issue_948.py	fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness	2026-04-04 10:18:57 -07:00
test_mcp_tool_session_expired.py	fix(mcp): retry stale pipe transport failures	2026-05-07 06:32:45 -07:00
test_mcp_utility_capability_gating.py	fix(mcp): gate utility stubs on server-advertised capabilities (#21347)	2026-05-07 07:39:50 -07:00
test_memory_tool.py	refactor: remove dead code — 1,784 lines across 77 files (#9180)	2026-04-13 16:32:04 -07:00
test_memory_tool_import_fallback.py	fix(tools): keep memory tool available when fcntl is unavailable	2026-04-14 10:18:05 -07:00
test_memory_tool_schema.py	fix(memory): remove dead allOf schema block at the source	2026-05-07 07:03:21 -07:00
test_microsoft_graph_auth.py	test(msgraph): cover concurrent token cache reuse	2026-05-08 09:27:26 -07:00
test_microsoft_graph_client.py	fix(msgraph): stream download_to_file body instead of buffering	2026-05-08 09:27:26 -07:00
test_mixture_of_agents_tool.py	chore(release): map devorun author + convert MoA defaults test to invariant	2026-04-23 15:14:11 -07:00
test_modal_bulk_upload.py	perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive (#8014)	2026-04-12 06:18:05 +05:30
test_modal_sandbox_fixes.py	feat: add Vercel Sandbox backend	2026-04-29 07:22:33 -07:00
test_modal_snapshot_isolation.py	fix(tests): update mocks for file sync changes	2026-04-10 03:01:46 -07:00
test_notify_on_complete.py	fix: suppress duplicate completion notifications when agent already consumed output via wait/poll/log (#8228)	2026-04-12 00:36:22 -07:00
test_osv_check.py	feat: OSV malware check for MCP extension packages (#5305)	2026-04-05 12:46:07 -07:00
test_parse_env_var.py	guard terminal_tool import-time env parsing	2026-04-22 14:45:50 -07:00
test_patch_parser.py
test_process_registry.py	fix(process_registry): kill orphaned Popen on post-spawn setup failure	2026-05-09 17:53:24 -07:00
test_read_loop_detection.py	refactor: remove dead code — 1,784 lines across 77 files (#9180)	2026-04-13 16:32:04 -07:00
test_registry.py	fix(computer-use): harden image-rejection fallback + AUTHOR_MAP	2026-05-08 11:07:38 -07:00
test_resolve_path.py	fix(file_tools): resolve bookkeeping paths against live terminal cwd	2026-04-23 15:11:52 -07:00
test_rl_training_tool.py
test_schema_sanitizer.py	fix: strip Codex-hostile top-level schema combinators	2026-05-07 07:03:21 -07:00
test_search_hidden_dirs.py	fix: exclude hidden directories from find/grep search backends (#1558)	2026-03-17 02:02:57 -07:00
test_send_message_missing_platforms.py	fix(send_message): deliver Matrix media via adapter	2026-04-15 17:37:43 -07:00
test_send_message_tool.py	chore: remove unused sentinel in test_send_message_tool	2026-05-11 06:44:58 -07:00
test_session_search.py	fix: make session search initialize session db	2026-05-09 14:36:58 -07:00
test_shared_container_task_id.py	feat(terminal): collapse subagent task_ids to shared container (#16177)	2026-04-26 11:55:02 -07:00
test_signal_media.py	feat(send_message): add media delivery support for Signal	2026-04-20 13:24:15 -07:00
test_singularity_preflight.py
test_skill_env_passthrough.py	fix: remove 115 verified dead code symbols across 46 production files	2026-04-10 03:44:43 -07:00
test_skill_improvements.py	feat(skills): size limits for agent writes + fuzzy matching for patch (#4414)	2026-04-01 04:19:19 -07:00
test_skill_manager_tool.py	fix(skills): pin protects against deletion only, not edits (#20220)	2026-05-05 05:43:10 -07:00
test_skill_provenance.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_skill_size_limits.py	feat(skills): size limits for agent writes + fuzzy matching for patch (#4414)	2026-04-01 04:19:19 -07:00
test_skill_usage.py	fix(skills): lock usage telemetry updates	2026-05-07 06:13:37 -07:00
test_skill_view_path_check.py
test_skill_view_traversal.py	fix(security): block path traversal in skill_view file_path (fixes #220)	2026-03-02 02:00:09 -08:00
test_skills_guard.py	feat(skills-guard): gate agent-created scanner on config.skills.guard_agent_created (default off)	2026-04-23 06:20:47 -07:00
test_skills_hub.py	fix(skills-hub): cover remaining SSRF fetch paths after #10029	2026-05-09 17:52:12 -07:00
test_skills_hub_clawhub.py	fix(skills-hub): cover remaining SSRF fetch paths after #10029	2026-05-09 17:52:12 -07:00
test_skills_sync.py	feat(skills_sync): surface collision with reset-hint	2026-04-23 05:09:08 -07:00
test_skills_tool.py	fix(tools): refuse skill_view name collisions instead of guessing	2026-05-13 13:29:28 -07:00
test_slash_confirm.py	feat(gateway,cli): confirm /reload-mcp to warn about prompt cache invalidation	2026-04-29 21:56:47 -07:00
test_spotify_client.py	refactor(spotify): convert to built-in bundled plugin under plugins/spotify (#15174)	2026-04-24 07:06:11 -07:00
test_ssh_bulk_upload.py	test(ssh): update tar pipe assertion for --no-overwrite-dir	2026-04-30 04:32:28 -07:00
test_ssh_environment.py	fix(tools): keep SSH ControlMaster socket path under macOS 104-byte limit	2026-04-20 03:07:32 -07:00
test_symlink_prefix_confusion.py	fix: use is_relative_to() for symlink boundary check in skills_guard	2026-03-04 17:23:23 +03:00
test_sync_back_backends.py	fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards	2026-04-16 19:39:21 -07:00
test_terminal_compound_background.py	fix(terminal): rewrite `A && B &` to `A && { B & }` to prevent subshell leak	2026-04-19 16:53:11 -07:00
test_terminal_config_env_sync.py	fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV	2026-05-09 17:53:35 -07:00
test_terminal_exit_semantics.py	feat: add exit code context for common CLI tools in terminal results (#5144)	2026-04-04 16:57:24 -07:00
test_terminal_foreground_timeout_cap.py	terminal: steer long-lived server commands to background mode	2026-04-19 16:47:20 -07:00
test_terminal_none_command_guard.py	fix(terminal): guard invalid command values	2026-04-08 21:37:51 -07:00
test_terminal_output_transform_hook.py	test: stop testing mutable data — convert change-detectors to invariants (#13363)	2026-04-20 23:20:33 -07:00
test_terminal_requirements.py	feat: add Vercel Sandbox backend	2026-04-29 07:22:33 -07:00
test_terminal_task_cwd.py	fix(acp): honor task cwd for foreground terminal commands	2026-05-09 14:46:34 -07:00
test_terminal_timeout_output.py	fix(terminal): preserve partial output when command times out (#3868)	2026-03-29 21:51:44 -07:00
test_terminal_tool.py	fix(terminal): skip sudo prompt when local NOPASSWD sudo works	2026-04-30 20:38:09 -07:00
test_terminal_tool_pty_fallback.py	feat: add tested Termux install path and EOF-aware gh auth	2026-04-09 16:24:53 -07:00
test_terminal_tool_requirements.py	feat: add Vercel Sandbox backend	2026-04-29 07:22:33 -07:00
test_threaded_process_handle.py
test_tirith_security.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_todo_tool.py	fix(tools): enforce ID uniqueness in TODO store during replace operations	2026-04-11 16:22:50 -07:00
test_tool_backend_helpers.py	fix(cli): coerce use_gateway config flags in tool routing	2026-04-26 19:02:55 -07:00
test_tool_call_parsers.py	refactor(tests): re-architect tests + fix CI failures (#5946)	2026-04-07 17:19:07 -07:00
test_tool_output_limits.py	feat(skills): add design-md skill for Google's DESIGN.md spec (#14876)	2026-04-23 21:51:19 -07:00
test_tool_result_storage.py	fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913)	2026-05-09 18:44:58 -07:00
test_transcription.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_transcription_dotenv_fallback.py	fix(deps): unbreak [all] install — drop mistralai while PyPI quarantined (#24205)	2026-05-11 23:02:15 -07:00
test_transcription_tools.py	fix(security): reduce unnecessary shell=True in subprocess calls	2026-05-13 10:31:22 -07:00
test_tts_command_providers.py	feat(tts): add Piper as a native local TTS provider (closes #8508) (#17885)	2026-04-30 02:53:20 -07:00
test_tts_dotenv_fallback.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_tts_gemini.py	feat(tts): add Google Gemini TTS provider (#11229)	2026-04-16 14:23:16 -07:00
test_tts_kittentts.py	feat(tts): complete KittenTTS integration (tools/setup/docs/tests)	2026-04-21 01:28:32 -07:00
test_tts_max_text_length.py	fix(tts): use per-provider input-character caps instead of global 4000 (#13743)	2026-04-21 17:49:39 -07:00
test_tts_mistral.py	fix(deps): unbreak [all] install — drop mistralai while PyPI quarantined (#24205)	2026-05-11 23:02:15 -07:00
test_tts_piper.py	feat(tts): add Piper as a native local TTS provider (closes #8508) (#17885)	2026-04-30 02:53:20 -07:00
test_tts_speed.py	fix(tts): update MiniMax API endpoint to v1/text_to_speech	2026-05-04 12:36:09 -07:00
test_url_safety.py	fix(browser): enforce cloud-metadata SSRF floor in hybrid routing (#16234) (#21228)	2026-05-07 05:38:05 -07:00
test_vercel_sandbox_environment.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_video_analyze.py	feat: add video_analyze tool for native video understanding (#19301)	2026-05-04 00:04:36 +05:30
test_video_generation_dispatch.py	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)	2026-05-13 16:39:41 -07:00
test_video_generation_dynamic_schema.py	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)	2026-05-13 16:39:41 -07:00
test_video_generation_tool_surface_matrix.py	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)	2026-05-13 16:39:41 -07:00
test_vision_native_fast_path.py	fix(dashboard): UI polish — modals, layout, consistency, test fixes	2026-05-12 13:59:22 -04:00
test_vision_tools.py	test: cover vision config temperature wiring; add regression tests for auxiliary.vision.temperature and timeout; add bugkill3r to AUTHOR_MAP for the salvaged commit	2026-04-20 00:32:09 -07:00
test_voice_cli_integration.py	fix(cli): avoid voice TTS restart race	2026-05-04 01:36:07 -07:00
test_voice_mode.py	fix(termux): tighten voice setup and mobile chat UX	2026-04-09 16:24:53 -07:00
test_watch_patterns.py	fix(terminal): three-layer defense against watch_patterns notification spam (#15642)	2026-04-25 06:41:58 -07:00
test_web_providers.py	refactor(web): per-capability backend selection for search/extract split	2026-05-06 09:16:25 -07:00
test_web_providers_brave_free.py	feat(web): add Brave Search (free tier) and DDGS search providers	2026-05-07 09:59:17 -07:00
test_web_providers_ddgs.py	feat(web): add Brave Search (free tier) and DDGS search providers	2026-05-07 09:59:17 -07:00
test_web_providers_searxng.py	feat(web): add SearXNG as a native search-only backend	2026-05-06 10:05:29 -07:00
test_web_tools_config.py	✨ feat(web): expose search result limit	2026-04-28 02:09:30 -07:00
test_web_tools_tavily.py	fix(tests): fix several failing/flaky tests on main (#6777)	2026-04-09 13:17:06 -07:00
test_website_policy.py	fix: resolve 7 failing CI tests (#3936)	2026-03-30 08:10:14 -07:00
test_windows_compat.py	fix: guard POSIX-only process functions for Windows compatibility	2026-03-01 01:54:27 +03:00
test_windows_native_support.py	feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)	2026-05-12 01:02:25 -07:00
test_write_deny.py	fix(tests): resolve 17 persistent CI test failures (#15084)	2026-04-24 03:46:46 -07:00
test_yolo_mode.py	fix(approval): harden YOLO mode env parsing against quoted-bool strings	2026-04-30 20:37:37 -07:00
test_zombie_process_cleanup.py	fix(tests): resolve 17 persistent CI test failures (#15084)	2026-04-24 03:46:46 -07:00