hermes-agent/tests/tools/test_video_generation_tool_surface_matrix.py
ethernet 48be2e0e4d
test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt

apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.

- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
  published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
  every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.
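
For illustration only, the checksum verification described above can be sketched in Python. The real CI step is a workflow shell step; the helper name `sha256_matches` and the chunk size are assumptions made for this sketch, not part of the commit.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` hashes to `expected_hex` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

A pinned download would only be unpacked and moved onto PATH when this check returns True.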

* fix(cli): compile syntax check to tempdir, not source __pycache__

`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:

1. Parallel test workers walking the same source tree (e.g. running
   the suite under per-file process isolation) can race against each
   other on the `__pycache__` write — manifests as flaky 'directory
   not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
   that the next interpreter run might pick up — fine when the
   interpreter version matches, sketchy if it doesn't.

Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.
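
A minimal sketch of the idea, assuming a standalone helper (the name `has_valid_syntax` is hypothetical; the real `_validate_critical_files_syntax` may be structured differently). It relies only on the documented `cfile` and `doraise` parameters of `py_compile.compile()`.

```python
import os
import py_compile
import tempfile


def has_valid_syntax(source_path):
    """Compile `source_path` into a throwaway directory; report success only."""
    with tempfile.TemporaryDirectory() as tmp:
        # cfile redirects the .pyc away from the source tree's __pycache__.
        target = os.path.join(tmp, "check.pyc")
        try:
            py_compile.compile(source_path, cfile=target, doraise=True)
        except py_compile.PyCompileError:
            return False
    # The temporary directory, and the .pyc in it, are gone at this point.
    return True
```

Because nothing is written next to the source, concurrent callers never touch the same `__pycache__` directory.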

* test(runner): per-file process isolation, drop manual state reset + xdist

Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.

Key changes:
  * scripts/run_tests_parallel.py — new runner: discovers test files,
    runs N in parallel via ThreadPoolExecutor, captures stdout per file,
    treats exit code 5 (no tests collected) as pass, and kills all
    children on exit. The worker count changes from cpu_count to
    cpu_count*2: the runner is I/O-bound (waiting on
    subprocess.communicate() from pytest children) and the parent
    process does almost no CPU work, so 2x oversubscription keeps more
    pipes full. When a file fails, the runner immediately shows the last
    30 lines of pytest output (stack traces + FAILED summary) plus a
    ready-to-copy repro command:
      python -m pytest tests/agent/test_auxiliary_client.py
  * scripts/run_tests.sh — delegates to run_tests_parallel.py
  * .github/workflows/tests.yml — test step: python
    scripts/run_tests_parallel.py
  * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
  * tests/conftest.py — remove ~200 lines of manual state-reset fixtures
  * AGENTS.md — update Testing section for per-file design
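
The runner design above can be sketched roughly as follows. This is not the contents of scripts/run_tests_parallel.py: the function names, the `make_cmd` hook, and the result shape are assumptions for illustration, and child cleanup on exit is omitted.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

NO_TESTS_COLLECTED = 5  # pytest exit code when a file collects no tests


def run_command(cmd):
    """Run one child process; return (passed, last 30 lines of its output)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    passed = proc.returncode in (0, NO_TESTS_COLLECTED)
    tail = "\n".join(proc.stdout.splitlines()[-30:])
    return passed, tail


def run_files(test_files, make_cmd=None, workers=None):
    """Run each test file in its own fresh interpreter, several at a time."""
    if make_cmd is None:
        make_cmd = lambda path: [sys.executable, "-m", "pytest", path]
    if workers is None:
        # Threads mostly wait on child pipes, so oversubscribe the CPUs 2x.
        workers = (os.cpu_count() or 1) * 2
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: run_command(make_cmd(p)), test_files))
    return dict(zip(test_files, results))
```

Threads are enough here because each worker only blocks on a subprocess; the isolation comes from the child interpreters, not from the pool.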

* test(runner): speed gateway test antipattern scan up

* fix(test): web search provider plugin test missing xai

* fix(tests): make 14 test files pass under per-file subprocess isolation

Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:

Tool registry not populated:
  - test_video_generation_tool_surface_matrix: add discover_builtin_tools()
  - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
    registering all 8 bundled web providers, reset after each test
  - test_website_policy: same provider registration pattern
  - test_web_tools_tavily: same pattern across 3 dispatch test classes
  - Also add is_safe_url/check_website_access mocks where SSRF check
    blocks example.com (DNS resolution fails in isolated envs)

Stale check_fn cache:
  - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
    in both kanban guidance tests (prior test cached False for kanban_show)
  - test_discord_tool: cache invalidation in setup/teardown
  - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries

Module-level state pollution:
  - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
  - test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
    (ContextVar takes precedence over os.environ)
  - test_dm_topics: overwrite sys.modules + separate telegram.constants mock
    + force-reimport of gateway.platforms.telegram
  - test_terminal_tool_requirements: removed duplicate class declaration,
    autouse _clear_caches fixture

* change(tests): run_tests.sh explicitly includes env vars

Instead of manually dropping selected variables from the environment,
run_tests.sh now passes through only an explicit set of variables.

* fix(tests): 5 more isolation/NixOS fixes

- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
  command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
  (feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
  /etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
  (doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
  profile deletion when copytree preserved read-only modes from
  nix store
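
A rough sketch of the rmtree handler idea from the profiles.py item, assuming a standalone helper (`force_rmtree` and `_make_writable_and_retry` are hypothetical names, not the code in profiles.py). It also shows one way to stay compatible with Python 3.11, where `shutil.rmtree` only accepts `onerror`.

```python
import os
import shutil
import stat
import sys


def _make_writable_and_retry(func, path, exc):
    """Error handler: add write bits to `path` and its parent, then retry."""
    if func not in (os.unlink, os.rmdir):
        # onexc passes an exception, onerror passes an exc_info tuple.
        raise exc if isinstance(exc, BaseException) else exc[1]
    parent = os.path.dirname(path)
    parent_mode = stat.S_IMODE(os.stat(parent).st_mode)
    os.chmod(parent, parent_mode | stat.S_IWUSR | stat.S_IXUSR)
    os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | stat.S_IWUSR)
    func(path)


def force_rmtree(path):
    """Remove a tree even if copied read-only modes block deletion."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_make_writable_and_retry)
    else:
        shutil.rmtree(path, onerror=_make_writable_and_retry)
```

Fixing the parent matters because removing an entry needs write permission on the directory that contains it, not on the entry itself.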

* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client

* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor

* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test

* fix: address PR #29016 review feedback

- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
  shared helper (register_all_web_providers), replacing 8 copy-pasted
  blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
  fix worker count to match code (cpu_count*2)

* fix: eliminate race in stale-cache achievements test

The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
2026-05-21 16:40:04 +05:30

255 lines
10 KiB
Python

"""Tool-surface routing matrix: every (provider, model, modality) combo.
This is the integration test for the question Teknium asked: regardless
of which provider+model the user picks and whether they pass an
image_url or not, does the tool surface route correctly to the right
endpoint with the right payload shape?
Drives ``_handle_video_generate(args)`` end-to-end — config write →
config read → registry lookup → provider.generate() → outbound HTTP/SDK
call. Stubs fal_client and httpx so we observe routing without hitting
the network.
"""
from __future__ import annotations

import asyncio
import json
import types
from typing import Any, Dict, List, Optional

import pytest
import yaml


@pytest.fixture(autouse=True)
def _reset_registry():
    from agent import video_gen_registry

    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


@pytest.fixture
def matrix_env(tmp_path, monkeypatch):
    """Set up HERMES_HOME, stub fal_client + httpx, force plugin discovery."""
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    monkeypatch.setenv("FAL_KEY", "test-key")
    monkeypatch.setenv("XAI_API_KEY", "test-key")

    fal_calls: List[Dict[str, Any]] = []
    xai_calls: List[Dict[str, Any]] = []

    # fal_client stub
    fake_fal = types.ModuleType("fal_client")

    def _subscribe(endpoint, arguments=None, with_logs=False):
        fal_calls.append({"endpoint": endpoint, "arguments": arguments})
        return {"video": {"url": f"https://fake-fal/{endpoint.replace('/','_')}.mp4"}}

    fake_fal.subscribe = _subscribe  # type: ignore
    monkeypatch.setitem(__import__("sys").modules, "fal_client", fake_fal)

    # httpx stub for xAI
    import httpx

    class _Resp:
        def __init__(self, p, s=200):
            self.status_code = s
            self._p = p
            self.text = json.dumps(p)

        def raise_for_status(self):
            if self.status_code >= 400:
                raise httpx.HTTPStatusError("err", request=None, response=self)  # type: ignore

        def json(self):
            return self._p

    class _Client:
        async def __aenter__(self):
            return self

        async def __aexit__(self, *a):
            return None

        async def post(self, url, headers=None, json=None, timeout=None):
            xai_calls.append({"url": url, "json": json})
            return _Resp({"request_id": "req-1"})

        async def get(self, url, headers=None, timeout=None):
            return _Resp({
                "status": "done",
                "video": {"url": "https://xai-cdn/out.mp4", "duration": 8},
                "model": "grok-imagine-video",
            })

    import plugins.video_gen.xai as xai_plugin

    monkeypatch.setattr(xai_plugin.httpx, "AsyncClient", lambda: _Client())

    async def _no_sleep(*a, **k):
        return None

    monkeypatch.setattr(asyncio, "sleep", _no_sleep)

    # Reset FAL plugin's lazy fal_client cache so it picks up the stub
    from plugins.video_gen import fal as fal_plugin

    fal_plugin._fal_client = None

    # Force discovery
    from hermes_cli.plugins import _ensure_plugins_discovered

    _ensure_plugins_discovered(force=True)

    return tmp_path, fal_calls, xai_calls


def _invoke_tool(home, cfg: dict, args: dict) -> dict:
    """Write config, invoke the registered tool handler, return parsed JSON."""
    (home / "config.yaml").write_text(yaml.safe_dump(cfg))

    import hermes_cli.config as cfg_mod

    if hasattr(cfg_mod, "_invalidate_load_config_cache"):
        cfg_mod._invalidate_load_config_cache()

    from tools.registry import discover_builtin_tools, registry

    # Under per-file isolation the registry may be empty, so populate it on demand.
    if "video_generate" not in registry._tools:
        discover_builtin_tools()
    handler = registry._tools["video_generate"].handler
    return json.loads(handler(args))

# ─────────────────────────────────────────────────────────────────────────
# FAL: every family × {text-only, text+image}
# ─────────────────────────────────────────────────────────────────────────
# We parametrize over the catalog so the test discovers new families
# automatically. If someone adds 'sora-2' to FAL_FAMILIES, this matrix
# picks it up — no test changes needed beyond confirming the endpoints.
def _all_fal_families():
    from plugins.video_gen.fal import FAL_FAMILIES

    return list(FAL_FAMILIES.keys())


@pytest.mark.parametrize("family_id", _all_fal_families())
def test_fal_text_only_routes_to_text_endpoint(matrix_env, family_id):
    home, fal_calls, _ = matrix_env
    from plugins.video_gen.fal import FAL_FAMILIES

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": family_id}},
        {"prompt": "a dog running"},
    )
    assert result["success"] is True, f"{family_id}: {result.get('error')}"
    assert result["modality"] == "text"
    assert result["provider"] == "fal"

    # Outbound endpoint must be the family's text endpoint
    assert len(fal_calls) == 1
    endpoint = fal_calls[0]["endpoint"]
    assert endpoint == FAL_FAMILIES[family_id]["text_endpoint"]

    # Payload must NOT contain any image-shaped key
    payload = fal_calls[0]["arguments"] or {}
    image_keys = [k for k in payload if "image" in k and "url" in k]
    assert not image_keys, f"{family_id} text-only leaked image keys: {image_keys}"


@pytest.mark.parametrize("family_id", _all_fal_families())
def test_fal_text_plus_image_routes_to_image_endpoint(matrix_env, family_id):
    home, fal_calls, _ = matrix_env
    from plugins.video_gen.fal import FAL_FAMILIES

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": family_id}},
        {"prompt": "animate this dog", "image_url": "https://example.com/dog.png"},
    )
    assert result["success"] is True, f"{family_id}: {result.get('error')}"
    assert result["modality"] == "image"
    assert result["provider"] == "fal"

    # Outbound endpoint must be the family's image endpoint
    assert len(fal_calls) == 1
    endpoint = fal_calls[0]["endpoint"]
    assert endpoint == FAL_FAMILIES[family_id]["image_endpoint"]

    # Payload must contain the right image key (may be image_url or
    # start_image_url depending on the family's image_param_key)
    payload = fal_calls[0]["arguments"] or {}
    expected_image_key = FAL_FAMILIES[family_id].get("image_param_key") or "image_url"
    assert payload.get(expected_image_key) == "https://example.com/dog.png", (
        f"{family_id} text+image missing {expected_image_key} in payload "
        f"(keys: {sorted(payload.keys())})"
    )

# ─────────────────────────────────────────────────────────────────────────
# xAI: text-only / text+image both go to /videos/generations
# (xAI uses one endpoint with an optional 'image' field, not separate URLs)
# ─────────────────────────────────────────────────────────────────────────
def test_xai_text_only_via_tool_surface(matrix_env):
    home, _, xai_calls = matrix_env
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "xai"}},
        {"prompt": "a dog running"},
    )
    assert result["success"] is True
    assert result["modality"] == "text"
    assert result["provider"] == "xai"
    assert len(xai_calls) == 1
    assert xai_calls[0]["url"].endswith("/videos/generations")
    payload = xai_calls[0]["json"] or {}
    assert "image" not in payload
    assert "reference_images" not in payload


def test_xai_text_plus_image_via_tool_surface(matrix_env):
    home, _, xai_calls = matrix_env
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "xai"}},
        {"prompt": "animate this", "image_url": "https://example.com/img.png"},
    )
    assert result["success"] is True
    assert result["modality"] == "image"
    assert result["provider"] == "xai"
    assert len(xai_calls) == 1
    assert xai_calls[0]["url"].endswith("/videos/generations")
    payload = xai_calls[0]["json"] or {}
    assert payload["image"] == {"url": "https://example.com/img.png"}

# ─────────────────────────────────────────────────────────────────────────
# tool-level `model` arg overrides config
# ─────────────────────────────────────────────────────────────────────────
def test_tool_model_arg_overrides_config(matrix_env):
    """When the tool call passes model=, it wins over video_gen.model in config."""
    home, fal_calls, _ = matrix_env
    # Config picks pixverse-v6, but tool call says veo3.1
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": "pixverse-v6"}},
        {"prompt": "a dog", "model": "veo3.1"},
    )
    assert result["success"] is True
    assert result["model"] == "veo3.1"
    # Outbound endpoint reflects the override, not config
    assert fal_calls[0]["endpoint"] == "fal-ai/veo3.1"


def test_tool_model_arg_with_image_url_routes_to_override_image_endpoint(matrix_env):
    """model= override on text+image goes to the override family's image endpoint."""
    home, fal_calls, _ = matrix_env
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": "pixverse-v6"}},
        {
            "prompt": "animate this",
            "image_url": "https://example.com/i.png",
            "model": "kling-v3-4k",
        },
    )
    assert result["success"] is True
    assert result["model"] == "kling-v3-4k"
    assert fal_calls[0]["endpoint"] == "fal-ai/kling-video/v3/4k/image-to-video"
    # Kling 4K uses start_image_url
    assert fal_calls[0]["arguments"].get("start_image_url") == "https://example.com/i.png"
    assert "image_url" not in fal_calls[0]["arguments"]