mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
* ci(tests): install ripgrep from prebuilt tarball instead of apt
apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.
- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.
* fix(cli): compile syntax check to tempdir, not source __pycache__
`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:
1. Parallel test workers walking the same source tree (e.g. running
the suite under per-file process isolation) can race against each
other on the `__pycache__` write — manifests as flaky 'directory
not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
that the next interpreter run might pick up — fine when the
interpreter version matches, sketchy if it doesn't.
Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.
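
A minimal sketch of that approach, assuming a hypothetical helper named `syntax_ok` (the real `_validate_critical_files_syntax` may be structured differently). It passes `cfile=` to `py_compile.compile()` so the `.pyc` lands in a throwaway directory rather than in `__pycache__/`:

```python
import py_compile
import tempfile
from pathlib import Path


def syntax_ok(source: Path) -> bool:
    """Return True if `source` compiles, without writing into the source tree.

    `syntax_ok` is an illustrative name, not the project's function.
    """
    with tempfile.TemporaryDirectory() as scratch:
        # cfile redirects the compiled output away from <source dir>/__pycache__/.
        target = Path(scratch) / (source.name + "c")
        try:
            py_compile.compile(str(source), cfile=str(target), doraise=True)
        except py_compile.PyCompileError:
            return False
    # The scratch directory, and the .pyc in it, are gone at this point.
    return True
```

Only the boolean result leaves the function, which matches the "compile-or-not signal" described above.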
* test(runner): per-file process isolation, drop manual state reset + xdist
Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.
Key changes:
* scripts/run_tests_parallel.py — new runner: discovers test files,
runs N in parallel via ThreadPoolExecutor, captures stdout per file,
treats exit code 5 (no tests collected) as pass, kills all children
on exit. Worker count raised from cpu_count to cpu_count*2: the runner is
I/O-bound (waiting on subprocess.communicate() from pytest children) and
the parent process does almost no CPU work, so 2x oversubscription
keeps more pipes full. When a file fails, immediately show the last
30 lines of pytest output (stack traces + FAILED summary) plus a
ready-to-copy repro command:
python -m pytest tests/agent/test_auxiliary_client.py
* scripts/run_tests.sh — delegates to run_tests_parallel.py
* .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
* pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
* tests/conftest.py — remove ~200 lines of manual state-reset fixtures
* AGENTS.md — update Testing section for per-file design
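
The runner design above could be sketched roughly as follows. This is not the contents of scripts/run_tests_parallel.py; the names `run_command`, `passed`, `run_files` and the `build_cmd` hook are hypothetical, and the sketch omits file discovery and the kill-children-on-exit handling:

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

NO_TESTS_COLLECTED = 5  # pytest's exit code when nothing was collected


def run_command(cmd: list[str]) -> tuple[int, str]:
    """Run one child process and capture stdout+stderr as text."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def passed(returncode: int) -> bool:
    """Exit code 0 and 'no tests collected' both count as a pass."""
    return returncode in (0, NO_TESTS_COLLECTED)


def run_files(files: list[str], build_cmd=None, tail: int = 30) -> dict[str, bool]:
    """Run each file in its own subprocess; print an output tail on failure."""
    # Default command: a fresh interpreter running pytest on a single file.
    build_cmd = build_cmd or (lambda f: [sys.executable, "-m", "pytest", f])
    workers = (os.cpu_count() or 1) * 2  # parent is I/O-bound, so oversubscribe
    results: dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = pool.map(lambda f: run_command(build_cmd(f)), files)
        for path, (code, output) in zip(files, outcomes):
            results[path] = passed(code)
            if not results[path]:
                # Last `tail` lines of output plus a ready-to-copy repro command.
                print("\n".join(output.splitlines()[-tail:]))
                print(f"repro: python -m pytest {path}")
    return results
```

Threads are enough here because each worker only blocks on a child process; the isolation comes from the subprocess boundary, not from the pool.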
* test(runner): speed gateway test antipattern scan up
* fix(test): web search provider plugin test missing xai
* fix(tests): make 14 test files pass under per-file subprocess isolation
Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:
Tool registry not populated:
- test_video_generation_tool_surface_matrix: add discover_builtin_tools()
- test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
registering all 8 bundled web providers, reset after each test
- test_website_policy: same provider registration pattern
- test_web_tools_tavily: same pattern across 3 dispatch test classes
- Also add is_safe_url/check_website_access mocks where SSRF check
blocks example.com (DNS resolution fails in isolated envs)
Stale check_fn cache:
- test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
in both kanban guidance tests (prior test cached False for kanban_show)
- test_discord_tool: cache invalidation in setup/teardown
- test_homeassistant_tool: invalidate_check_fn_cache() before registry queries
Module-level state pollution:
- test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
- test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
(ContextVar takes precedence over os.environ)
- test_dm_topics: overwrite sys.modules + separate telegram.constants mock
+ force-reimport of gateway.platforms.telegram
- test_terminal_tool_requirements: removed duplicate class declaration,
autouse _clear_caches fixture
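
To illustrate why `patch.dict(os.environ)` had no effect in test_skill_commands, here is a small sketch of a lookup where a ContextVar takes precedence over the environment. The functions below are stand-ins written for this example (the project's real `set_session_vars()` signature and storage are not shown in this commit), and `DEMO_VAR` is a made-up variable name:

```python
import os
from contextvars import ContextVar
from typing import Dict, Optional

# Hypothetical per-context storage; None means "nothing set in this context".
_session_vars: ContextVar[Optional[Dict[str, str]]] = ContextVar(
    "session_vars", default=None
)


def set_session_vars(**values: str) -> None:
    """Stand-in setter: store values in the current context."""
    _session_vars.set(dict(values))


def get_session_var(name: str, default: str = "") -> str:
    """Look in the ContextVar first, then fall back to os.environ."""
    session = _session_vars.get() or {}
    if name in session:
        return session[name]
    return os.environ.get(name, default)


if __name__ == "__main__":
    from unittest.mock import patch

    set_session_vars(DEMO_VAR="from-context")
    with patch.dict(os.environ, {"DEMO_VAR": "from-env"}):
        # The patched environment is shadowed by the ContextVar value.
        print(get_session_var("DEMO_VAR"))
```

Under this kind of lookup order, a test that only patches `os.environ` passes or fails depending on whether an earlier test left a value in the ContextVar, which is the sort of hidden coupling per-file isolation exposes.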
* change(tests): run_tests.sh explicitly includes env vars
Instead of manually dropping some vars, we now include only an explicit set.
* fix(tests): 5 more isolation/NixOS fixes
- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
(feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
/etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
(doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
profile deletion when copytree preserved read-only modes from
nix store
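
A rough sketch of the rmtree handler idea from the profiles.py item, written under assumptions: `force_rmtree` and the helper names are hypothetical, and the version switch reflects that `onexc` exists only on Python 3.12+, with `onerror` as the older equivalent. The sketch only retries failed `os.unlink`/`os.rmdir` calls:

```python
import os
import shutil
import stat
import sys


def _grant_owner_access(target: str) -> None:
    """Give the owner write access (and rwx on directories)."""
    mode = os.stat(target).st_mode
    extra = stat.S_IRWXU if stat.S_ISDIR(mode) else stat.S_IWUSR
    os.chmod(target, mode | extra)


def _fix_permissions_and_retry(func, path, _exc):
    """rmtree error handler: chmod the parent dir and the entry, then retry."""
    if func not in (os.unlink, os.rmdir):
        raise  # re-raise the active OSError; this sketch does not handle it
    for target in (os.path.dirname(path), path):
        try:
            _grant_owner_access(target)
        except OSError:
            pass  # best effort; the retry below reports any real failure
    func(path)


def force_rmtree(path: str) -> None:
    """Delete a tree even if it contains read-only directories."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_fix_permissions_and_retry)
    else:
        # onerror receives an exc_info tuple instead of the exception itself;
        # the handler ignores that third argument, so it works for both.
        shutil.rmtree(path, onerror=_fix_permissions_and_retry)
```

Fixing the parent directory matters because removing an entry requires write permission on the directory that contains it, not on the entry itself.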
* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client
* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor
* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test
* fix: address PR #29016 review feedback
- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
shared helper (register_all_web_providers), replacing 8 copy-pasted
blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
fix worker count to match code (cpu_count*2)
* fix: eliminate race in stale-cache achievements test
The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
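
The race and the fix can be modelled with a toy example. Everything here is hypothetical (`FakeSessionDB`, `Evaluator`, and the snapshot shape are invented for illustration and are not the project's `_FakeSessionDB` or `_SNAPSHOT_CACHE`); it only shows why delaying the fake scan makes the stale read deterministic:

```python
import threading
import time


class FakeSessionDB:
    """Toy fake whose scan can be slowed down with scan_delay (seconds)."""

    def __init__(self, sessions, scan_delay=0.0):
        self._sessions = list(sessions)
        self._scan_delay = scan_delay

    def scan(self):
        time.sleep(self._scan_delay)
        return len(self._sessions)


class Evaluator:
    """Serves a cached snapshot while a background thread refreshes it."""

    def __init__(self, db, stale_snapshot):
        self._db = db
        self.cache = stale_snapshot
        self.refresher = None

    def _refresh(self):
        self.cache = {"sessions": self._db.scan(), "stale": False}

    def evaluate_all(self):
        # The refresh starts first, so an instant scan could replace the
        # cache before the line below reads it.
        self.refresher = threading.Thread(target=self._refresh, daemon=True)
        self.refresher.start()
        return self.cache


if __name__ == "__main__":
    db = FakeSessionDB(range(10), scan_delay=0.5)
    ev = Evaluator(db, {"sessions": 0, "stale": True})
    print(ev.evaluate_all())  # stale snapshot: the scan is still sleeping
    ev.refresher.join()
    print(ev.cache)           # refreshed snapshot
```

With `scan_delay=0.0` the first read may or may not see the refreshed value, which is the flakiness the commit describes.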
255 lines
10 KiB
Python
"""Tool-surface routing matrix: every (provider, model, modality) combo.

This is the integration test for the question Teknium asked: regardless
of which provider+model the user picks and whether they pass an
image_url or not, does the tool surface route correctly to the right
endpoint with the right payload shape?

Drives ``_handle_video_generate(args)`` end-to-end: config write →
config read → registry lookup → provider.generate() → outbound HTTP/SDK
call. Stubs fal_client and httpx so we observe routing without hitting
the network.
"""

from __future__ import annotations

import asyncio
import json
import types
from typing import Any, Dict, List, Optional

import pytest
import yaml


@pytest.fixture(autouse=True)
def _reset_registry():
    from agent import video_gen_registry
    video_gen_registry._reset_for_tests()
    yield
    video_gen_registry._reset_for_tests()


@pytest.fixture
def matrix_env(tmp_path, monkeypatch):
    """Set up HERMES_HOME, stub fal_client + httpx, force plugin discovery."""
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    monkeypatch.setenv("FAL_KEY", "test-key")
    monkeypatch.setenv("XAI_API_KEY", "test-key")

    fal_calls: List[Dict[str, Any]] = []
    xai_calls: List[Dict[str, Any]] = []

    # fal_client stub
    fake_fal = types.ModuleType("fal_client")
    def _subscribe(endpoint, arguments=None, with_logs=False):
        fal_calls.append({"endpoint": endpoint, "arguments": arguments})
        return {"video": {"url": f"https://fake-fal/{endpoint.replace('/','_')}.mp4"}}
    fake_fal.subscribe = _subscribe  # type: ignore
    monkeypatch.setitem(__import__("sys").modules, "fal_client", fake_fal)

    # httpx stub for xAI
    import httpx
    class _Resp:
        def __init__(self, p, s=200):
            self.status_code = s
            self._p = p
            self.text = json.dumps(p)
        def raise_for_status(self):
            if self.status_code >= 400:
                raise httpx.HTTPStatusError("err", request=None, response=self)  # type: ignore
        def json(self):
            return self._p
    class _Client:
        async def __aenter__(self): return self
        async def __aexit__(self, *a): return None
        async def post(self, url, headers=None, json=None, timeout=None):
            xai_calls.append({"url": url, "json": json})
            return _Resp({"request_id": "req-1"})
        async def get(self, url, headers=None, timeout=None):
            return _Resp({
                "status": "done",
                "video": {"url": "https://xai-cdn/out.mp4", "duration": 8},
                "model": "grok-imagine-video",
            })
    import plugins.video_gen.xai as xai_plugin
    monkeypatch.setattr(xai_plugin.httpx, "AsyncClient", lambda: _Client())
    async def _no_sleep(*a, **k): return None
    monkeypatch.setattr(asyncio, "sleep", _no_sleep)

    # Reset FAL plugin's lazy fal_client cache so it picks up the stub
    from plugins.video_gen import fal as fal_plugin
    fal_plugin._fal_client = None

    # Force discovery
    from hermes_cli.plugins import _ensure_plugins_discovered
    _ensure_plugins_discovered(force=True)

    return tmp_path, fal_calls, xai_calls


def _invoke_tool(home, cfg: dict, args: dict) -> dict:
    """Write config, invoke the registered tool handler, return parsed JSON."""
    (home / "config.yaml").write_text(yaml.safe_dump(cfg))
    import hermes_cli.config as cfg_mod
    if hasattr(cfg_mod, "_invalidate_load_config_cache"):
        cfg_mod._invalidate_load_config_cache()

    from tools.registry import discover_builtin_tools, registry
    if "video_generate" not in registry._tools:
        discover_builtin_tools()
    handler = registry._tools["video_generate"].handler
    return json.loads(handler(args))


# ─────────────────────────────────────────────────────────────────────────
# FAL: every family × {text-only, text+image}
# ─────────────────────────────────────────────────────────────────────────

# We parametrize over the catalog so the test discovers new families
# automatically. If someone adds 'sora-2' to FAL_FAMILIES, this matrix
# picks it up; no test changes needed beyond confirming the endpoints.
def _all_fal_families():
    from plugins.video_gen.fal import FAL_FAMILIES
    return list(FAL_FAMILIES.keys())


@pytest.mark.parametrize("family_id", _all_fal_families())
def test_fal_text_only_routes_to_text_endpoint(matrix_env, family_id):
    home, fal_calls, _ = matrix_env
    from plugins.video_gen.fal import FAL_FAMILIES

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": family_id}},
        {"prompt": "a dog running"},
    )

    assert result["success"] is True, f"{family_id}: {result.get('error')}"
    assert result["modality"] == "text"
    assert result["provider"] == "fal"

    # Outbound endpoint must be the family's text endpoint
    assert len(fal_calls) == 1
    endpoint = fal_calls[0]["endpoint"]
    assert endpoint == FAL_FAMILIES[family_id]["text_endpoint"]

    # Payload must NOT contain any image-shaped key
    payload = fal_calls[0]["arguments"] or {}
    image_keys = [k for k in payload if "image" in k and "url" in k]
    assert not image_keys, f"{family_id} text-only leaked image keys: {image_keys}"


@pytest.mark.parametrize("family_id", _all_fal_families())
def test_fal_text_plus_image_routes_to_image_endpoint(matrix_env, family_id):
    home, fal_calls, _ = matrix_env
    from plugins.video_gen.fal import FAL_FAMILIES

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": family_id}},
        {"prompt": "animate this dog", "image_url": "https://example.com/dog.png"},
    )

    assert result["success"] is True, f"{family_id}: {result.get('error')}"
    assert result["modality"] == "image"
    assert result["provider"] == "fal"

    # Outbound endpoint must be the family's image endpoint
    assert len(fal_calls) == 1
    endpoint = fal_calls[0]["endpoint"]
    assert endpoint == FAL_FAMILIES[family_id]["image_endpoint"]

    # Payload must contain the right image key (may be image_url or
    # start_image_url depending on the family's image_param_key)
    payload = fal_calls[0]["arguments"] or {}
    expected_image_key = FAL_FAMILIES[family_id].get("image_param_key") or "image_url"
    assert payload.get(expected_image_key) == "https://example.com/dog.png", (
        f"{family_id} text+image missing {expected_image_key} in payload "
        f"(keys: {sorted(payload.keys())})"
    )


# ─────────────────────────────────────────────────────────────────────────
# xAI: text-only / text+image both go to /videos/generations
# (xAI uses one endpoint with an optional 'image' field, not separate URLs)
# ─────────────────────────────────────────────────────────────────────────

def test_xai_text_only_via_tool_surface(matrix_env):
    home, _, xai_calls = matrix_env

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "xai"}},
        {"prompt": "a dog running"},
    )
    assert result["success"] is True
    assert result["modality"] == "text"
    assert result["provider"] == "xai"

    assert len(xai_calls) == 1
    assert xai_calls[0]["url"].endswith("/videos/generations")
    payload = xai_calls[0]["json"] or {}
    assert "image" not in payload
    assert "reference_images" not in payload


def test_xai_text_plus_image_via_tool_surface(matrix_env):
    home, _, xai_calls = matrix_env

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "xai"}},
        {"prompt": "animate this", "image_url": "https://example.com/img.png"},
    )
    assert result["success"] is True
    assert result["modality"] == "image"
    assert result["provider"] == "xai"

    assert len(xai_calls) == 1
    assert xai_calls[0]["url"].endswith("/videos/generations")
    payload = xai_calls[0]["json"] or {}
    assert payload["image"] == {"url": "https://example.com/img.png"}


# ─────────────────────────────────────────────────────────────────────────
# tool-level `model` arg overrides config
# ─────────────────────────────────────────────────────────────────────────

def test_tool_model_arg_overrides_config(matrix_env):
    """When the tool call passes model=, it wins over video_gen.model in config."""
    home, fal_calls, _ = matrix_env

    # Config picks pixverse-v6, but tool call says veo3.1
    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": "pixverse-v6"}},
        {"prompt": "a dog", "model": "veo3.1"},
    )

    assert result["success"] is True
    assert result["model"] == "veo3.1"
    # Outbound endpoint reflects the override, not config
    assert fal_calls[0]["endpoint"] == "fal-ai/veo3.1"


def test_tool_model_arg_with_image_url_routes_to_override_image_endpoint(matrix_env):
    """model= override on text+image goes to the override family's image endpoint."""
    home, fal_calls, _ = matrix_env

    result = _invoke_tool(
        home,
        {"video_gen": {"provider": "fal", "model": "pixverse-v6"}},
        {
            "prompt": "animate this",
            "image_url": "https://example.com/i.png",
            "model": "kling-v3-4k",
        },
    )

    assert result["success"] is True
    assert result["model"] == "kling-v3-4k"
    assert fal_calls[0]["endpoint"] == "fal-ai/kling-video/v3/4k/image-to-video"
    # Kling 4K uses start_image_url
    assert fal_calls[0]["arguments"].get("start_image_url") == "https://example.com/i.png"
    assert "image_url" not in fal_calls[0]["arguments"]