hermes-agent/tests/gateway/conftest.py
ethernet 48be2e0e4d
test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt

apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.

- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
  published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
  every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.

* fix(cli): compile syntax check to tempdir, not source __pycache__

`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:

1. Parallel test workers walking the same source tree (e.g. running
   the suite under per-file process isolation) can race against each
   other on the `__pycache__` write — manifests as flaky 'directory
   not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
   that the next interpreter run might pick up — fine when the
   interpreter version matches, sketchy if it doesn't.

Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.

* test(runner): per-file process isolation, drop manual state reset + xdist

Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.

Key changes:
  * scripts/run_tests_parallel.py — new runner: discovers test files,
    runs N in parallel via ThreadPoolExecutor, captures stdout per file,
    treats exit code 5 (no tests collected) as pass, kills all children
    on exit. Change from cpu_count to cpu_count*2. The runner is
    I/O-bound (waiting on subprocess.communicate() from pytest children)
    The parent process does almost no CPU work, so 2x oversubscription
    keeps more pipes full. When a file fails, immediately show the last
    30 lines of pytest output (stack traces + FAILED summary) plus a
    ready-to-copy repro command:
      python -m pytest tests/agent/test_auxiliary_client.py
  * scripts/run_tests.sh — delegates to run_tests_parallel.py
  * .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
  * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
  * tests/conftest.py — remove ~200 lines of manual state-reset fixtures
  * AGENTS.md — update Testing section for per-file design

* test(runner): speed gateway test antipattern scan up

* fix(test): web search provider plugin test missing xai

* fix(tests): make 14 test files pass under per-file subprocess isolation

Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:

Tool registry not populated:
  - test_video_generation_tool_surface_matrix: add discover_builtin_tools()
  - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
    registering all 8 bundled web providers, reset after each test
  - test_website_policy: same provider registration pattern
  - test_web_tools_tavily: same pattern across 3 dispatch test classes
  - Also add is_safe_url/check_website_access mocks where SSRF check
    blocks example.com (DNS resolution fails in isolated envs)

Stale check_fn cache:
  - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
    in both kanban guidance tests (prior test cached False for kanban_show)
  - test_discord_tool: cache invalidation in setup/teardown
  - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries

Module-level state pollution:
  - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
  - test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
    (ContextVar takes precedence over os.environ)
  - test_dm_topics: overwrite sys.modules + separate telegram.constants mock
    + force-reimport of gateway.platforms.telegram
  - test_terminal_tool_requirements: removed duplicate class declaration,
    autouse _clear_caches fixture

* change(tests): run_tests.sh explicitly includes env vars

instead of manually dropping some vars, now we just only include some

* fix(tests): 5 more isolation/NixOS fixes

- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
  command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
  (feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
  /etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
  (doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
  profile deletion when copytree preserved read-only modes from
  nix store

* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client

* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor

* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test

* fix: address PR #29016 review feedback

- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
  shared helper (register_all_web_providers), replacing 8 copy-pasted
  blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
  fix worker count to match code (cpu_count*2)

* fix: eliminate race in stale-cache achievements test

The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
2026-05-21 16:40:04 +05:30

452 lines
17 KiB
Python

"""Shared fixtures for gateway tests.
The ``_ensure_telegram_mock`` helper guarantees that a minimal mock of
the ``telegram`` package is registered in :data:`sys.modules` **before**
any test file triggers ``from gateway.platforms.telegram import ...``.
Without this, ``pytest-xdist`` workers that happen to collect
``test_telegram_caption_merge.py`` (bare top-level import, no per-file
mock) first will cache ``ChatType = None`` from the production
ImportError fallback, causing 30+ downstream test failures wherever
``ChatType.GROUP`` / ``ChatType.SUPERGROUP`` is accessed.
Individual test files may still call their own ``_ensure_telegram_mock``
— it short-circuits when the mock is already present.
Plugin-adapter anti-pattern guard
---------------------------------
Tests for platform plugins (``plugins/platforms/<name>/adapter.py``)
must load the adapter via
:func:`tests.gateway._plugin_adapter_loader.load_plugin_adapter`, not by
adding the plugin directory to ``sys.path`` and doing a bare
``from adapter import ...``. The guard at the bottom of this file
scans test module ASTs at collection time and fails collection with a
pointer to the helper if the anti-pattern is detected.
Rationale: every plugin ships its own ``adapter.py``, and two tests each
inserting their plugin dir on ``sys.path[0]`` race for
``sys.modules["adapter"]`` in the same xdist worker. Whichever collects
first wins; the other fails with ``ImportError``, and the polluted
``sys.path`` cascades into unrelated tests. See PR #17764 for the
incident.
"""
import ast
import sys
from pathlib import Path
from unittest.mock import MagicMock
import pytest
def _ensure_telegram_mock() -> None:
"""Install a comprehensive telegram mock in sys.modules.
Idempotent — skips when the real library is already imported.
Uses ``sys.modules[name] = mod`` (overwrite) instead of
``setdefault`` so it wins even if a partial/broken import
already cached a module with ``ChatType = None``.
"""
if "telegram" in sys.modules and hasattr(sys.modules["telegram"], "__file__"):
return # Real library is installed — nothing to mock
mod = MagicMock()
mod.ext.ContextTypes.DEFAULT_TYPE = type(None)
mod.constants.ParseMode.MARKDOWN = "Markdown"
mod.constants.ParseMode.MARKDOWN_V2 = "MarkdownV2"
mod.constants.ParseMode.HTML = "HTML"
mod.constants.ChatType.PRIVATE = "private"
mod.constants.ChatType.GROUP = "group"
mod.constants.ChatType.SUPERGROUP = "supergroup"
mod.constants.ChatType.CHANNEL = "channel"
# Real exception classes so ``except (NetworkError, ...)`` clauses
# in production code don't blow up with TypeError.
mod.error.NetworkError = type("NetworkError", (OSError,), {})
mod.error.TimedOut = type("TimedOut", (OSError,), {})
mod.error.BadRequest = type("BadRequest", (Exception,), {})
mod.error.Forbidden = type("Forbidden", (Exception,), {})
mod.error.InvalidToken = type("InvalidToken", (Exception,), {})
mod.error.RetryAfter = type("RetryAfter", (Exception,), {"retry_after": 1})
mod.error.Conflict = type("Conflict", (Exception,), {})
# Update.ALL_TYPES used in start_polling()
mod.Update.ALL_TYPES = []
for name in (
"telegram",
"telegram.ext",
"telegram.constants",
"telegram.request",
):
sys.modules[name] = mod
sys.modules["telegram.error"] = mod.error
def _ensure_discord_mock() -> None:
"""Install a comprehensive discord mock in sys.modules.
Idempotent — skips when the real library is already imported.
Uses ``sys.modules[name] = mod`` (overwrite) instead of
``setdefault`` so it wins even if a partial/broken import already
cached the module.
This mock is comprehensive — it includes **all** attributes needed by
every gateway discord test file. Individual test files should call
this function (it short-circuits when already present) rather than
maintaining their own mock setup.
"""
if "discord" in sys.modules and hasattr(sys.modules["discord"], "__file__"):
return # Real library is installed — nothing to mock
from types import SimpleNamespace
discord_mod = MagicMock()
discord_mod.Intents.default.return_value = MagicMock()
discord_mod.Client = MagicMock
discord_mod.File = MagicMock
discord_mod.DMChannel = type("DMChannel", (), {})
discord_mod.Thread = type("Thread", (), {})
discord_mod.ForumChannel = type("ForumChannel", (), {})
discord_mod.Interaction = object
discord_mod.Message = type("Message", (), {})
# Embed: accept the kwargs production code / tests use
# (title, description, color). MagicMock auto-attributes work too,
# but some tests construct and inspect .title/.description directly.
class _FakeEmbed:
def __init__(self, *, title=None, description=None, color=None, **_):
self.title = title
self.description = description
self.color = color
self.fields = []
self.footer = None
def add_field(self, *, name=None, value=None, inline=False, **_):
self.fields.append({"name": name, "value": value, "inline": inline})
return self
def set_footer(self, *, text=None, icon_url=None, **_):
self.footer = {"text": text, "icon_url": icon_url}
return self
discord_mod.Embed = _FakeEmbed
# ui.View / ui.Select / ui.Button: real classes (not MagicMock) so
# tests that subclass ModelPickerView / iterate .children / clear
# items work.
class _FakeView:
def __init__(self, timeout=None):
self.timeout = timeout
self.children = []
def add_item(self, item):
self.children.append(item)
def clear_items(self):
self.children.clear()
class _FakeSelect:
def __init__(self, *, placeholder=None, options=None, custom_id=None, **_):
self.placeholder = placeholder
self.options = options or []
self.custom_id = custom_id
self.callback = None
self.disabled = False
class _FakeButton:
def __init__(self, *, label=None, style=None, custom_id=None, emoji=None,
url=None, disabled=False, row=None, sku_id=None, **_):
self.label = label
self.style = style
self.custom_id = custom_id
self.emoji = emoji
self.url = url
self.disabled = disabled
self.row = row
self.sku_id = sku_id
self.callback = None
class _FakeSelectOption:
def __init__(self, *, label=None, value=None, description=None, **_):
self.label = label
self.value = value
self.description = description
discord_mod.SelectOption = _FakeSelectOption
discord_mod.ui = SimpleNamespace(
View=_FakeView,
Select=_FakeSelect,
Button=_FakeButton,
button=lambda *a, **k: (lambda fn: fn),
)
discord_mod.ButtonStyle = SimpleNamespace(
success=1, primary=2, secondary=2, danger=3,
green=1, grey=2, blurple=2, red=3,
)
discord_mod.Color = SimpleNamespace(
orange=lambda: 1, green=lambda: 2, blue=lambda: 3,
red=lambda: 4, purple=lambda: 5, greyple=lambda: 6,
)
# app_commands — needed by _register_slash_commands auto-registration
class _FakeGroup:
def __init__(self, *, name, description, parent=None):
self.name = name
self.description = description
self.parent = parent
self._children: dict = {}
if parent is not None:
parent.add_command(self)
def add_command(self, cmd):
self._children[cmd.name] = cmd
class _FakeCommand:
def __init__(self, *, name, description, callback, parent=None):
self.name = name
self.description = description
self.callback = callback
self.parent = parent
discord_mod.app_commands = SimpleNamespace(
describe=lambda **kwargs: (lambda fn: fn),
choices=lambda **kwargs: (lambda fn: fn),
Choice=lambda **kwargs: SimpleNamespace(**kwargs),
Group=_FakeGroup,
Command=_FakeCommand,
)
ext_mod = MagicMock()
commands_mod = MagicMock()
commands_mod.Bot = MagicMock
ext_mod.commands = commands_mod
for name in ("discord", "discord.ext", "discord.ext.commands"):
sys.modules[name] = discord_mod
sys.modules["discord.ext"] = ext_mod
sys.modules["discord.ext.commands"] = commands_mod
# Run at collection time — before any test file's module-level imports.
_ensure_telegram_mock()
_ensure_discord_mock()
# ---------------------------------------------------------------------------
# Plugin-adapter anti-pattern guard
# ---------------------------------------------------------------------------
_GATEWAY_DIR = Path(__file__).resolve().parent
_GUARD_HINT = (
"Plugin adapter tests must use "
"``from tests.gateway._plugin_adapter_loader import load_plugin_adapter`` "
"and call ``load_plugin_adapter('<plugin_name>')`` instead of inserting "
"``plugins/platforms/<name>/`` on sys.path and doing a bare ``import "
"adapter`` / ``from adapter import ...``. See the 'Plugin-adapter "
"anti-pattern guard' docstring in tests/gateway/conftest.py."
)
def _scan_for_plugin_adapter_antipattern(source: str) -> list[str]:
"""Return a list of offending-line descriptions, or [] if clean.
Flags two things:
1. ``sys.path.insert(..., <something mentioning 'plugins/platforms'>)``
2. ``import adapter`` or ``from adapter import ...`` at module level.
"""
try:
tree = ast.parse(source)
except SyntaxError:
return [] # Let pytest surface the real syntax error.
offenses: list[str] = []
for node in ast.walk(tree):
# sys.path.insert(0, ".../plugins/platforms/...")
if isinstance(node, ast.Call):
func = node.func
target_name: str | None = None
if isinstance(func, ast.Attribute):
# sys.path.insert / sys.path.append
if (
isinstance(func.value, ast.Attribute)
and isinstance(func.value.value, ast.Name)
and func.value.value.id == "sys"
and func.value.attr == "path"
and func.attr in {"insert", "append", "extend"}
):
target_name = f"sys.path.{func.attr}"
if target_name is not None:
call_src = ast.unparse(node)
# Match both the string-literal form
# ``.../plugins/platforms/...`` and the Path-operator form
# ``Path(...) / 'plugins' / 'platforms' / ...`` that
# plugin tests typically use.
_src_no_ws = "".join(call_src.split())
if (
"plugins/platforms" in call_src
or "plugins\\platforms" in call_src
or "'plugins'/'platforms'" in _src_no_ws
or '"plugins"/"platforms"' in _src_no_ws
):
offenses.append(
f"line {node.lineno}: {target_name}(...) points into "
f"plugins/platforms/"
)
# Bare `import adapter` / `from adapter import ...` anywhere (module level
# OR inside functions — both are symptoms of the same pattern).
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
if alias.name == "adapter":
offenses.append(
f"line {node.lineno}: ``import adapter`` "
f"(bare — resolves to whichever plugin's adapter.py "
f"is first on sys.path)"
)
elif isinstance(node, ast.ImportFrom):
if node.module == "adapter" and node.level == 0:
offenses.append(
f"line {node.lineno}: ``from adapter import ...`` "
f"(bare — resolves to whichever plugin's adapter.py "
f"is first on sys.path)"
)
return offenses
def _fingerprint_gateway_tests() -> str:
"""Return a short fingerprint that changes when any gateway test file changes.
Uses (mtime, size) pairs instead of content hashing — fast to compute
(stat-only, no reads) and sufficient for cache invalidation across
per-file subprocess runs.
"""
import hashlib
h = hashlib.sha256()
for path in sorted(_GATEWAY_DIR.rglob("test_*.py")):
try:
st = path.stat()
h.update(f"{path.name}:{st.st_mtime_ns}:{st.st_size}".encode())
except OSError:
h.update(f"{path.name}:missing".encode())
return h.hexdigest()[:16]
def _run_adapter_antipattern_scan() -> list[str]:
"""Scan gateway test files for the plugin-adapter anti-pattern.
Returns a list of violation strings (empty if clean).
"""
violations: list[str] = []
for path in _GATEWAY_DIR.rglob("test_*.py"):
if path.name in {"_plugin_adapter_loader.py", "conftest.py"}:
continue
try:
source = path.read_text(encoding="utf-8")
except OSError:
continue
# Fast string pre-filter: skip files that can't possibly violate.
# A violating file MUST contain both (a) an adapter/plugins/platforms
# reference AND (b) either sys.path manipulation or a bare adapter import.
if "adapter" not in source and "plugins/platforms" not in source:
continue
if not (
"sys.path" in source
or "import adapter" in source
or "from adapter import" in source
):
continue
offenses = _scan_for_plugin_adapter_antipattern(source)
if offenses:
violations.append(
f" {path.relative_to(_GATEWAY_DIR.parent.parent)}:\n "
+ "\n ".join(offenses)
)
return violations
def pytest_configure(config):
"""Reject plugin-adapter tests that use the sys.path anti-pattern.
Runs once per pytest session on the controller, BEFORE any xdist
worker is spawned. If any file under ``tests/gateway/`` matches the
anti-pattern, we fail the whole session with a clear message —
before a polluted ``sys.path`` can cascade across workers.
**Performance**: in the per-file subprocess isolation model (no xdist),
every subprocess is a "controller" — so the naive scan would run 257
times, each costing ~1s of AST walking. We avoid this with two
strategies:
1. **Tight string pre-filter**: a file can only violate if it contains
*both* an adapter/plugins/platforms reference *and* a sys.path
manipulation or bare ``import adapter``. This drops ~95% of files
from needing AST parsing.
2. **File-locked cache**: the scan result is cached in
``.pytest-cache/gw-adapter-guard-<fingerprint>`` keyed on a
fingerprint of the gateway test file mtimes/sizes. Concurrent
subprocesses acquire a lock; only the first performs the scan;
the rest wait and read the cached result.
"""
# Only run on the xdist controller (or in non-xdist runs). Skip on
# worker subprocesses so we don't scan the filesystem N times.
if hasattr(config, "workerinput"):
return
fp = _fingerprint_gateway_tests()
cache_dir = Path.cwd() / ".pytest-cache"
cache_file = cache_dir / f"gw-adapter-guard-{fp}"
lock_file = cache_dir / f".gw-adapter-guard-{fp}.lock"
cache_dir.mkdir(parents=True, exist_ok=True)
# Evict stale cache entries from previous fingerprints (best-effort).
try:
for old in cache_dir.glob("gw-adapter-guard-*"):
if old.name != f"gw-adapter-guard-{fp}":
old.unlink(missing_ok=True)
for old in cache_dir.glob(".gw-adapter-guard-*.lock"):
if old.name != f".gw-adapter-guard-{fp}.lock":
old.unlink(missing_ok=True)
except OSError:
pass # Non-critical; old files are harmless.
# Use filelock to ensure only one process scans at a time.
# Concurrent subprocesses all hit pytest_configure simultaneously;
# without a lock they'd all find no cache and all run the scan.
try:
from filelock import FileLock
lock = FileLock(str(lock_file), timeout=120)
except ImportError:
# Fallback: no locking (still correct, just slower under contention).
import contextlib
class _NoLock:
def __enter__(self):
return self
def __exit__(self, *a):
pass
lock = _NoLock()
with lock:
if cache_file.exists():
cached = cache_file.read_text(encoding="utf-8")
if cached == "clean":
return
raise pytest.UsageError(cached)
# Slow path: this process is the first to acquire the lock.
violations = _run_adapter_antipattern_scan()
if violations:
msg = (
"Plugin-adapter-import anti-pattern detected in gateway tests:\n"
+ "\n".join(violations)
+ "\n\n"
+ _GUARD_HINT
)
cache_file.write_text(msg, encoding="utf-8")
raise pytest.UsageError(msg)
else:
cache_file.write_text("clean", encoding="utf-8")