hermes-agent/tests/conftest.py
ethernet 48be2e0e4d
test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt

apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.

- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
  published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
  every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.

* fix(cli): compile syntax check to tempdir, not source __pycache__

`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:

1. Parallel test workers walking the same source tree (e.g. running
   the suite under per-file process isolation) can race against each
   other on the `__pycache__` write — manifests as flaky 'directory
   not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
   that the next interpreter run might pick up — fine when the
   interpreter version matches, sketchy if it doesn't.

Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.

* test(runner): per-file process isolation, drop manual state reset + xdist

Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.

Key changes:
  * scripts/run_tests_parallel.py — new runner: discovers test files,
    runs N in parallel via ThreadPoolExecutor, captures stdout per file,
    treats exit code 5 (no tests collected) as pass, kills all children
    on exit. Change from cpu_count to cpu_count*2. The runner is
    I/O-bound (waiting on subprocess.communicate() from pytest children)
    The parent process does almost no CPU work, so 2x oversubscription
    keeps more pipes full. When a file fails, immediately show the last
    30 lines of pytest output (stack traces + FAILED summary) plus a
    ready-to-copy repro command:
      python -m pytest tests/agent/test_auxiliary_client.py
  * scripts/run_tests.sh — delegates to run_tests_parallel.py
  * .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
  * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
  * tests/conftest.py — remove ~200 lines of manual state-reset fixtures
  * AGENTS.md — update Testing section for per-file design

* test(runner): speed gateway test antipattern scan up

* fix(test): web search provider plugin test missing xai

* fix(tests): make 14 test files pass under per-file subprocess isolation

Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:

Tool registry not populated:
  - test_video_generation_tool_surface_matrix: add discover_builtin_tools()
  - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
    registering all 8 bundled web providers, reset after each test
  - test_website_policy: same provider registration pattern
  - test_web_tools_tavily: same pattern across 3 dispatch test classes
  - Also add is_safe_url/check_website_access mocks where SSRF check
    blocks example.com (DNS resolution fails in isolated envs)

Stale check_fn cache:
  - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
    in both kanban guidance tests (prior test cached False for kanban_show)
  - test_discord_tool: cache invalidation in setup/teardown
  - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries

Module-level state pollution:
  - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
  - test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
    (ContextVar takes precedence over os.environ)
  - test_dm_topics: overwrite sys.modules + separate telegram.constants mock
    + force-reimport of gateway.platforms.telegram
  - test_terminal_tool_requirements: removed duplicate class declaration,
    autouse _clear_caches fixture

* change(tests): run_tests.sh explicitly includes env vars

instead of manually dropping some vars, now we just only include some

* fix(tests): 5 more isolation/NixOS fixes

- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
  command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
  (feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
  /etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
  (doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
  profile deletion when copytree preserved read-only modes from
  nix store

* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client

* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor

* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test

* fix: address PR #29016 review feedback

- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
  shared helper (register_all_web_providers), replacing 8 copy-pasted
  blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
  fix worker count to match code (cpu_count*2)

* fix: eliminate race in stale-cache achievements test

The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
2026-05-21 16:40:04 +05:30

822 lines
30 KiB
Python

"""Shared fixtures for the hermes-agent test suite.
Hermetic-test invariants enforced here (see AGENTS.md for rationale):
1. **No credential env vars.** All provider/credential-shaped env vars
(ending in _API_KEY, _TOKEN, _SECRET, _PASSWORD, _CREDENTIALS, etc.)
are unset before every test. Local developer keys cannot leak in.
2. **Isolated HERMES_HOME.** HERMES_HOME points to a per-test tempdir so
code reading ``~/.hermes/*`` via ``get_hermes_home()`` can't see the
real one. (We do NOT also redirect HOME — that broke subprocesses in
CI. Code using ``Path.home() / ".hermes"`` instead of the canonical
``get_hermes_home()`` is a bug to fix at the callsite.)
3. **Deterministic runtime.** TZ=UTC, LANG=C.UTF-8, PYTHONHASHSEED=0.
4. **No HERMES_SESSION_* inheritance** — the agent's current gateway
session must not leak into tests.
These invariants make the local test run match CI closely. Gaps that
remain (CPU count, xdist worker count) are addressed by the canonical
test runner at ``scripts/run_tests.sh``.
"""
import asyncio
import os
import re
import sys
from pathlib import Path
from unittest.mock import patch
import pytest
# Ensure project root is importable
PROJECT_ROOT = Path(__file__).parent.parent
if str(PROJECT_ROOT) not in sys.path:
sys.path.insert(0, str(PROJECT_ROOT))
# ── Per-file process isolation ──────────────────────────────────────────────
# Tests run via ``scripts/run_tests_parallel.py``, which spawns a fresh
# ``python -m pytest <file>`` subprocess per test file. Cross-file state
# leakage (module-level dicts, ContextVars, caches) is impossible: each
# file gets a clean Python interpreter. Intra-file ordering is the test
# author's responsibility — if test A in foo.py mutates state that test B
# in foo.py reads, that's a real bug to fix in the file (it would also
# bite anyone running ``pytest tests/foo.py`` directly).
#
# This replaces the historic _reset_module_state autouse fixture (manual
# state clearing) and the brief experiment with subprocess-per-test
# isolation (too slow at ~17k tests).
#
# See ``scripts/run_tests_parallel.py`` for the runner.
# ── Credential env-var filter ──────────────────────────────────────────────
#
# Any env var in the current process matching ONE of these patterns is
# unset for every test. Developers' local keys cannot leak into assertions
# about "auto-detect provider when key present".
_CREDENTIAL_SUFFIXES = (
"_API_KEY",
"_TOKEN",
"_SECRET",
"_PASSWORD",
"_CREDENTIALS",
"_ACCESS_KEY",
"_SECRET_ACCESS_KEY",
"_PRIVATE_KEY",
"_OAUTH_TOKEN",
"_WEBHOOK_SECRET",
"_ENCRYPT_KEY",
"_APP_SECRET",
"_CLIENT_SECRET",
"_CORP_SECRET",
"_AES_KEY",
)
# Explicit names (for ones that don't fit the suffix pattern)
_CREDENTIAL_NAMES = frozenset({
"AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY",
"AWS_SESSION_TOKEN",
"ANTHROPIC_TOKEN",
"FAL_KEY",
"GH_TOKEN",
"GITHUB_TOKEN",
"OPENAI_API_KEY",
"OPENROUTER_API_KEY",
"NOUS_API_KEY",
"GEMINI_API_KEY",
"GOOGLE_API_KEY",
"GROQ_API_KEY",
"XAI_API_KEY",
"MISTRAL_API_KEY",
"DEEPSEEK_API_KEY",
"KIMI_API_KEY",
"MOONSHOT_API_KEY",
"GLM_API_KEY",
"ZAI_API_KEY",
"MINIMAX_API_KEY",
"OLLAMA_API_KEY",
"OPENVIKING_API_KEY",
"COPILOT_API_KEY",
"CLAUDE_CODE_OAUTH_TOKEN",
"BROWSERBASE_API_KEY",
"FIRECRAWL_API_KEY",
"PARALLEL_API_KEY",
"EXA_API_KEY",
"TAVILY_API_KEY",
"WANDB_API_KEY",
"ELEVENLABS_API_KEY",
"HONCHO_API_KEY",
"MEM0_API_KEY",
"SUPERMEMORY_API_KEY",
"RETAINDB_API_KEY",
"HINDSIGHT_API_KEY",
"HINDSIGHT_LLM_API_KEY",
"DAYTONA_API_KEY",
"TWILIO_AUTH_TOKEN",
"TELEGRAM_BOT_TOKEN",
"DISCORD_BOT_TOKEN",
"SLACK_BOT_TOKEN",
"SLACK_APP_TOKEN",
"MATTERMOST_TOKEN",
"MATRIX_ACCESS_TOKEN",
"MATRIX_PASSWORD",
"MATRIX_RECOVERY_KEY",
"HASS_TOKEN",
"EMAIL_PASSWORD",
"BLUEBUBBLES_PASSWORD",
"FEISHU_APP_SECRET",
"FEISHU_ENCRYPT_KEY",
"FEISHU_VERIFICATION_TOKEN",
"DINGTALK_CLIENT_SECRET",
"QQ_CLIENT_SECRET",
"QQ_STT_API_KEY",
"WECOM_SECRET",
"WECOM_CALLBACK_CORP_SECRET",
"WECOM_CALLBACK_TOKEN",
"WECOM_CALLBACK_ENCODING_AES_KEY",
"WEIXIN_TOKEN",
"MODAL_TOKEN_ID",
"MODAL_TOKEN_SECRET",
"TERMINAL_SSH_KEY",
"SUDO_PASSWORD",
"GATEWAY_PROXY_KEY",
"API_SERVER_KEY",
"TOOL_GATEWAY_USER_TOKEN",
"TELEGRAM_WEBHOOK_SECRET",
"WEBHOOK_SECRET",
"AI_GATEWAY_API_KEY",
"VOICE_TOOLS_OPENAI_KEY",
"BROWSER_USE_API_KEY",
"CUSTOM_API_KEY",
"GATEWAY_PROXY_URL",
"GEMINI_BASE_URL",
"OPENAI_BASE_URL",
"OPENROUTER_BASE_URL",
"OLLAMA_BASE_URL",
"GROQ_BASE_URL",
"XAI_BASE_URL",
"AI_GATEWAY_BASE_URL",
"ANTHROPIC_BASE_URL",
})
def _looks_like_credential(name: str) -> bool:
"""True if env var name matches a credential-shaped pattern."""
if name in _CREDENTIAL_NAMES:
return True
return any(name.endswith(suf) for suf in _CREDENTIAL_SUFFIXES)
# HERMES_* vars that change test behavior by being set. Unset all of these
# unconditionally — individual tests that need them set do so explicitly.
_HERMES_BEHAVIORAL_VARS = frozenset({
"HERMES_YOLO_MODE",
"HERMES_INTERACTIVE",
"HERMES_QUIET",
"HERMES_TOOL_PROGRESS",
"HERMES_TOOL_PROGRESS_MODE",
"HERMES_MAX_ITERATIONS",
"HERMES_SESSION_PLATFORM",
"HERMES_SESSION_CHAT_ID",
"HERMES_SESSION_CHAT_NAME",
"HERMES_SESSION_THREAD_ID",
"HERMES_SESSION_SOURCE",
"HERMES_SESSION_KEY",
"HERMES_GATEWAY_SESSION",
"HERMES_PLATFORM",
"HERMES_MODEL",
"HERMES_INFERENCE_MODEL",
"HERMES_INFERENCE_PROVIDER",
"HERMES_TUI_PROVIDER",
"HERMES_MANAGED",
"HERMES_DEV",
"HERMES_CONTAINER",
"HERMES_EPHEMERAL_SYSTEM_PROMPT",
"HERMES_TIMEZONE",
"HERMES_REDACT_SECRETS",
"HERMES_BACKGROUND_NOTIFICATIONS",
"HERMES_EXEC_ASK",
"HERMES_HOME_MODE",
"HERMES_AGENT_USE_LEGACY_SESSION_KEYS",
# Kanban path/board pins must never leak from a developer shell or
# dispatched worker into tests; otherwise tests can write fake tasks to
# the real ~/.hermes/kanban.db instead of the per-test HERMES_HOME.
"HERMES_KANBAN_DB",
"HERMES_KANBAN_BOARD",
"HERMES_KANBAN_HOME",
"HERMES_KANBAN_WORKSPACES_ROOT",
"HERMES_KANBAN_LOGS_ROOT",
"HERMES_KANBAN_TASK",
"HERMES_KANBAN_WORKSPACE",
"HERMES_KANBAN_RUN_ID",
"HERMES_KANBAN_CLAIM_LOCK",
"HERMES_KANBAN_DISPATCH_IN_GATEWAY",
"HERMES_TENANT",
"TERMINAL_CWD",
"TERMINAL_ENV",
"TERMINAL_VERCEL_RUNTIME",
"TERMINAL_CONTAINER_CPU",
"TERMINAL_CONTAINER_DISK",
"TERMINAL_CONTAINER_MEMORY",
"TERMINAL_CONTAINER_PERSISTENT",
"TERMINAL_DOCKER_RUN_AS_HOST_USER",
"BROWSER_CDP_URL",
"CAMOFOX_URL",
# Platform allowlists — not credentials, but if set from any source
# (user shell, earlier leaky test, CI env), they change gateway auth
# behavior and flake button-authorization tests.
"TELEGRAM_ALLOWED_USERS",
"DISCORD_ALLOWED_USERS",
"WHATSAPP_ALLOWED_USERS",
"SLACK_ALLOWED_USERS",
"SIGNAL_ALLOWED_USERS",
"SIGNAL_GROUP_ALLOWED_USERS",
"EMAIL_ALLOWED_USERS",
"SMS_ALLOWED_USERS",
"MATTERMOST_ALLOWED_USERS",
"MATRIX_ALLOWED_USERS",
"DINGTALK_ALLOWED_USERS",
"FEISHU_ALLOWED_USERS",
"WECOM_ALLOWED_USERS",
"GATEWAY_ALLOWED_USERS",
"GATEWAY_ALLOW_ALL_USERS",
"TELEGRAM_ALLOW_ALL_USERS",
"DISCORD_ALLOW_ALL_USERS",
"WHATSAPP_ALLOW_ALL_USERS",
"SLACK_ALLOW_ALL_USERS",
"SIGNAL_ALLOW_ALL_USERS",
"EMAIL_ALLOW_ALL_USERS",
"SMS_ALLOW_ALL_USERS",
# Gateway home channels are set by /sethome in real profiles. Tests that
# exercise dashboard notification toggles must opt in explicitly or they
# can accidentally subscribe against a developer's real home channel.
"TELEGRAM_HOME_CHANNEL",
"TELEGRAM_HOME_CHANNEL_THREAD_ID",
"TELEGRAM_HOME_CHANNEL_NAME",
"TELEGRAM_CRON_THREAD_ID",
"DISCORD_HOME_CHANNEL",
"DISCORD_HOME_CHANNEL_THREAD_ID",
"DISCORD_HOME_CHANNEL_NAME",
"SLACK_HOME_CHANNEL",
"SLACK_HOME_CHANNEL_THREAD_ID",
"SLACK_HOME_CHANNEL_NAME",
"WHATSAPP_HOME_CHANNEL",
"WHATSAPP_HOME_CHANNEL_THREAD_ID",
"WHATSAPP_HOME_CHANNEL_NAME",
"SIGNAL_HOME_CHANNEL",
"SIGNAL_HOME_CHANNEL_THREAD_ID",
"SIGNAL_HOME_CHANNEL_NAME",
"EMAIL_HOME_CHANNEL",
"EMAIL_HOME_CHANNEL_THREAD_ID",
"EMAIL_HOME_CHANNEL_NAME",
"SMS_HOME_CHANNEL",
"SMS_HOME_CHANNEL_THREAD_ID",
"SMS_HOME_CHANNEL_NAME",
"MATTERMOST_HOME_CHANNEL",
"MATTERMOST_HOME_CHANNEL_THREAD_ID",
"MATTERMOST_HOME_CHANNEL_NAME",
"MATRIX_HOME_CHANNEL",
"MATRIX_HOME_CHANNEL_THREAD_ID",
"MATRIX_HOME_CHANNEL_NAME",
"DINGTALK_HOME_CHANNEL",
"DINGTALK_HOME_CHANNEL_THREAD_ID",
"DINGTALK_HOME_CHANNEL_NAME",
"FEISHU_HOME_CHANNEL",
"FEISHU_HOME_CHANNEL_THREAD_ID",
"FEISHU_HOME_CHANNEL_NAME",
"WECOM_HOME_CHANNEL",
"WECOM_HOME_CHANNEL_THREAD_ID",
"WECOM_HOME_CHANNEL_NAME",
# Platform gating — set by load_gateway_config() as a side effect when
# a config.yaml is present, so individual test bodies that call the
# loader leak these values into later tests in the same process.
# Force-clear on every test setup so the leak can't happen.
"SLACK_REQUIRE_MENTION",
"SLACK_STRICT_MENTION",
"SLACK_FREE_RESPONSE_CHANNELS",
"SLACK_ALLOW_BOTS",
"SLACK_REACTIONS",
"DISCORD_REQUIRE_MENTION",
"DISCORD_FREE_RESPONSE_CHANNELS",
"TELEGRAM_REQUIRE_MENTION",
"WHATSAPP_REQUIRE_MENTION",
"DINGTALK_REQUIRE_MENTION",
"MATRIX_REQUIRE_MENTION",
})
@pytest.fixture(autouse=True)
def _hermetic_environment(tmp_path, monkeypatch):
"""Blank out all credential/behavioral env vars so local and CI match.
Also redirects HOME and HERMES_HOME to per-test tempdirs so code that
reads ``~/.hermes/*`` can't touch the real one, and pins TZ/LANG so
datetime/locale-sensitive tests are deterministic.
"""
# 1. Blank every credential-shaped env var that's currently set.
for name in list(os.environ.keys()):
if _looks_like_credential(name):
monkeypatch.delenv(name, raising=False)
# 2. Blank behavioral HERMES_* vars that could change test semantics.
for name in _HERMES_BEHAVIORAL_VARS:
monkeypatch.delenv(name, raising=False)
# 3. Redirect HERMES_HOME to a per-test tempdir. Code that reads
# ``~/.hermes/*`` via ``get_hermes_home()`` now gets the tempdir.
#
# NOTE: We do NOT also redirect HOME. Doing so broke CI because
# some tests (and their transitive deps) spawn subprocesses that
# inherit HOME and expect it to be stable. If a test genuinely
# needs HOME isolated, it should set it explicitly in its own
# fixture. Any code in the codebase reading ``~/.hermes/*`` via
# ``Path.home() / ".hermes"`` instead of ``get_hermes_home()``
# is a bug to fix at the callsite.
fake_hermes_home = tmp_path / "hermes_test"
fake_hermes_home.mkdir()
(fake_hermes_home / "sessions").mkdir()
(fake_hermes_home / "cron").mkdir()
(fake_hermes_home / "memories").mkdir()
(fake_hermes_home / "skills").mkdir()
monkeypatch.setenv("HERMES_HOME", str(fake_hermes_home))
# 4. Deterministic locale / timezone / hashseed. CI runs in UTC with
# C.UTF-8 locale; local dev often doesn't. Pin everything.
monkeypatch.setenv("TZ", "UTC")
monkeypatch.setenv("LANG", "C.UTF-8")
monkeypatch.setenv("LC_ALL", "C.UTF-8")
monkeypatch.setenv("PYTHONHASHSEED", "0")
# 4b. Disable AWS IMDS lookups. Without this, any test that ends up
# calling has_aws_credentials() / resolve_aws_auth_env_var()
# (e.g. provider auto-detect, status command, cron run_job) burns
# ~2s waiting for the metadata service at 169.254.169.254 to time
# out. Tests don't run on EC2 — IMDS is always unreachable here.
monkeypatch.setenv("AWS_EC2_METADATA_DISABLED", "true")
monkeypatch.setenv("AWS_METADATA_SERVICE_TIMEOUT", "1")
monkeypatch.setenv("AWS_METADATA_SERVICE_NUM_ATTEMPTS", "1")
# 5. Reset plugin singleton so tests don't leak plugins from
# ~/.hermes/plugins/ (which, per step 3, is now empty — but the
# singleton might still be cached from a previous test).
try:
import hermes_cli.plugins as _plugins_mod
monkeypatch.setattr(_plugins_mod, "_plugin_manager", None)
except Exception:
pass
# Explicitly clear provider-specific base URL overrides that don't match
# the generic credential-shaped env-var filter above.
monkeypatch.delenv("GMI_API_KEY", raising=False)
monkeypatch.delenv("GMI_BASE_URL", raising=False)
# Backward-compat alias — old tests reference this fixture name. Keep it
# as a no-op wrapper so imports don't break.
@pytest.fixture(autouse=True)
def _isolate_hermes_home(_hermetic_environment):
"""Alias preserved for any test that yields this name explicitly."""
return None
# ── Module-level state reset — replaced by per-file process isolation ──────
#
# Each test FILE runs in a freshly-spawned ``python -m pytest <file>``
# subprocess via ``scripts/run_tests_parallel.py``, so module-level dicts /
# sets / ContextVars from tests in one file cannot leak into tests in
# another file. No manual per-module clearing needed.
#
# Within a single file, ordering is the author's responsibility. If your
# tests in the same file share mutable state, either reset it explicitly
# in a fixture or split them across files.
#
# The skill ``test-suite-cascade-diagnosis`` documents the cascade patterns
# this replaces; the running example was ``test_command_guards`` failing
# 12/15 CI runs because ``tools.approval._session_approved`` carried
# approvals from one test's session into another's.
@pytest.fixture()
def tmp_dir(tmp_path):
"""Provide a temporary directory that is cleaned up automatically."""
return tmp_path
@pytest.fixture()
def mock_config():
"""Return a minimal hermes config dict suitable for unit tests."""
return {
"model": "test/mock-model",
"toolsets": ["terminal", "file"],
"max_turns": 10,
"terminal": {
"backend": "local",
"cwd": "/tmp",
"timeout": 30,
},
"compression": {"enabled": False},
"memory": {"memory_enabled": False, "user_profile_enabled": False},
"command_allowlist": [],
}
# ── Per-test timeout — handled by the isolation plugin ─────────────────────
#
# The subprocess-per-test plugin enforces the configured ``isolate_timeout``
# ini key by terminating the child if it overruns. The old SIGALRM-based
# fixture (POSIX-only, didn't work on Windows) is gone.
@pytest.fixture(autouse=True)
def _ensure_current_event_loop(request):
"""Provide a default event loop for sync tests that call get_event_loop().
Python 3.11+ no longer guarantees a current loop for plain synchronous tests.
A number of gateway tests still use asyncio.get_event_loop().run_until_complete(...).
Ensure they always have a usable loop without interfering with pytest-asyncio's
own loop management for @pytest.mark.asyncio tests.
On Python 3.12+, ``asyncio.get_event_loop_policy().get_event_loop()`` with no
*running* loop emits DeprecationWarning; skip that path and install a fresh
loop via ``new_event_loop()`` instead.
"""
if request.node.get_closest_marker("asyncio") is not None:
yield
return
loop = None
try:
loop = asyncio.get_running_loop()
except RuntimeError:
pass
if loop is None and sys.version_info < (3, 12):
try:
loop = asyncio.get_event_loop_policy().get_event_loop()
except RuntimeError:
loop = None
created = loop is None or loop.is_closed()
if created:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
yield
finally:
if created and loop is not None:
try:
loop.close()
finally:
asyncio.set_event_loop(None)
# ── Live-system guard ──────────────────────────────────────────────────────
#
# Several test files exercise the gateway-restart / kill code paths
# (``cmd_update``, ``kill_gateway_processes``, ``stop_profile_gateway``).
# When a single test forgets to mock either ``os.kill`` or the global
# ``find_gateway_pids`` helper, the real call leaks out of the hermetic
# environment and finds the developer's live ``hermes-gateway`` process
# via ``psutil`` — sending it SIGTERM mid-test. The shutdown forensics in
# PR #23285 caught this happening 5+ times in 3 days, every time
# correlated with a ``tests/hermes_cli/`` pytest run starting up.
#
# This fixture makes the leak impossible by intercepting the two
# primitives that actually do damage:
#
# • ``os.kill`` rejects any PID outside the test process subtree with
# a hard ``RuntimeError`` so the offending test gets a stack trace
# instead of silently murdering the real gateway.
# • ``subprocess.run`` / ``subprocess.Popen`` / ``call`` / ``check_call`` /
# ``check_output`` reject any ``systemctl ... <verb> hermes-gateway``
# invocation that would mutate the live unit. Read-only systemctl
# calls (``status``, ``show``, ``list-units``) still pass through.
#
# We intentionally do NOT stub ``find_gateway_pids`` / ``_scan_gateway_pids``
# here — tests of those functions themselves need the real implementation.
# Even if a test gets the live gateway PID back from a real scan, the
# ``os.kill`` guard above catches the actual signal call, and the
# ``systemctl`` guard catches the systemd path. Discovery without
# delivery is harmless.
_LIVE_SYSTEM_GUARD_BYPASS_MARK = "live_system_guard_bypass"
def pytest_configure(config): # noqa: D401 — pytest hook
"""Register markers used by hermetic conftest."""
config.addinivalue_line(
"markers",
f"{_LIVE_SYSTEM_GUARD_BYPASS_MARK}: bypass the live-system guard "
"(only for tests that genuinely need real os.kill / subprocess "
"behaviour — e.g. PTY tests that signal their own child).",
)
@pytest.fixture(autouse=True)
def _live_system_guard(request, monkeypatch):
"""Block real os.kill / systemctl / gateway-pid scans during tests.
See block comment above for the why. Tests that genuinely need
real signal delivery (e.g. PTY tests that SIGINT their own child)
can opt out with ``@pytest.mark.live_system_guard_bypass``.
Coverage (every primitive that can deliver a signal to or otherwise
terminate a foreign process):
• os.kill, os.killpg (POSIX)
• subprocess.run / Popen / call / check_call / check_output
• subprocess.getoutput / getstatusoutput
• os.system / os.popen
• pty.spawn
• asyncio.create_subprocess_exec / create_subprocess_shell
Subprocess inspection looks at the WHOLE command string (not just
tokens[0]), so ``bash -c "systemctl restart hermes-gateway"``,
``sudo systemctl ...``, ``env systemctl ...``, ``setsid systemctl ...``
are all caught. ``pkill``/``killall``/``taskkill`` invocations
targeting hermes/python patterns are also blocked.
"""
if request.node.get_closest_marker(_LIVE_SYSTEM_GUARD_BYPASS_MARK):
yield
return
import os as _os
import shlex as _shlex
import subprocess as _subprocess
test_pid = _os.getpid()
# Capture the test process's existing children at fixture start —
# any *new* children spawned by the test are also allowlisted via
# the live psutil walk below. Static set keeps the fast path cheap.
try:
import psutil as _psutil
_initial_children = {
c.pid for c in _psutil.Process(test_pid).children(recursive=True)
}
except Exception:
_psutil = None
_initial_children = set()
def _is_own_subtree(pid: int) -> bool:
# PID 0 means "our own process group"; -1 means "every process we
# can signal". Both are dangerous when paired with SIGTERM/SIGKILL,
# but pid 0 is technically scoped to our group so allow it; pid -1
# is treated as foreign (refuse).
if pid == 0:
return True
if pid < 0:
return False
if pid == test_pid or pid in _initial_children:
return True
if _psutil is None:
return False
try:
walker = _psutil.Process(pid)
except Exception:
# Stale PID — kill would be a no-op anyway, allow it.
return True
try:
for parent in walker.parents():
if parent.pid == test_pid:
return True
except Exception:
return False
return False
real_kill = _os.kill
def _guarded_kill(pid, sig, *args, **kwargs):
if _is_own_subtree(int(pid)):
return real_kill(pid, sig, *args, **kwargs)
raise RuntimeError(
f"tests/conftest.py live-system guard: blocked os.kill("
f"{pid}, {sig}) — PID is outside the test process subtree. "
"If this fired in CI it means the test reached a real "
"kill_gateway_processes / stop_profile_gateway / cmd_update "
"code path without mocking find_gateway_pids and os.kill. "
"Mock both, or mark the test with "
"@pytest.mark.live_system_guard_bypass if real signal "
"delivery is genuinely required."
)
monkeypatch.setattr(_os, "kill", _guarded_kill)
# ``os.killpg`` is the same risk class — sends a signal to every
# process in a group. The gateway is a session leader (its own
# PGID == its PID), so killpg(gateway_pid, SIGTERM) is a one-shot
# kill of the live process. Allow it only when the target PGID is
# the test process's own group.
if hasattr(_os, "killpg"):
real_killpg = _os.killpg
own_pgid = _os.getpgrp()
def _guarded_killpg(pgid, sig, *args, **kwargs):
if int(pgid) == own_pgid or _is_own_subtree(int(pgid)):
return real_killpg(pgid, sig, *args, **kwargs)
raise RuntimeError(
f"tests/conftest.py live-system guard: blocked "
f"os.killpg({pgid}, {sig}) — PGID is outside the test "
"process group. See _live_system_guard for the why."
)
monkeypatch.setattr(_os, "killpg", _guarded_killpg)
# ── Subprocess command-string inspection (whole-line) ──────────
_HERMES_TOKENS = (
"hermes-gateway",
"hermes.service",
"hermes_cli.main gateway",
"hermes_cli/main.py gateway",
"gateway/run.py",
"hermes gateway",
)
_MUTATING_VERBS = (
"restart", "start", "stop", "kill", "reload",
"reset-failed", "enable", "disable", "mask", "unmask",
"daemon-reload", "try-restart", "reload-or-restart",
)
_PROCESS_KILLERS = ("pkill", "killall", "taskkill", "skill", "fuser")
def _cmd_to_string(cmd) -> str:
if cmd is None:
return ""
if isinstance(cmd, (bytes, bytearray)):
try:
return bytes(cmd).decode(errors="replace")
except Exception:
return ""
if isinstance(cmd, str):
return cmd
if isinstance(cmd, (list, tuple)):
try:
return " ".join(str(t) for t in cmd)
except Exception:
return ""
return str(cmd)
def _matches_hermes_gateway(cmd_str: str) -> bool:
low = cmd_str.lower()
return any(tok in low for tok in _HERMES_TOKENS)
def _is_blocked_systemctl(cmd) -> bool:
cmd_str = _cmd_to_string(cmd)
if "systemctl" not in cmd_str:
return False
if not _matches_hermes_gateway(cmd_str):
return False
try:
tokens = _shlex.split(cmd_str)
except ValueError:
tokens = cmd_str.split()
return any(verb in tokens for verb in _MUTATING_VERBS)
def _is_process_killer(cmd) -> bool:
cmd_str = _cmd_to_string(cmd)
try:
tokens = _shlex.split(cmd_str)
except ValueError:
tokens = cmd_str.split()
if not tokens:
return False
for tok in tokens:
head = tok.rsplit("/", 1)[-1].rsplit("\\", 1)[-1]
if head in _PROCESS_KILLERS:
low = cmd_str.lower()
# pkill -f pattern: catch hermes-themed patterns + a
# plain "python" -f which would catch the live gateway
# whose cmdline contains "python -m hermes_cli.main".
if (
"hermes" in low
or "gateway" in low
or ("python" in low and "-f" in tokens)
):
return True
return False
def _check_subprocess_cmd(name, cmd):
if _is_blocked_systemctl(cmd):
raise RuntimeError(
f"tests/conftest.py live-system guard: blocked "
f"subprocess.{name}({cmd!r}) — would mutate the "
"live hermes-gateway systemd unit. Mock "
"subprocess.run / _run_systemctl in the test, or "
"mark with @pytest.mark.live_system_guard_bypass."
)
if _is_process_killer(cmd):
raise RuntimeError(
f"tests/conftest.py live-system guard: blocked "
f"subprocess.{name}({cmd!r}) — process-killer command "
"targeting hermes/python could hit the live gateway. "
"Mark with @pytest.mark.live_system_guard_bypass if "
"intentional."
)
def _wrap_subprocess(name, real):
def _guarded(cmd, *args, **kwargs):
_check_subprocess_cmd(name, cmd)
return real(cmd, *args, **kwargs)
_guarded.__name__ = f"_guarded_{name}"
# Make the wrapper subscriptable like the wrapped callable when
# the wrapped object is. ``subprocess.Popen[bytes]`` is used as
# a type annotation in third-party packages (mcp, etc.); replacing
# ``Popen`` with a plain function breaks ``Popen[bytes]`` at
# import time. Defer ``__class_getitem__`` to the original.
if hasattr(real, "__class_getitem__"):
_guarded.__class_getitem__ = real.__class_getitem__
return _guarded
def _wrap_popen():
"""Subclass Popen so isinstance checks AND Popen[bytes] still work."""
real = _subprocess.Popen
class _GuardedPopen(real): # type: ignore[misc, valid-type]
def __init__(self, cmd, *args, **kwargs):
_check_subprocess_cmd("Popen", cmd)
super().__init__(cmd, *args, **kwargs)
_GuardedPopen.__name__ = "Popen"
_GuardedPopen.__qualname__ = "Popen"
return _GuardedPopen
real_run = _subprocess.run
real_popen = _subprocess.Popen
real_call = _subprocess.call
real_check_call = _subprocess.check_call
real_check_output = _subprocess.check_output
real_getoutput = _subprocess.getoutput
real_getstatusoutput = _subprocess.getstatusoutput
monkeypatch.setattr(_subprocess, "run", _wrap_subprocess("run", real_run))
monkeypatch.setattr(_subprocess, "Popen", _wrap_popen())
monkeypatch.setattr(_subprocess, "call", _wrap_subprocess("call", real_call))
monkeypatch.setattr(
_subprocess, "check_call", _wrap_subprocess("check_call", real_check_call)
)
monkeypatch.setattr(
_subprocess,
"check_output",
_wrap_subprocess("check_output", real_check_output),
)
monkeypatch.setattr(
_subprocess, "getoutput", _wrap_subprocess("getoutput", real_getoutput)
)
monkeypatch.setattr(
_subprocess,
"getstatusoutput",
_wrap_subprocess("getstatusoutput", real_getstatusoutput),
)
# os.system / os.popen — same risk class, completely unwrapped before.
real_os_system = _os.system
real_os_popen = _os.popen
def _guarded_os_system(command):
_check_subprocess_cmd("os.system", command)
return real_os_system(command)
def _guarded_os_popen(cmd, *args, **kwargs):
_check_subprocess_cmd("os.popen", cmd)
return real_os_popen(cmd, *args, **kwargs)
monkeypatch.setattr(_os, "system", _guarded_os_system)
monkeypatch.setattr(_os, "popen", _guarded_os_popen)
# pty.spawn — POSIX-only.
try:
import pty as _pty
if hasattr(_pty, "spawn"):
real_pty_spawn = _pty.spawn
def _guarded_pty_spawn(argv, *args, **kwargs):
_check_subprocess_cmd("pty.spawn", argv)
return real_pty_spawn(argv, *args, **kwargs)
monkeypatch.setattr(_pty, "spawn", _guarded_pty_spawn)
except Exception:
pass
# asyncio.create_subprocess_* — bypasses subprocess module entirely.
try:
import asyncio as _asyncio
real_async_exec = _asyncio.create_subprocess_exec
real_async_shell = _asyncio.create_subprocess_shell
async def _guarded_async_exec(program, *args, **kwargs):
_check_subprocess_cmd(
"asyncio.create_subprocess_exec", [program, *args]
)
return await real_async_exec(program, *args, **kwargs)
async def _guarded_async_shell(cmd, *args, **kwargs):
_check_subprocess_cmd("asyncio.create_subprocess_shell", cmd)
return await real_async_shell(cmd, *args, **kwargs)
monkeypatch.setattr(_asyncio, "create_subprocess_exec", _guarded_async_exec)
monkeypatch.setattr(
_asyncio, "create_subprocess_shell", _guarded_async_shell
)
except Exception:
pass
yield