hermes-agent/tests/tools/test_approval_plugin_hooks.py
ethernet 48be2e0e4d
test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt

apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.

- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
  published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
  every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.

* fix(cli): compile syntax check to tempdir, not source __pycache__

`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:

1. Parallel test workers walking the same source tree (e.g. running
   the suite under per-file process isolation) can race against each
   other on the `__pycache__` write — manifests as flaky 'directory
   not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
   that the next interpreter run might pick up — fine when the
   interpreter version matches, sketchy if it doesn't.

Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.
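
A minimal sketch of the tempdir approach described above. The helper name `syntax_ok` and the `check.pyc` file name are hypothetical and chosen for illustration; this is not the body of `_validate_critical_files_syntax`.

```python
import py_compile
import tempfile
from pathlib import Path


def syntax_ok(source: Path) -> bool:
    """Return True if `source` compiles, without touching its __pycache__.

    Hypothetical helper: the name and return convention are assumptions.
    """
    with tempfile.TemporaryDirectory() as tmp:
        try:
            # cfile redirects the .pyc into the throwaway directory;
            # doraise=True turns syntax errors into PyCompileError.
            py_compile.compile(
                str(source),
                cfile=str(Path(tmp) / "check.pyc"),
                doraise=True,
            )
        except py_compile.PyCompileError:
            return False
    return True
```

Only the boolean result leaves the function; the compiled artifact is deleted together with the temporary directory.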

* test(runner): per-file process isolation, drop manual state reset + xdist

Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.

Key changes:
  * scripts/run_tests_parallel.py — new runner: discovers test files,
    runs N in parallel via ThreadPoolExecutor, captures stdout per file,
    treats exit code 5 (no tests collected) as pass, kills all children
    on exit. Worker count goes from cpu_count to cpu_count*2: the runner
    is I/O-bound (waiting on subprocess.communicate() from pytest
    children) and the parent process does almost no CPU work, so 2x
    oversubscription keeps more pipes full. When a file fails,
    immediately show the last
    30 lines of pytest output (stack traces + FAILED summary) plus a
    ready-to-copy repro command:
      python -m pytest tests/agent/test_auxiliary_client.py
  * scripts/run_tests.sh — delegates to run_tests_parallel.py
  * .github/workflows/tests.yml — test step: python
    scripts/run_tests_parallel.py
  * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
  * tests/conftest.py — remove ~200 lines of manual state-reset fixtures
  * AGENTS.md — update Testing section for per-file design
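
The per-file design can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of scripts/run_tests_parallel.py: the function names `run_file`, `passed`, and `run_all` are hypothetical, and child cleanup on exit is left out for brevity.

```python
from __future__ import annotations

import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

NO_TESTS_COLLECTED = 5  # pytest exit code when a file collects no tests


def run_file(path: Path) -> tuple[Path, int, str]:
    """Run one test file in a fresh interpreter and capture its output."""
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", str(path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return path, proc.returncode, proc.stdout


def passed(returncode: int) -> bool:
    """Treat 'no tests collected' the same as success."""
    return returncode in (0, NO_TESTS_COLLECTED)


def run_all(files: list[Path], workers: int | None = None) -> list[Path]:
    """Run every file in its own subprocess; return the files that failed."""
    # Threads only wait on child processes, so oversubscribe 2x.
    workers = workers or (os.cpu_count() or 1) * 2
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, code, output in pool.map(run_file, files):
            if passed(code):
                continue
            failed.append(path)
            # Show the tail of the pytest output plus a repro command.
            print("\n".join(output.splitlines()[-30:]))
            print(f"repro: python -m pytest {path}")
    return failed
```

Because every file gets its own interpreter, module-level state cannot leak between files, which is what makes the manual reset fixtures unnecessary.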

* test(runner): speed gateway test antipattern scan up

* fix(test): web search provider plugin test missing xai

* fix(tests): make 14 test files pass under per-file subprocess isolation

Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:

Tool registry not populated:
  - test_video_generation_tool_surface_matrix: add discover_builtin_tools()
  - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
    registering all 8 bundled web providers, reset after each test
  - test_website_policy: same provider registration pattern
  - test_web_tools_tavily: same pattern across 3 dispatch test classes
  - Also add is_safe_url/check_website_access mocks where SSRF check
    blocks example.com (DNS resolution fails in isolated envs)

Stale check_fn cache:
  - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
    in both kanban guidance tests (prior test cached False for kanban_show)
  - test_discord_tool: cache invalidation in setup/teardown
  - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries

Module-level state pollution:
  - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
  - test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
    (ContextVar takes precedence over os.environ)
  - test_dm_topics: overwrite sys.modules + separate telegram.constants mock
    + force-reimport of gateway.platforms.telegram
  - test_terminal_tool_requirements: removed duplicate class declaration,
    autouse _clear_caches fixture

* change(tests): run_tests.sh explicitly includes env vars

Instead of manually dropping selected vars, the script now passes
through only an explicit set of vars.

* fix(tests): 5 more isolation/NixOS fixes

- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
  command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
  (feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
  /etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
  (doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
  profile deletion when copytree preserved read-only modes from
  nix store
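
The rmtree handler idea can be sketched as below. This is an assumption-laden illustration, not the code in profiles.py: `remove_tree` and `_make_writable_and_retry` are hypothetical names, and the version switch reflects that `shutil.rmtree` accepts `onexc` only from Python 3.12 onward.

```python
import os
import shutil
import stat
import sys


def _make_writable_and_retry(func, path, _exc):
    """Grant owner rwx on `path` and its parent, then retry the failed call."""
    for target in (os.path.dirname(path), path):
        if os.path.islink(target):
            continue  # never chmod through a symlink
        try:
            os.chmod(target, os.stat(target).st_mode | stat.S_IRWXU)
        except OSError:
            pass  # best effort; the retry below reports the real failure
    func(path)


def remove_tree(path):
    """Delete a tree even if copied directories kept read-only modes."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_make_writable_and_retry)
    else:
        # Older interpreters pass exc_info as the third argument instead.
        shutil.rmtree(path, onerror=_make_writable_and_retry)
```

Fixing the parent matters because unlinking or removing an entry needs write permission on the directory that contains it, not on the entry itself.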

* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client

* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor

* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test

* fix: address PR #29016 review feedback

- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
  shared helper (register_all_web_providers), replacing 8 copy-pasted
  blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
  fix worker count to match code (cpu_count*2)

* fix: eliminate race in stale-cache achievements test

The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
2026-05-21 16:40:04 +05:30

155 lines
6 KiB
Python

"""Tests for pre_approval_request / post_approval_response plugin hooks.
These hooks fire in tools/approval.py::check_all_command_guards whenever a
dangerous command needs user approval. They are observer-only (return values
ignored) and must fire on BOTH the CLI-interactive path and the async gateway
path, so external tools like macOS notifiers can be alerted regardless of
which surface the user is on.
"""

from unittest.mock import patch

import pytest

import tools.approval as approval_module
from tools.approval import (
    check_all_command_guards,
    register_gateway_notify,
    unregister_gateway_notify,
    resolve_gateway_approval,
    set_current_session_key,
    clear_session,
)


@pytest.fixture
def isolated_session(monkeypatch, tmp_path):
    """Give each test a fresh session_key, clean approval-state, and isolated
    HERMES_HOME so the real user's command_allowlist doesn't leak in."""
    import tools.approval as _am

    session_key = "test:session:approval_hooks"
    token = set_current_session_key(session_key)
    monkeypatch.setenv("HERMES_SESSION_KEY", session_key)
    # Make sure we don't skip guards via yolo / approvals.mode=off
    monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
    # Isolate from the real user's permanent allowlist + session state
    _saved_permanent = _am._permanent_approved.copy()
    _saved_session = {k: v.copy() for k, v in _am._session_approved.items()}
    _am._permanent_approved.clear()
    _am._session_approved.clear()
    try:
        yield session_key
    finally:
        _am._permanent_approved.update(_saved_permanent)
        _am._session_approved.update(_saved_session)
        try:
            _am._approval_session_key.reset(token)
        except Exception:
            pass
        clear_session(session_key)


class TestCliPathFiresHooks:
    """CLI-interactive approval path: HERMES_INTERACTIVE is set, the
    prompt_dangerous_approval() result decides the outcome."""

    def test_pre_and_post_fire_with_expected_kwargs(
        self, isolated_session, monkeypatch
    ):
        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
        # approvals.mode=manual so we actually reach the prompt site
        monkeypatch.setattr(approval_module, "_get_approval_mode", lambda: "manual")
        captured = []

        def fake_invoke_hook(hook_name, **kwargs):
            captured.append((hook_name, kwargs))
            return []

        # Force the user to "approve once" via the approval_callback contract
        def cb(command, description, *, allow_permanent=True):
            return "once"

        with patch("hermes_cli.plugins.invoke_hook", side_effect=fake_invoke_hook):
            result = check_all_command_guards(
                "rm -rf /tmp/test-hook", "local", approval_callback=cb,
            )
        assert result["approved"] is True
        hook_names = [c[0] for c in captured]
        assert "pre_approval_request" in hook_names
        assert "post_approval_response" in hook_names
        pre_kwargs = next(kw for name, kw in captured if name == "pre_approval_request")
        assert pre_kwargs["command"] == "rm -rf /tmp/test-hook"
        assert pre_kwargs["surface"] == "cli"
        assert pre_kwargs["session_key"] == isolated_session
        assert isinstance(pre_kwargs["pattern_keys"], list)
        assert pre_kwargs["pattern_key"]  # non-empty primary pattern
        assert pre_kwargs["description"]
        post_kwargs = next(kw for name, kw in captured if name == "post_approval_response")
        assert post_kwargs["choice"] == "once"
        assert post_kwargs["surface"] == "cli"
        assert post_kwargs["command"] == "rm -rf /tmp/test-hook"

    def test_deny_reported_to_post_hook(self, isolated_session, monkeypatch):
        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
        monkeypatch.setattr(approval_module, "_get_approval_mode", lambda: "manual")
        captured = []

        def fake_invoke_hook(hook_name, **kwargs):
            captured.append((hook_name, kwargs))
            return []

        def cb(command, description, *, allow_permanent=True):
            return "deny"

        with patch("hermes_cli.plugins.invoke_hook", side_effect=fake_invoke_hook):
            result = check_all_command_guards(
                "rm -rf /tmp/test-deny", "local", approval_callback=cb,
            )
        assert result["approved"] is False
        post_kwargs = next(kw for name, kw in captured if name == "post_approval_response")
        assert post_kwargs["choice"] == "deny"

    def test_plugin_hook_crash_does_not_break_approval(
        self, isolated_session, monkeypatch
    ):
        """A crashing plugin must never prevent the approval flow from
        reaching the user. Hooks are observer-only and safety-critical
        behavior must be preserved."""
        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
        monkeypatch.setattr(approval_module, "_get_approval_mode", lambda: "manual")

        def boom(hook_name, **kwargs):
            raise RuntimeError("plugin crashed")

        def cb(command, description, *, allow_permanent=True):
            return "once"

        with patch("hermes_cli.plugins.invoke_hook", side_effect=boom):
            result = check_all_command_guards(
                "rm -rf /tmp/test-crash", "local", approval_callback=cb,
            )
        # User's approval was still honored despite the plugin crashing
        assert result["approved"] is True


class TestGatewayPathFiresHooks:
    """Async gateway approval path: HERMES_GATEWAY_SESSION is set and a
    gateway notify callback is registered. The agent thread blocks on the
    approval event until resolve_gateway_approval() is called from another
    thread."""