test: make test env hermetic; enforce CI parity via scripts/run_tests.sh (#11577)

* test: make test env hermetic; enforce CI parity via scripts/run_tests.sh

Fixes the recurring 'works locally, fails in CI' (and vice versa) class
of flakes by making tests hermetic and providing a canonical local runner
that matches CI's environment.

## Layer 1 — hermetic conftest.py (tests/conftest.py)

Autouse fixture now unsets every credential-shaped env var before every
test, so developer-local API keys can't leak into tests that assert
'auto-detect provider when key present'.

Pattern: unset any var ending in _API_KEY, _TOKEN, _SECRET, _PASSWORD,
_CREDENTIALS, _ACCESS_KEY, _PRIVATE_KEY, etc. Plus an explicit list of
credential names that don't fit the suffix pattern (AWS_ACCESS_KEY_ID,
FAL_KEY, GH_TOKEN, etc.) and all the provider BASE_URL overrides that
change auto-detect behavior.

Also unsets HERMES_* behavioral vars (HERMES_YOLO_MODE, HERMES_QUIET,
HERMES_SESSION_*, etc.) that mutate agent behavior.

Also:
  - Redirects HOME to a per-test tempdir (not just HERMES_HOME), so
    code reading ~/.hermes/* directly can't touch the real dir.
  - Pins TZ=UTC, LANG=C.UTF-8, LC_ALL=C.UTF-8, PYTHONHASHSEED=0 to
    match CI's deterministic runtime.

The old _isolate_hermes_home fixture name is preserved as an alias so
any test that yields it explicitly still works.

## Layer 2 — scripts/run_tests.sh canonical runner

'Always use scripts/run_tests.sh, never call pytest directly' is the
new rule (documented in AGENTS.md). The script:
  - Unsets all credential env vars (belt-and-suspenders for callers
    who bypass conftest — e.g. IDE integrations)
  - Pins TZ/LANG/PYTHONHASHSEED
  - Uses -n 4 xdist workers (matches GHA ubuntu-latest; -n auto on
    a 20-core workstation surfaces test-ordering flakes CI will never
    see, causing the infamous 'passes in CI, fails locally' drift)
  - Finds the venv in .venv, venv, or main checkout's venv
  - Passes through arbitrary pytest args

Installs pytest-split on demand so the script can also be used to run
matrix-split subsets locally for debugging.

## Remove 3 module-level dotenv stubs that broke test isolation

tests/hermes_cli/test_{arcee,xiaomi,api_key}_provider.py each had a
module-level:

    if 'dotenv' not in sys.modules:
        fake_dotenv = types.ModuleType('dotenv')
        fake_dotenv.load_dotenv = lambda *a, **kw: None
        sys.modules['dotenv'] = fake_dotenv

This patches sys.modules['dotenv'] to a fake at import time with no
teardown. Under pytest-xdist LoadScheduling, whichever worker collected
one of these files first poisoned its sys.modules; subsequent tests in
the same worker that imported load_dotenv transitively (e.g.
test_env_loader.py via hermes_cli.env_loader) got the no-op lambda and
saw their assertions fail.

dotenv is a required dependency (python-dotenv>=1.2.1 in pyproject.toml),
so the defensive stub was never needed. Removed.

## Validation

- tests/hermes_cli/ alone: 2178 passed, 1 skipped, 0 failed (was 4
  failures in test_env_loader.py before this fix)
- tests/test_plugin_skills.py, tests/hermes_cli/test_plugins.py,
  tests/test_hermes_logging.py combined: 123 passed (the caplog
  regression tests from PR #11453 still pass)
- Local full run shows no F/E clusters in the 0-55% range that were
  previously present before the conftest hardening

## Background

See AGENTS.md 'Testing' section for the full list of drift sources
this closes. Matrix split (closed as #11566) will be re-attempted
once this foundation lands — cross-test pollution was the root cause
of the shard-3 hang in that PR.

* fix(conftest): don't redirect HOME — it broke CI subprocesses

PR #11577's autouse fixture was setting HOME to a per-test tempdir.
CI started timing out at 97% complete with dozens of E/F markers and
orphan python processes at cleanup — tests (or transitive deps)
spawn subprocesses that expect a stable HOME, and the redirect broke
them in non-obvious ways.

Env-var unsetting and TZ/LANG/hashseed pinning (the actual CI-drift
fixes) are unchanged and still in place. HERMES_HOME redirection is
also unchanged — that's the canonical way to isolate tests from
~/.hermes/, not HOME.

Any code in the codebase reading ~/.hermes/* via `Path.home() / ".hermes"`
instead of `get_hermes_home()` is a bug to fix at the callsite, not
something to paper over in conftest.
This commit is contained in:
Teknium 2026-04-17 06:09:09 -07:00 committed by GitHub
parent ee95822e07
commit d404849351
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 366 additions and 47 deletions

View file

@ -458,13 +458,45 @@ def profile_env(tmp_path, monkeypatch):
## Testing
**ALWAYS use `scripts/run_tests.sh`** — do not call `pytest` directly. The script enforces
hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
4 xdist workers matching GHA ubuntu-latest). Direct `pytest` on a 16+ core
developer machine with API keys set diverges from CI in ways that have caused
multiple "works locally, fails in CI" incidents (and the reverse).
```bash
source venv/bin/activate
python -m pytest tests/ -q # Full suite (~3000 tests, ~3 min)
python -m pytest tests/test_model_tools.py -q # Toolset resolution
python -m pytest tests/test_cli_init.py -q # CLI config loading
python -m pytest tests/gateway/ -q # Gateway tests
python -m pytest tests/tools/ -q # Tool-level tests
scripts/run_tests.sh # full suite, CI-parity
scripts/run_tests.sh tests/gateway/ # one directory
scripts/run_tests.sh tests/agent/test_foo.py::test_x # one test
scripts/run_tests.sh -v --tb=long # pass-through pytest flags
```
### Why the wrapper (and why the old "just call pytest" doesn't work)
Five real sources of local-vs-CI drift the script closes:
| | Without wrapper | With wrapper |
|---|---|---|
| Provider API keys | Whatever is in your env (auto-detects pool) | All `*_API_KEY`/`*_TOKEN`/etc. unset |
| HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
| Timezone | Local TZ (PDT etc.) | UTC |
| Locale | Whatever is set | C.UTF-8 |
| xdist workers | `-n auto` = all cores (20+ on a workstation) | `-n 4` matching CI |
`tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
invocation (including IDE integrations) gets hermetic behavior — but the wrapper
is belt-and-suspenders.
### Running without the wrapper (only if you must)
If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
pytest directly), at minimum activate the venv and pass `-n 4`:
```bash
source venv/bin/activate
python -m pytest tests/ -q -n 4
```
Worker count above 4 will surface test-ordering flakes that CI never sees.
Always run the full suite before pushing changes.

104
scripts/run_tests.sh Executable file
View file

@ -0,0 +1,104 @@
#!/usr/bin/env bash
# Canonical test runner for hermes-agent. Run this instead of calling
# `pytest` directly to guarantee your local run matches CI behavior.
#
# What this script enforces:
# * -n 4 xdist workers (CI has 4 cores; -n auto diverges locally)
# * TZ=UTC, LANG=C.UTF-8, PYTHONHASHSEED=0 (deterministic)
# * Credential env vars blanked (conftest.py also does this, but this
# is belt-and-suspenders for anyone running `pytest` outside of
# our conftest path — e.g. calling pytest on a single file)
# * Proper venv activation
#
# Usage:
# scripts/run_tests.sh # full suite
# scripts/run_tests.sh tests/agent/ # one directory
# scripts/run_tests.sh tests/agent/test_foo.py::TestClass::test_method
# scripts/run_tests.sh --tb=long -v # pass-through pytest args
set -euo pipefail
# ── Locate repo root ────────────────────────────────────────────────────────
# Works whether this is the main checkout or a worktree.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
# ── Activate venv ───────────────────────────────────────────────────────────
# Prefer a .venv in the current tree, fall back to the main checkout's venv
# (useful for worktrees where we don't always duplicate the venv).
VENV=""
for candidate in "$REPO_ROOT/.venv" "$REPO_ROOT/venv" "$HOME/.hermes/hermes-agent/venv"; do
if [ -f "$candidate/bin/activate" ]; then
VENV="$candidate"
break
fi
done
if [ -z "$VENV" ]; then
echo "error: no virtualenv found in $REPO_ROOT/.venv or $REPO_ROOT/venv" >&2
exit 1
fi
PYTHON="$VENV/bin/python"
# ── Ensure pytest-split is installed (required for shard-equivalent runs) ──
if ! "$PYTHON" -c "import pytest_split" 2>/dev/null; then
echo "→ installing pytest-split into $VENV"
"$PYTHON" -m pip install --quiet "pytest-split>=0.9,<1"
fi
# ── Hermetic environment ────────────────────────────────────────────────────
# Mirror what CI does in .github/workflows/tests.yml + what conftest.py does.
# Unset every credential-shaped var currently in the environment.
while IFS='=' read -r name _; do
case "$name" in
*_API_KEY|*_TOKEN|*_SECRET|*_PASSWORD|*_CREDENTIALS|*_ACCESS_KEY| \
*_SECRET_ACCESS_KEY|*_PRIVATE_KEY|*_OAUTH_TOKEN|*_WEBHOOK_SECRET| \
*_ENCRYPT_KEY|*_APP_SECRET|*_CLIENT_SECRET|*_CORP_SECRET|*_AES_KEY| \
AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|AWS_SESSION_TOKEN|FAL_KEY| \
GH_TOKEN|GITHUB_TOKEN)
unset "$name"
;;
esac
done < <(env)
# Unset HERMES_* behavioral vars too.
unset HERMES_YOLO_MODE HERMES_INTERACTIVE HERMES_QUIET HERMES_TOOL_PROGRESS \
HERMES_TOOL_PROGRESS_MODE HERMES_MAX_ITERATIONS HERMES_SESSION_PLATFORM \
HERMES_SESSION_CHAT_ID HERMES_SESSION_CHAT_NAME HERMES_SESSION_THREAD_ID \
HERMES_SESSION_SOURCE HERMES_SESSION_KEY HERMES_GATEWAY_SESSION \
HERMES_PLATFORM HERMES_INFERENCE_PROVIDER HERMES_MANAGED HERMES_DEV \
HERMES_CONTAINER HERMES_EPHEMERAL_SYSTEM_PROMPT HERMES_TIMEZONE \
HERMES_REDACT_SECRETS HERMES_BACKGROUND_NOTIFICATIONS HERMES_EXEC_ASK \
HERMES_HOME_MODE 2>/dev/null || true
# Pin deterministic runtime.
export TZ=UTC
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
export PYTHONHASHSEED=0
# ── Worker count ────────────────────────────────────────────────────────────
# CI uses `-n auto` on ubuntu-latest which gives 4 workers. A 20-core
# workstation with `-n auto` gets 20 workers and exposes test-ordering
# flakes that CI will never see. Pin to 4 so local matches CI.
WORKERS="${HERMES_TEST_WORKERS:-4}"
# ── Run pytest ──────────────────────────────────────────────────────────────
cd "$REPO_ROOT"
# If the first argument starts with `-` treat all args as pytest flags;
# otherwise treat them as test paths.
ARGS=("$@")
echo "▶ running pytest with $WORKERS workers, hermetic env, in $REPO_ROOT"
echo " (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)"
# -o "addopts=" clears pyproject.toml's `-n auto` so our -n wins.
exec "$PYTHON" -m pytest \
-o "addopts=" \
-n "$WORKERS" \
--ignore=tests/integration \
--ignore=tests/e2e \
-m "not integration" \
"${ARGS[@]}"

View file

@ -1,7 +1,27 @@
"""Shared fixtures for the hermes-agent test suite."""
"""Shared fixtures for the hermes-agent test suite.
Hermetic-test invariants enforced here (see AGENTS.md for rationale):
1. **No credential env vars.** All provider/credential-shaped env vars
(ending in _API_KEY, _TOKEN, _SECRET, _PASSWORD, _CREDENTIALS, etc.)
are unset before every test. Local developer keys cannot leak in.
2. **Isolated HERMES_HOME.** HERMES_HOME points to a per-test tempdir so
code reading ``~/.hermes/*`` via ``get_hermes_home()`` can't see the
real one. (We do NOT also redirect HOME that broke subprocesses in
CI. Code using ``Path.home() / ".hermes"`` instead of the canonical
``get_hermes_home()`` is a bug to fix at the callsite.)
3. **Deterministic runtime.** TZ=UTC, LANG=C.UTF-8, PYTHONHASHSEED=0.
4. **No HERMES_SESSION_* inheritance** the agent's current gateway
session must not leak into tests.
These invariants make the local test run match CI closely. Gaps that
remain (CPU count, xdist worker count) are addressed by the canonical
test runner at ``scripts/run_tests.sh``.
"""
import asyncio
import os
import re
import signal
import sys
import tempfile
@ -16,30 +36,215 @@ if str(PROJECT_ROOT) not in sys.path:
sys.path.insert(0, str(PROJECT_ROOT))
# ── Credential env-var filter ──────────────────────────────────────────────
#
# Any env var in the current process matching ONE of these patterns is
# unset for every test. Developers' local keys cannot leak into assertions
# about "auto-detect provider when key present".
_CREDENTIAL_SUFFIXES = (
"_API_KEY",
"_TOKEN",
"_SECRET",
"_PASSWORD",
"_CREDENTIALS",
"_ACCESS_KEY",
"_SECRET_ACCESS_KEY",
"_PRIVATE_KEY",
"_OAUTH_TOKEN",
"_WEBHOOK_SECRET",
"_ENCRYPT_KEY",
"_APP_SECRET",
"_CLIENT_SECRET",
"_CORP_SECRET",
"_AES_KEY",
)
# Explicit names (for ones that don't fit the suffix pattern)
_CREDENTIAL_NAMES = frozenset({
"AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY",
"AWS_SESSION_TOKEN",
"ANTHROPIC_TOKEN",
"FAL_KEY",
"GH_TOKEN",
"GITHUB_TOKEN",
"OPENAI_API_KEY",
"OPENROUTER_API_KEY",
"NOUS_API_KEY",
"GEMINI_API_KEY",
"GOOGLE_API_KEY",
"GROQ_API_KEY",
"XAI_API_KEY",
"MISTRAL_API_KEY",
"DEEPSEEK_API_KEY",
"KIMI_API_KEY",
"MOONSHOT_API_KEY",
"GLM_API_KEY",
"ZAI_API_KEY",
"MINIMAX_API_KEY",
"OLLAMA_API_KEY",
"OPENVIKING_API_KEY",
"COPILOT_API_KEY",
"CLAUDE_CODE_OAUTH_TOKEN",
"BROWSERBASE_API_KEY",
"FIRECRAWL_API_KEY",
"PARALLEL_API_KEY",
"EXA_API_KEY",
"TAVILY_API_KEY",
"WANDB_API_KEY",
"ELEVENLABS_API_KEY",
"HONCHO_API_KEY",
"MEM0_API_KEY",
"SUPERMEMORY_API_KEY",
"RETAINDB_API_KEY",
"HINDSIGHT_API_KEY",
"HINDSIGHT_LLM_API_KEY",
"TINKER_API_KEY",
"DAYTONA_API_KEY",
"TWILIO_AUTH_TOKEN",
"TELEGRAM_BOT_TOKEN",
"DISCORD_BOT_TOKEN",
"SLACK_BOT_TOKEN",
"SLACK_APP_TOKEN",
"MATTERMOST_TOKEN",
"MATRIX_ACCESS_TOKEN",
"MATRIX_PASSWORD",
"MATRIX_RECOVERY_KEY",
"HASS_TOKEN",
"EMAIL_PASSWORD",
"BLUEBUBBLES_PASSWORD",
"FEISHU_APP_SECRET",
"FEISHU_ENCRYPT_KEY",
"FEISHU_VERIFICATION_TOKEN",
"DINGTALK_CLIENT_SECRET",
"QQ_CLIENT_SECRET",
"QQ_STT_API_KEY",
"WECOM_SECRET",
"WECOM_CALLBACK_CORP_SECRET",
"WECOM_CALLBACK_TOKEN",
"WECOM_CALLBACK_ENCODING_AES_KEY",
"WEIXIN_TOKEN",
"MODAL_TOKEN_ID",
"MODAL_TOKEN_SECRET",
"TERMINAL_SSH_KEY",
"SUDO_PASSWORD",
"GATEWAY_PROXY_KEY",
"API_SERVER_KEY",
"TOOL_GATEWAY_USER_TOKEN",
"TELEGRAM_WEBHOOK_SECRET",
"WEBHOOK_SECRET",
"AI_GATEWAY_API_KEY",
"VOICE_TOOLS_OPENAI_KEY",
"BROWSER_USE_API_KEY",
"CUSTOM_API_KEY",
"GATEWAY_PROXY_URL",
"GEMINI_BASE_URL",
"OPENAI_BASE_URL",
"OPENROUTER_BASE_URL",
"OLLAMA_BASE_URL",
"GROQ_BASE_URL",
"XAI_BASE_URL",
"AI_GATEWAY_BASE_URL",
"ANTHROPIC_BASE_URL",
})
def _looks_like_credential(name: str) -> bool:
"""True if env var name matches a credential-shaped pattern."""
if name in _CREDENTIAL_NAMES:
return True
return any(name.endswith(suf) for suf in _CREDENTIAL_SUFFIXES)
# HERMES_* vars that change test behavior by being set. Unset all of these
# unconditionally — individual tests that need them set do so explicitly.
_HERMES_BEHAVIORAL_VARS = frozenset({
"HERMES_YOLO_MODE",
"HERMES_INTERACTIVE",
"HERMES_QUIET",
"HERMES_TOOL_PROGRESS",
"HERMES_TOOL_PROGRESS_MODE",
"HERMES_MAX_ITERATIONS",
"HERMES_SESSION_PLATFORM",
"HERMES_SESSION_CHAT_ID",
"HERMES_SESSION_CHAT_NAME",
"HERMES_SESSION_THREAD_ID",
"HERMES_SESSION_SOURCE",
"HERMES_SESSION_KEY",
"HERMES_GATEWAY_SESSION",
"HERMES_PLATFORM",
"HERMES_INFERENCE_PROVIDER",
"HERMES_MANAGED",
"HERMES_DEV",
"HERMES_CONTAINER",
"HERMES_EPHEMERAL_SYSTEM_PROMPT",
"HERMES_TIMEZONE",
"HERMES_REDACT_SECRETS",
"HERMES_BACKGROUND_NOTIFICATIONS",
"HERMES_EXEC_ASK",
"HERMES_HOME_MODE",
})
@pytest.fixture(autouse=True)
def _isolate_hermes_home(tmp_path, monkeypatch):
"""Redirect HERMES_HOME to a temp dir so tests never write to ~/.hermes/."""
fake_home = tmp_path / "hermes_test"
fake_home.mkdir()
(fake_home / "sessions").mkdir()
(fake_home / "cron").mkdir()
(fake_home / "memories").mkdir()
(fake_home / "skills").mkdir()
monkeypatch.setenv("HERMES_HOME", str(fake_home))
# Reset plugin singleton so tests don't leak plugins from ~/.hermes/plugins/
def _hermetic_environment(tmp_path, monkeypatch):
"""Blank out all credential/behavioral env vars so local and CI match.
Also redirects HOME and HERMES_HOME to per-test tempdirs so code that
reads ``~/.hermes/*`` can't touch the real one, and pins TZ/LANG so
datetime/locale-sensitive tests are deterministic.
"""
# 1. Blank every credential-shaped env var that's currently set.
for name in list(os.environ.keys()):
if _looks_like_credential(name):
monkeypatch.delenv(name, raising=False)
# 2. Blank behavioral HERMES_* vars that could change test semantics.
for name in _HERMES_BEHAVIORAL_VARS:
monkeypatch.delenv(name, raising=False)
# 3. Redirect HERMES_HOME to a per-test tempdir. Code that reads
# ``~/.hermes/*`` via ``get_hermes_home()`` now gets the tempdir.
#
# NOTE: We do NOT also redirect HOME. Doing so broke CI because
# some tests (and their transitive deps) spawn subprocesses that
# inherit HOME and expect it to be stable. If a test genuinely
# needs HOME isolated, it should set it explicitly in its own
# fixture. Any code in the codebase reading ``~/.hermes/*`` via
# ``Path.home() / ".hermes"`` instead of ``get_hermes_home()``
# is a bug to fix at the callsite.
fake_hermes_home = tmp_path / "hermes_test"
fake_hermes_home.mkdir()
(fake_hermes_home / "sessions").mkdir()
(fake_hermes_home / "cron").mkdir()
(fake_hermes_home / "memories").mkdir()
(fake_hermes_home / "skills").mkdir()
monkeypatch.setenv("HERMES_HOME", str(fake_hermes_home))
# 4. Deterministic locale / timezone / hashseed. CI runs in UTC with
# C.UTF-8 locale; local dev often doesn't. Pin everything.
monkeypatch.setenv("TZ", "UTC")
monkeypatch.setenv("LANG", "C.UTF-8")
monkeypatch.setenv("LC_ALL", "C.UTF-8")
monkeypatch.setenv("PYTHONHASHSEED", "0")
# 5. Reset plugin singleton so tests don't leak plugins from
# ~/.hermes/plugins/ (which, per step 3, is now empty — but the
# singleton might still be cached from a previous test).
try:
import hermes_cli.plugins as _plugins_mod
monkeypatch.setattr(_plugins_mod, "_plugin_manager", None)
except Exception:
pass
# Tests should not inherit the agent's current gateway/messaging surface.
# Individual tests that need gateway behavior set these explicitly.
monkeypatch.delenv("HERMES_SESSION_PLATFORM", raising=False)
monkeypatch.delenv("HERMES_SESSION_CHAT_ID", raising=False)
monkeypatch.delenv("HERMES_SESSION_CHAT_NAME", raising=False)
monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
# Avoid making real calls during tests if this key is set in the env files
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
# Backward-compat alias — old tests reference this fixture name. Keep it
# as a no-op wrapper so imports don't break.
@pytest.fixture(autouse=True)
def _isolate_hermes_home(_hermetic_environment):
"""Alias preserved for any test that yields this name explicitly."""
return None
@pytest.fixture()

View file

@ -1,17 +1,9 @@
"""Tests for API-key provider support (z.ai/GLM, Kimi, MiniMax, AI Gateway)."""
import os
import sys
import types
import pytest
# Ensure dotenv doesn't interfere
if "dotenv" not in sys.modules:
fake_dotenv = types.ModuleType("dotenv")
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
sys.modules["dotenv"] = fake_dotenv
from hermes_cli.auth import (
PROVIDER_REGISTRY,
ProviderConfig,

View file

@ -1,15 +1,9 @@
"""Tests for Arcee AI provider support — standard direct API provider."""
import sys
import types
import pytest
if "dotenv" not in sys.modules:
fake_dotenv = types.ModuleType("dotenv")
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
sys.modules["dotenv"] = fake_dotenv
from hermes_cli.auth import (
PROVIDER_REGISTRY,
resolve_provider,

View file

@ -1,17 +1,9 @@
"""Tests for Xiaomi MiMo provider support."""
import os
import sys
import types
import pytest
# Ensure dotenv doesn't interfere
if "dotenv" not in sys.modules:
fake_dotenv = types.ModuleType("dotenv")
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
sys.modules["dotenv"] = fake_dotenv
from hermes_cli.auth import (
PROVIDER_REGISTRY,
resolve_provider,