diff --git a/AGENTS.md b/AGENTS.md
index d8306d9bdb8..e89c819844e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1289,65 +1289,22 @@
 scripts/run_tests.sh # full suite, CI-parity
 scripts/run_tests.sh tests/gateway/ # one directory
 scripts/run_tests.sh tests/agent/test_foo.py::test_x # one test
 scripts/run_tests.sh -v --tb=long # pass-through pytest flags
-scripts/run_tests.sh --no-isolate tests/foo/ # disable subprocess isolation (faster, for debugging)
 ```
 
-### Subprocess-per-test isolation
+### Subprocess-per-test-file isolation
 
-Every test runs in a freshly-spawned Python subprocess via the in-tree plugin
-at `tests/_isolate_plugin.py`. This means module-level dicts/sets and
-ContextVars from one test cannot leak into the next — the historic
-`_reset_module_state` autouse fixture is gone.
+Every test file runs in a freshly-spawned Python subprocess via `run_tests_parallel.py`. This means module-level dicts/sets and
+ContextVars from one test file cannot leak into the next.
 
-Implementation notes:
+### Why the wrapper
 
-- The plugin uses `multiprocessing.get_context("spawn")`, which works on
-  Linux, macOS, and Windows alike (POSIX `fork` is not used).
-- Per-test overhead is ~0.5–1.0s (Python startup + pytest collection). xdist
-  parallelism amortizes this across cores; on a 20-core box the full suite
-  finishes in roughly the same wall time as before, but flake-free.
-- `isolate_timeout` (configured in `pyproject.toml`) caps each test at 30s.
-  Hangs are killed and surfaced as a failure report.
-- Pass `--no-isolate` to disable isolation — useful when debugging a single
-  test interactively, or when you specifically want to verify state leakage.
-- The plugin disables itself in child processes (sentinel envvar
-  `HERMES_ISOLATE_CHILD=1`), so there's no fork-bomb risk.
+|                     | Without wrapper                             | With wrapper                              |
+| ------------------- | ------------------------------------------- | ----------------------------------------- |
+| Provider API keys   | Whatever is in your env (auto-detects pool) | All env vars except a specific few unset. |
+| HOME / `~/.hermes/` | Your real config+auth.json                  | Temp dir per test                         |
+| Timezone            | Local TZ (PDT etc.)                         | UTC                                       |
+| Locale              | Whatever is set                             | C.UTF-8                                   |
 
-### Why the wrapper (and why the old "just call pytest" doesn't work)
-
-Five real sources of local-vs-CI drift the script closes:
-
-| | Without wrapper | With wrapper |
-|---|---|---|
-| Provider API keys | Whatever is in your env (auto-detects pool) | All `*_API_KEY`/`*_TOKEN`/etc. unset |
-| HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
-| Timezone | Local TZ (PDT etc.) | UTC |
-| Locale | Whatever is set | C.UTF-8 |
-| xdist workers | `-n auto` = all cores | `-n auto` (safe — subprocess isolation prevents cross-worker flakes) |
-
-`tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
-invocation (including IDE integrations) gets hermetic behavior — but the wrapper
-is belt-and-suspenders.
-
-### Running without the wrapper (only if you must)
-
-If you can't use the wrapper (e.g. inside an IDE that shells pytest directly),
-at minimum activate the venv. The isolation plugin loads automatically from
-`addopts` in `pyproject.toml`, so you get the same per-test process isolation
-either way.
-
-```bash
-source .venv/bin/activate # or: source venv/bin/activate
-python -m pytest tests/ -q
-```
-
-If you need to bypass isolation for fast feedback while debugging:
-
-```bash
-python -m pytest tests/agent/test_foo.py -q --no-isolate
-```
-
-Always run the full suite before pushing changes.
 
 ### Don't write change-detector tests