Cuts shard-3 local runtime in half by neutralizing real wall-clock
waits across three classes of slow test:
## 1. Retry backoff mocks
- tests/run_agent/conftest.py (NEW): autouse fixture mocks
jittered_backoff to 0.0 so the `while time.time() < sleep_end`
busy-loop exits immediately. No global time.sleep mock (would
break threading tests).
- test_anthropic_error_handling, test_413_compression,
test_run_agent_codex_responses, test_fallback_model: per-file
fixtures mock time.sleep / asyncio.sleep for retry / compression
paths.
- test_retaindb_plugin: cap the retaindb module's bound time.sleep
to 0.05s via a per-test shim (background writer-thread retries
sleep 2s after errors; tests don't care about exact duration).
Plus replace arbitrary time.sleep(N) waits with short polling
loops bounded by deadline.
## 2. Subprocess sleeps in production code
- test_update_gateway_restart: mock time.sleep. Production code
does time.sleep(3) after `systemctl restart` to verify the
service survived. Tests mock subprocess.run \u2014 nothing actually
restarts \u2014 so the wait is dead time.
## 3. Network / IMDS timeouts (biggest single win)
- tests/conftest.py: add AWS_EC2_METADATA_DISABLED=true plus
AWS_METADATA_SERVICE_TIMEOUT=1 and ATTEMPTS=1. boto3 falls back
to IMDS (169.254.169.254) when no AWS creds are set. Any test
hitting has_aws_credentials() / resolve_aws_auth_env_var() (e.g.
test_status, test_setup_copilot_acp, anything that touches
provider auto-detect) burned ~2-4s waiting for that to time out.
- test_exit_cleanup_interrupt: explicitly mock
resolve_runtime_provider which was doing real network auto-detect
(~4s). Tests don't care about provider resolution \u2014 the agent
is already mocked.
- test_timezone: collapse the 3-test "TZ env in subprocess" suite
into 2 tests by checking both injection AND no-leak in the same
subprocess spawn (was 3 \u00d7 3.2s, now 2 \u00d7 4s).
## Validation
| Test | Before | After |
|---|---|---|
| test_anthropic_error_handling (8 tests) | ~80s | ~15s |
| test_413_compression (14 tests) | ~18s | 2.3s |
| test_retaindb_plugin (67 tests) | ~13s | 1.3s |
| test_status_includes_tavily_key | 4.0s | 0.05s |
| test_setup_copilot_acp_skips_same_provider_pool_step | 8.0s | 0.26s |
| test_update_gateway_restart (5 tests) | ~18s total | ~0.35s total |
| test_exit_cleanup_interrupt (2 tests) | 8s | 1.5s |
| **Matrix shard 3 local** | **108s** | **50s** |
No behavioral contract changed \u2014 tests still verify retry happens,
service restart logic runs, etc.; they just don't burn real seconds
waiting for it.
Supersedes PR #11779 (those changes are included here).
* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock
2026-04-07 17:19:07 -07:00
Renamed from tests/test_exit_cleanup_interrupt.py (Browse further)