3 commits

Author SHA1 Message Date
Teknium
6cae0744f0
test: mock retry backoff and compression sleeps in slow tests
Cuts ~60s off shard 3's local runtime (108s → 48s) by neutralizing
real wall-clock waits in backoff/compression/retry paths. Tests assert
behavior (retry count, final result, error handling), never timing.

Changes:
- tests/run_agent/conftest.py (NEW): autouse fixture mocks
  run_agent.jittered_backoff to 0.0 for all tests in the directory.
  Collapses the `while time.time() < sleep_end` busy-loop to a no-op.
  Does NOT mock time.sleep globally (breaks threading tests).
- test_anthropic_error_handling.py: per-file fixture mocks time.sleep
  and asyncio.sleep for this test's retry paths (6 tests × 10s → ~2s each).
- test_413_compression.py: mocks time.sleep for the 2s compression retry
  pauses (9 tests × 2s → millisecond range).
- test_run_agent_codex_responses.py: mocks time.sleep for Codex retry
  path (6.8s → 0.24s on the empty-output retry test).
- test_fallback_model.py: mocks time.sleep for transport-recovery path.
- test_retaindb_plugin.py: caps retaindb module's time.sleep to 0.05s
  so background writer-thread sleeps don't block tests. Replaces
  arbitrary time.sleep(N) waits with polling loops.
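
A minimal sketch of the polling-loop pattern mentioned in the last item, as opposed to a fixed time.sleep(N). The helper name `wait_until` and the stand-in writer thread are hypothetical and not taken from the repository:

```python
import threading
import time


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last check at the deadline


# Stand-in for a background writer thread that finishes at an unknown time.
written = []
worker = threading.Thread(target=lambda: written.append("row"))
worker.start()

# Returns as soon as the condition holds instead of always waiting N seconds.
flushed = wait_until(lambda: len(written) == 1)
worker.join()
```

The test only pays for the time the condition actually needs, and the timeout bounds the wait if the condition never holds.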

Validation:
- tests/run_agent/ + tests/plugins/test_retaindb_plugin.py: 827 passed,
  0 failed, 22.9s (was ~75s before).
- Matrix shard 3 local: 3098 passed, 48.2s (was 108s).
- No test's timing-assertion contract is changed (tests still verify
  retry happens, just don't wait 5s for it).
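
The idea of verifying that a retry happens without waiting for it could look roughly like this. `call_with_retry` is a hypothetical helper written for illustration, not the project's retry code, and the patch is scoped to a `with` block rather than applied globally:

```python
import time
from unittest import mock


def call_with_retry(fn, attempts=3, delay=5.0):
    """Hypothetical retry helper: sleeps `delay` seconds between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Fails twice, then succeeds on the third call.
flaky = mock.Mock(side_effect=[ConnectionError("boom"), ConnectionError("boom"), "ok"])

# Replace time.sleep only inside this block so no real waiting occurs.
with mock.patch("time.sleep") as fake_sleep:
    result = call_with_retry(flaky)
```

Assertions then target behavior (`flaky.call_count`, `result`, the number of `fake_sleep` calls) rather than elapsed wall-clock time.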
2026-04-17 13:19:00 -07:00
Teknium
2367c6ffd5
test: remove 169 change-detector tests across 21 files (#11472)
First pass of test-suite reduction to address flaky CI and bloat.

Removed tests that fall into these change-detector patterns:

1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests
   that call inspect.getsource() on production modules and grep for string
   literals. Break on any refactor/rename even when behavior is correct.

2. Platform enum tautologies (every gateway/test_X.py): assertions like
   `Platform.X.value == 'x'` duplicated across ~9 adapter test files.

3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that
   only verify a key exists in a dict. Data-layout tests, not behavior.

4. Argparse wiring tests (test_argparse_flag_propagation,
   test_subparser_routing_fallback): tests that do parser.parse_args([...])
   then assert args.field. Tests Python's argparse, not our code.

5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch
   cmd_X, call plugins_command with matching action, assert mock called.
   Tests the if/elif chain, not behavior.

6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests,
   test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests
   that mock the external API client, call our function, and assert exact
   kwargs. Break on refactor even when behavior is preserved.

7. Schedule-internal "function-was-called" tests (acp/test_server scheduling
   tests): tests that patch their own helper method, then assert it was called.
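
To illustrate pattern 6, here is a rough contrast between a kwarg-to-mock test and a behavioral one. The `summarize` function and the `client.complete` call are hypothetical stand-ins, not code from this repository:

```python
from unittest import mock


def summarize(client, text):
    """Hypothetical function under test: asks a client for a summary and trims it."""
    reply = client.complete(prompt=f"Summarize: {text}", max_tokens=64)
    return reply.strip()


def change_detector_test():
    # Pins the exact call signature; fails if the prompt wording or kwargs are refactored.
    client = mock.Mock()
    client.complete.return_value = "  short  "
    summarize(client, "long text")
    client.complete.assert_called_once_with(prompt="Summarize: long text", max_tokens=64)


def behavioral_test():
    # Pins only the observable result, so internal refactors stay green.
    client = mock.Mock()
    client.complete.return_value = "  short  "
    assert summarize(client, "long text") == "short"


change_detector_test()
behavioral_test()
```

Both pass today, but only the first breaks if, say, `max_tokens` changes while the returned summary stays the same.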

Kept behavioral tests throughout: error paths (pytest.raises), security
tests (path traversal, SSRF, redaction), message alternation invariants,
provider API format conversion, streaming logic, memory contract, real
config load/merge tests.

Net reduction: 169 tests removed. 38 empty classes cleaned up.

Collected before: 12,522 tests
Collected after:  12,353 tests
2026-04-17 01:05:09 -07:00
Teknium
5747590770
fix: follow-up improvements for salvaged PR #5456
- SQLite write queue: thread-local connection pooling instead of
  creating+closing a new connection per operation
- Prefetch threads: join previous batch before spawning new ones to
  prevent thread accumulation on rapid queue_prefetch() calls
- Shutdown: join prefetch threads before stopping write queue
- Add 73 tests covering _Client HTTP payloads, _WriteQueue crash
  recovery & connection reuse, _build_overlay deduplication,
  RetainDBMemoryProvider lifecycle/tools/prefetch/hooks, thread
  accumulation guard, and reasoning_level heuristic
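
A minimal sketch of thread-local connection reuse for SQLite, assuming the standard-library `sqlite3` and `threading` modules. The class name `ThreadLocalConnections` is hypothetical and this is not the `_WriteQueue` implementation:

```python
import sqlite3
import threading


class ThreadLocalConnections:
    """Hands each thread one reusable SQLite connection."""

    def __init__(self, path):
        self._path = path
        self._local = threading.local()

    def get(self):
        # Open a connection on first use in this thread, then reuse it.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._path)
            self._local.conn = conn
        return conn


# Note: with ":memory:" each connection is its own database; a real queue
# would point at a file path.
pool = ThreadLocalConnections(":memory:")
same_thread = pool.get() is pool.get()

seen = []


def worker():
    conn = pool.get()
    seen.append(conn)
    conn.close()  # close in the thread that opened it


t = threading.Thread(target=worker)
t.start()
t.join()
distinct_threads = seen[0] is not pool.get()
```

Repeated calls from one thread return the same connection object, which avoids the connect/close cost per operation, while each thread still gets its own connection.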
2026-04-06 02:00:55 -07:00