mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
test: stop testing mutable data — convert change-detectors to invariants (#13363)
Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself).
This commit is contained in:
parent
7ab5eebd03
commit
62cbeb6367
14 changed files with 113 additions and 80 deletions
49
AGENTS.md
49
AGENTS.md
|
|
@ -566,3 +566,52 @@ python -m pytest tests/ -q -n 4
|
|||
Worker count above 4 will surface test-ordering flakes that CI never sees.
|
||||
|
||||
Always run the full suite before pushing changes.
|
||||
|
||||
### Don't write change-detector tests
|
||||
|
||||
A test is a **change-detector** if it fails whenever data that is **expected
|
||||
to change** gets updated — model catalogs, config version numbers,
|
||||
enumeration counts, hardcoded lists of provider models. These tests add no
|
||||
behavioral coverage; they just guarantee that routine source updates break
|
||||
CI and cost engineering time to "fix."
|
||||
|
||||
**Do not write:**
|
||||
|
||||
```python
|
||||
# catalog snapshot — breaks every model release
|
||||
assert "gemini-2.5-pro" in _PROVIDER_MODELS["gemini"]
|
||||
assert "MiniMax-M2.7" in models
|
||||
|
||||
# config version literal — breaks every schema bump
|
||||
assert DEFAULT_CONFIG["_config_version"] == 21
|
||||
|
||||
# enumeration count — breaks every time a skill/provider is added
|
||||
assert len(_PROVIDER_MODELS["huggingface"]) == 8
|
||||
```
|
||||
|
||||
**Do write:**
|
||||
|
||||
```python
|
||||
# behavior: does the catalog plumbing work at all?
|
||||
assert "gemini" in _PROVIDER_MODELS
|
||||
assert len(_PROVIDER_MODELS["gemini"]) >= 1
|
||||
|
||||
# behavior: does migration bump the user's version to current latest?
|
||||
assert raw["_config_version"] == DEFAULT_CONFIG["_config_version"]
|
||||
|
||||
# invariant: no plan-only model leaks into the legacy list
|
||||
assert not (set(moonshot_models) & coding_plan_only_models)
|
||||
|
||||
# invariant: every model in the catalog has a context-length entry
|
||||
for m in _PROVIDER_MODELS["huggingface"]:
|
||||
assert m.lower() in DEFAULT_CONTEXT_LENGTHS_LOWER
|
||||
```
|
||||
|
||||
The rule: if the test reads like a snapshot of current data, delete it. If
|
||||
it reads like a contract about how two pieces of data must relate, keep it.
|
||||
When a PR adds a new provider/model and you want a test, make the test
|
||||
assert the relationship (e.g. "catalog entries all have context lengths"),
|
||||
not the specific names.
|
||||
|
||||
Reviewers should reject new change-detector tests; authors should convert
|
||||
them into invariants before re-requesting review.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue