test: stop testing mutable data — convert change-detectors to invariants (#13363)

Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself).
2026-04-29 01:31:41 +00:00 · 2026-04-20 23:20:33 -07:00 · 2026-04-20 23:20:33 -07:00 · 62cbeb6367
commit 62cbeb6367
parent 7ab5eebd03
14 changed files with 113 additions and 80 deletions
--- a/tests/hermes_cli/test_api_key_providers.py
+++ b/tests/hermes_cli/test_api_key_providers.py
@ -921,17 +921,13 @@ class TestKimiMoonshotModelListIsolation:
        leaked = set(moonshot_models) & coding_plan_only
        assert not leaked, f"Moonshot list contains Coding Plan-only models: {leaked}"

-    def test_moonshot_list_contains_shared_models(self):
+    def test_moonshot_list_non_empty(self):
        from hermes_cli.main import _PROVIDER_MODELS
-        moonshot_models = _PROVIDER_MODELS["moonshot"]
-        assert "kimi-k2.5" in moonshot_models
-        assert "kimi-k2-thinking" in moonshot_models
+        assert len(_PROVIDER_MODELS["moonshot"]) >= 1

-    def test_coding_plan_list_contains_plan_specific_models(self):
+    def test_coding_plan_list_non_empty(self):
        from hermes_cli.main import _PROVIDER_MODELS
-        coding_models = _PROVIDER_MODELS["kimi-coding"]
-        assert "kimi-for-coding" in coding_models
-        assert "kimi-k2-thinking-turbo" in coding_models
+        assert len(_PROVIDER_MODELS["kimi-coding"]) >= 1


 # =============================================================================
@ -944,14 +940,12 @@ class TestHuggingFaceModels:
    def test_main_provider_models_has_huggingface(self):
        from hermes_cli.main import _PROVIDER_MODELS
        assert "huggingface" in _PROVIDER_MODELS
-        models = _PROVIDER_MODELS["huggingface"]
-        assert len(models) >= 6, "Expected at least 6 curated HF models"
+        assert len(_PROVIDER_MODELS["huggingface"]) >= 1

    def test_models_py_has_huggingface(self):
        from hermes_cli.models import _PROVIDER_MODELS
        assert "huggingface" in _PROVIDER_MODELS
-        models = _PROVIDER_MODELS["huggingface"]
-        assert len(models) >= 6
+        assert len(_PROVIDER_MODELS["huggingface"]) >= 1

    def test_model_lists_match(self):
        """Model lists in main.py and models.py should be identical."""