feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828)

Cuts input cost for first-turn Claude requests by ~85-90% on subsequent sessions within an hour. Tools array (~13k tokens for default toolset) + stable system prefix (~5-8k tokens) get a 1h cache_control marker; the volatile suffix (memory, USER profile, timestamp, session id) sits in a separate non-cached block at the end so it doesn't poison the cross-session prefix when it changes.

Provider gate: Claude on native Anthropic (incl. OAuth subscription), OpenRouter, and Nous Portal (which proxies to OpenRouter). All other providers keep today's system_and_3 layout unchanged.

Layout (4 cache_control breakpoints, Anthropic max):

  1. tools[-1]          -> 1h (cross-session)
  2. system content[0]  -> 1h (cross-session, stable prefix)
  3. messages[-2]       -> 5m (within-session rolling)
  4. messages[-1]       -> 5m (within-session rolling)

Within-session rolling shrinks from 3 messages to 2 to free the breakpoint budget. On Claude with realistic tool loadouts the long-lived tier carries the bulk of cross-session value anyway.

System prompt is now always assembled cache-friendly: stable identity / guidance / skills / platform hints first, then session-stable context files (AGENTS.md, .cursorrules), then per-call volatile content. Old single-string callers see the same logical content (same join order), just reordered so volatile lives at the end.

Config knobs (defaults shown):

  prompt_caching:
    cache_ttl: "5m"           # rolling-window TTL (unchanged)
    long_lived_prefix: true   # opt-out switch
    long_lived_ttl: "1h"      # cross-session prefix TTL

Live E2E (tests/agent/test_prompt_caching_live.py, gated on OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset:

  Call 1 (cold):            cache_write=13,415  cache_read=0
  Call 2 (NEW agent + msg): cache_write=391     cache_read=13,025
  Cross-session reuse: 97.09%

Implementation:

* agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived() + mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control() preserved verbatim for the fallback path.
* agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards cache_control onto each Anthropic-format tool dict.
* run_agent.py: _build_system_prompt_parts() returns the 3-tier dict; _build_system_prompt() joins them (backward compatible). _supports_long_lived_anthropic_cache() policy added next to the existing _anthropic_prompt_cache_policy() (which now also recognises Nous Portal Claude; pre-existing gap fixed in passing). _build_api_kwargs() resolves tools_for_api once and propagates the marker through all four build paths (anthropic_messages, bedrock, codex_responses, profile/legacy chat completions). Long-lived flag plumbed into the runtime snapshot/restore + model-switch + fallback-promotion paths.

Tests:

* tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache, TestApplyAnthropicCacheControlLongLived).
* tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests (TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes + a fallback-target case).
* tests/agent/test_prompt_caching_live.py: new live E2E (skipif when OPENROUTER_API_KEY is unset; runs outside the hermetic suite).
* Targeted suites: 327/327 pass (caching/adapter/policy/builder).
* tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry, verified failing on pristine origin/main).
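
The four-breakpoint layout described in the commit message can be illustrated with a small sketch. This is not the code from agent/prompt_caching.py: the function name `apply_long_lived_layout`, its parameters, and the assumption that the system prompt arrives as a list of text blocks are all hypothetical, chosen only to show where each marker would land.

```python
from copy import deepcopy


def apply_long_lived_layout(tools, system_blocks, messages,
                            long_lived_ttl="1h", rolling_ttl="5m"):
    """Hypothetical helper: place the four cache_control breakpoints.

    Works on deep copies so the caller's structures are left untouched.
    """
    tools = deepcopy(tools)
    system_blocks = deepcopy(system_blocks)
    messages = deepcopy(messages)

    long_marker = {"type": "ephemeral", "ttl": long_lived_ttl}
    rolling_marker = {"type": "ephemeral", "ttl": rolling_ttl}

    # Breakpoint 1: the last tool closes the cross-session tools prefix.
    if tools:
        tools[-1]["cache_control"] = dict(long_marker)

    # Breakpoint 2: the first system block is the stable prefix; later
    # blocks (volatile suffix) are deliberately left unmarked.
    if system_blocks:
        system_blocks[0]["cache_control"] = dict(long_marker)

    # Breakpoints 3 and 4: the last two messages form the rolling window.
    for msg in messages[-2:]:
        content = msg["content"]
        if isinstance(content, str):
            # Promote plain strings to block form so a marker can attach.
            content = [{"type": "text", "text": content}]
            msg["content"] = content
        if content:
            content[-1]["cache_control"] = dict(rolling_marker)

    return tools, system_blocks, messages
```

In this sketch the volatile system block and every message before the final two carry no marker, which mirrors the stated intent: changes to memory, timestamp, or session id should not invalidate the 1h prefix.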
2026-05-21 05:11:26 +00:00 · 2026-05-11 11:14:56 -07:00 · 2026-05-11 11:14:56 -07:00 · 7b76366552
commit 7b76366552
parent 2ec8d2b42f
7 changed files with 793 additions and 112 deletions
--- a/tests/run_agent/test_anthropic_prompt_cache_policy.py
+++ b/tests/run_agent/test_anthropic_prompt_cache_policy.py
@@ -290,3 +290,102 @@ class TestExplicitOverrides:
            model="anthropic/claude-sonnet-4.6",
        )
        assert (should, native) == (True, False)
+
+
+# ─────────────────────────────────────────────────────────────────────
+# Long-lived prefix cache policy (cross-session 1h tier)
+# ─────────────────────────────────────────────────────────────────────
+
+class TestSupportsLongLivedAnthropicCache:
+    """Narrower than _anthropic_prompt_cache_policy — only Claude on the 4
+    explicitly-validated endpoints get the long-lived layout."""
+
+    def test_native_anthropic_claude_supported(self):
+        agent = _make_agent(
+            provider="anthropic",
+            base_url="https://api.anthropic.com",
+            api_mode="anthropic_messages",
+            model="claude-sonnet-4.6",
+        )
+        assert agent._supports_long_lived_anthropic_cache() is True
+
+    def test_anthropic_oauth_supported(self):
+        # OAuth uses the same transport as native Anthropic
+        agent = _make_agent(
+            provider="anthropic",
+            base_url="https://api.anthropic.com",
+            api_mode="anthropic_messages",
+            model="claude-opus-4.6",
+        )
+        assert agent._supports_long_lived_anthropic_cache() is True
+
+    def test_openrouter_claude_supported(self):
+        agent = _make_agent(
+            provider="openrouter",
+            base_url="https://openrouter.ai/api/v1",
+            api_mode="chat_completions",
+            model="anthropic/claude-sonnet-4.6",
+        )
+        assert agent._supports_long_lived_anthropic_cache() is True
+
+    def test_nous_portal_claude_supported(self):
+        # Nous Portal proxies to OpenRouter — same wire format
+        agent = _make_agent(
+            provider="nous",
+            base_url="https://inference-api.nousresearch.com/v1",
+            api_mode="chat_completions",
+            model="anthropic/claude-opus-4.7",
+        )
+        assert agent._supports_long_lived_anthropic_cache() is True
+
+    def test_openrouter_non_claude_rejected(self):
+        agent = _make_agent(
+            provider="openrouter",
+            base_url="https://openrouter.ai/api/v1",
+            api_mode="chat_completions",
+            model="openai/gpt-5.4",
+        )
+        assert agent._supports_long_lived_anthropic_cache() is False
+
+    def test_third_party_anthropic_gateway_rejected(self):
+        # MiniMax / Kimi / etc. — anthropic-wire but not in our validated list
+        agent = _make_agent(
+            provider="minimax",
+            base_url="https://api.minimax.io/anthropic",
+            api_mode="anthropic_messages",
+            model="minimax-m2.7",
+        )
+        assert agent._supports_long_lived_anthropic_cache() is False
+
+    def test_alibaba_dashscope_rejected(self):
+        agent = _make_agent(
+            provider="alibaba",
+            base_url="https://dashscope.aliyuncs.com/api/v1/anthropic",
+            api_mode="anthropic_messages",
+            model="qwen3.5-plus",
+        )
+        assert agent._supports_long_lived_anthropic_cache() is False
+
+    def test_opencode_qwen_rejected(self):
+        agent = _make_agent(
+            provider="opencode-go",
+            base_url="https://api.opencode-go.example/v1",
+            api_mode="chat_completions",
+            model="qwen3.6-plus",
+        )
+        assert agent._supports_long_lived_anthropic_cache() is False
+
+    def test_fallback_target_evaluated_independently(self):
+        # Starting on a non-supported provider, falling back to OpenRouter Claude
+        agent = _make_agent(
+            provider="minimax",
+            base_url="https://api.minimax.io/anthropic",
+            api_mode="anthropic_messages",
+            model="minimax-m2.7",
+        )
+        assert agent._supports_long_lived_anthropic_cache(
+            provider="openrouter",
+            base_url="https://openrouter.ai/api/v1",
+            api_mode="chat_completions",
+            model="anthropic/claude-sonnet-4.6",
+        ) is True
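
The test matrix above pins down the observable behaviour of the gate without showing its body. A minimal standalone sketch that satisfies the same matrix might look like the following. It is an assumption, not the implementation in run_agent.py: the function name `supports_long_lived_cache`, the provider allow-list, and the substring check on the model name are all hypothetical.

```python
# Hypothetical allow-list: the three providers named in the commit message.
VALIDATED_PROVIDERS = frozenset({"anthropic", "openrouter", "nous"})


def supports_long_lived_cache(provider, base_url, api_mode, model):
    """Return True only for Claude models on a validated provider.

    base_url and api_mode are accepted to mirror the keyword arguments used
    in the tests, but this sketch does not inspect them.
    """
    is_claude = "claude" in (model or "").lower()
    return is_claude and (provider or "").lower() in VALIDATED_PROVIDERS


# Fallback case: the target is evaluated on its own arguments, independent
# of whatever provider the agent started on.
print(supports_long_lived_cache(
    provider="openrouter",
    base_url="https://openrouter.ai/api/v1",
    api_mode="chat_completions",
    model="anthropic/claude-sonnet-4.6",
))
```

Taking the endpoint as explicit arguments, rather than reading agent state, is what lets a fallback target be checked before the agent switches to it.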