mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
Cuts input cost for first-turn Claude requests by ~85-90% on subsequent
sessions within an hour. Tools array (~13k tokens for default toolset) +
stable system prefix (~5-8k tokens) get a 1h cache_control marker; the
volatile suffix (memory, USER profile, timestamp, session id) sits in a
separate non-cached block at the end so it doesn't poison the cross-session
prefix when it changes.
Provider gate: Claude on native Anthropic (incl. OAuth subscription),
OpenRouter, and Nous Portal (which proxies to OpenRouter). All other
providers keep today's system_and_3 layout unchanged.
Layout (4 cache_control breakpoints, Anthropic max):
1. tools[-1] -> 1h (cross-session)
2. system content[0] -> 1h (cross-session, stable prefix)
3. messages[-2] -> 5m (within-session rolling)
4. messages[-1] -> 5m (within-session rolling)
Within-session rolling shrinks from 3 messages to 2 to free the breakpoint
budget. On Claude with realistic tool loadouts the long-lived tier carries
the bulk of cross-session value anyway.
System prompt is now always assembled cache-friendly: stable identity /
guidance / skills / platform hints first, then session-stable context
files (AGENTS.md, .cursorrules), then per-call volatile content. Old
single-string callers see the same logical content (same join order),
just reordered so volatile lives at the end.
Config knobs (defaults shown):
prompt_caching:
cache_ttl: "5m" # rolling-window TTL (unchanged)
long_lived_prefix: true # opt-out switch
long_lived_ttl: "1h" # cross-session prefix TTL
Live E2E (tests/agent/test_prompt_caching_live.py, gated on
OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset:
Call 1 (cold): cache_write=13,415 cache_read=0
Call 2 (NEW agent + msg): cache_write=391 cache_read=13,025
Cross-session reuse: 97.09%
Implementation:
* agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived()
+ mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control()
preserved verbatim for the fallback path.
* agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards
cache_control onto each Anthropic-format tool dict.
* run_agent.py: _build_system_prompt_parts() returns the 3-tier dict;
_build_system_prompt() joins them (backward compatible).
_supports_long_lived_anthropic_cache() policy added next to the existing
_anthropic_prompt_cache_policy() (which now also recognises Nous Portal
Claude — pre-existing gap fixed in passing).
_build_api_kwargs() resolves tools_for_api once and propagates the
marker through all four build paths (anthropic_messages, bedrock,
codex_responses, profile/legacy chat completions).
Long-lived flag plumbed into the runtime snapshot/restore + model-switch
+ fallback-promotion paths.
Tests:
* tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache,
TestApplyAnthropicCacheControlLongLived).
* tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests
(TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes
+ a fallback-target case).
* tests/agent/test_prompt_caching_live.py: new live E2E (skipif when
OPENROUTER_API_KEY is unset; runs outside the hermetic suite).
* Targeted suites: 327/327 pass (caching/adapter/policy/builder).
* tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing
flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry,
verified failing on pristine origin/main).
|
||
|---|---|---|
| .. | ||
| transports | ||
| __init__.py | ||
| test_anthropic_adapter.py | ||
| test_anthropic_keychain.py | ||
| test_arcee_trinity_overrides.py | ||
| test_auxiliary_client.py | ||
| test_auxiliary_client_anthropic_custom.py | ||
| test_auxiliary_config_bridge.py | ||
| test_auxiliary_main_first.py | ||
| test_auxiliary_named_custom_providers.py | ||
| test_auxiliary_transport_autodetect.py | ||
| test_bedrock_1m_context.py | ||
| test_bedrock_adapter.py | ||
| test_bedrock_integration.py | ||
| test_codex_cloudflare_headers.py | ||
| test_compress_focus.py | ||
| test_compressor_image_tokens.py | ||
| test_context_compressor.py | ||
| test_context_compressor_summary_continuity.py | ||
| test_context_engine.py | ||
| test_context_references.py | ||
| test_copilot_acp_client.py | ||
| test_credential_pool.py | ||
| test_credential_pool_routing.py | ||
| test_crossloop_client_cache.py | ||
| test_curator.py | ||
| test_curator_activity.py | ||
| test_curator_backup.py | ||
| test_curator_classification.py | ||
| test_curator_reports.py | ||
| test_deepseek_anthropic_thinking.py | ||
| test_direct_provider_url_detection.py | ||
| test_display.py | ||
| test_display_emoji.py | ||
| test_error_classifier.py | ||
| test_external_skills.py | ||
| test_external_skills_dirs_cache.py | ||
| test_gemini_cloudcode.py | ||
| test_gemini_fast_fallback.py | ||
| test_gemini_free_tier_gate.py | ||
| test_gemini_native_adapter.py | ||
| test_gemini_schema.py | ||
| test_i18n.py | ||
| test_image_gen_registry.py | ||
| test_image_routing.py | ||
| test_insights.py | ||
| test_kimi_coding_anthropic_thinking.py | ||
| test_local_stream_timeout.py | ||
| test_markdown_tables.py | ||
| test_memory_provider.py | ||
| test_memory_session_switch.py | ||
| test_memory_user_id.py | ||
| test_minimax_auxiliary_url.py | ||
| test_minimax_provider.py | ||
| test_model_metadata.py | ||
| test_model_metadata_local_ctx.py | ||
| test_model_metadata_ssl.py | ||
| test_models_dev.py | ||
| test_moonshot_schema.py | ||
| test_nous_rate_guard.py | ||
| test_onboarding.py | ||
| test_openrouter_response_cache.py | ||
| test_plugin_llm.py | ||
| test_prompt_builder.py | ||
| test_prompt_caching.py | ||
| test_prompt_caching_live.py | ||
| test_proxy_and_url_validation.py | ||
| test_rate_limit_tracker.py | ||
| test_redact.py | ||
| test_shell_hooks.py | ||
| test_shell_hooks_consent.py | ||
| test_skill_commands.py | ||
| test_skill_commands_reload.py | ||
| test_skill_utils.py | ||
| test_streaming_context_scrubber.py | ||
| test_subagent_progress.py | ||
| test_subagent_stop_hook.py | ||
| test_subdirectory_hints.py | ||
| test_think_scrubber.py | ||
| test_title_generator.py | ||
| test_tool_guardrails.py | ||
| test_unsupported_parameter_retry.py | ||
| test_unsupported_temperature_retry.py | ||
| test_usage_pricing.py | ||
| test_vision_resolved_args.py | ||