hermes-agent/tests
Teknium 7b76366552
feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828)
Cuts input cost for first-turn Claude requests by ~85-90% on subsequent
sessions within an hour. Tools array (~13k tokens for default toolset) +
stable system prefix (~5-8k tokens) get a 1h cache_control marker; the
volatile suffix (memory, USER profile, timestamp, session id) sits in a
separate non-cached block at the end so it doesn't poison the cross-session
prefix when it changes.

Provider gate: Claude on native Anthropic (incl. OAuth subscription),
OpenRouter, and Nous Portal (which proxies to OpenRouter). All other
providers keep today's system_and_3 layout unchanged.

Layout (4 cache_control breakpoints, Anthropic max):
  1. tools[-1]              -> 1h (cross-session)
  2. system content[0]      -> 1h (cross-session, stable prefix)
  3. messages[-2]           -> 5m (within-session rolling)
  4. messages[-1]           -> 5m (within-session rolling)

Within-session rolling shrinks from 3 messages to 2 to free the breakpoint
budget. On Claude with realistic tool loadouts the long-lived tier carries
the bulk of cross-session value anyway.

System prompt is now always assembled cache-friendly: stable identity /
guidance / skills / platform hints first, then session-stable context
files (AGENTS.md, .cursorrules), then per-call volatile content. Old
single-string callers see the same logical content (same join order),
just reordered so volatile lives at the end.

Config knobs (defaults shown):
  prompt_caching:
    cache_ttl: "5m"           # rolling-window TTL (unchanged)
    long_lived_prefix: true    # opt-out switch
    long_lived_ttl: "1h"       # cross-session prefix TTL

Live E2E (tests/agent/test_prompt_caching_live.py, gated on
OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset:
  Call 1 (cold):              cache_write=13,415  cache_read=0
  Call 2 (NEW agent + msg):   cache_write=391     cache_read=13,025
  Cross-session reuse:        97.09%

Implementation:
* agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived()
  + mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control()
  preserved verbatim for the fallback path.
* agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards
  cache_control onto each Anthropic-format tool dict.
* run_agent.py: _build_system_prompt_parts() returns the 3-tier dict;
  _build_system_prompt() joins them (backward compatible).
  _supports_long_lived_anthropic_cache() policy added next to the existing
  _anthropic_prompt_cache_policy() (which now also recognises Nous Portal
  Claude — pre-existing gap fixed in passing).
  _build_api_kwargs() resolves tools_for_api once and propagates the
  marker through all four build paths (anthropic_messages, bedrock,
  codex_responses, profile/legacy chat completions).
  Long-lived flag plumbed into the runtime snapshot/restore + model-switch
  + fallback-promotion paths.

Tests:
* tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache,
  TestApplyAnthropicCacheControlLongLived).
* tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests
  (TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes
  + a fallback-target case).
* tests/agent/test_prompt_caching_live.py: new live E2E (skipif when
  OPENROUTER_API_KEY is unset; runs outside the hermetic suite).
* Targeted suites: 327/327 pass (caching/adapter/policy/builder).
* tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing
  flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry,
  verified failing on pristine origin/main).
2026-05-11 11:14:56 -07:00
..
acp fix(acp): preserve assistant reasoning metadata in session persistence 2026-05-05 10:18:28 -07:00
acp_adapter fix: make session search initialize session db 2026-05-09 14:36:58 -07:00
agent feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828) 2026-05-11 11:14:56 -07:00
cli fix(cli,tui): align CJK / wide-char markdown tables (#23863) 2026-05-11 11:13:06 -07:00
cron fix(cron): avoid github skill false positives in scanner 2026-05-09 11:11:45 -07:00
e2e fix(gateway): move quick-command dispatch before built-in handlers 2026-05-04 01:39:23 -07:00
environments/benchmarks
fakes
gateway revert: roll back /goal checklist + /subgoal feature stack (#23813) 2026-05-11 07:06:27 -07:00
hermes_cli fix(/model): surface Nous Portal models from remote catalog manifest (#23912) 2026-05-11 10:15:30 -07:00
hermes_state
honcho_plugin feat(honcho): explain why when honcho_profile returns an empty card 2026-04-27 12:37:33 -07:00
integration
openviking_plugin fix(openviking): pre-check fs/stat to route file URIs before hitting directory-only endpoints 2026-04-30 02:35:29 -07:00
plugins test(kanban): remove stale t.summary assertion from search test 2026-05-10 21:44:37 -07:00
providers feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838) 2026-05-09 14:47:00 -07:00
run_agent feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828) 2026-05-11 11:14:56 -07:00
skills Add unit tests for hyperliquid skill functionality 2026-05-10 22:15:04 -07:00
stress fix(kanban): gate claim + unblock on parent completion 2026-05-09 11:07:37 -07:00
tools fix(approval): catch sudo with stdin/askpass/shell privilege flags 2026-05-11 06:56:30 -07:00
tui_gateway fix(tui): close slash parity gaps with CLI (#20339) 2026-05-05 15:42:39 -05:00
website docs(skills): explain restoring bundled skills 2026-05-05 13:46:20 -07:00
__init__.py
conftest.py test(conftest): plug every gateway-kill leak path (#23486) 2026-05-10 18:55:28 -07:00
run_interrupt_test.py
test_account_usage.py
test_atomic_replace_symlinks.py refactor: consolidate symlink-safe atomic replace into shared helper 2026-04-28 04:58:22 -07:00
test_base_url_hostname.py
test_batch_runner_checkpoint.py test: regression coverage for checkpoint dedup and inf/nan coercion 2026-04-24 14:32:21 -07:00
test_cli_file_drop.py
test_cli_manual_compress.py test(cli): regression test for manual /compress system_message 2026-04-28 05:21:49 -07:00
test_cli_skin_integration.py fix(ci): stabilize main test suite regressions (#17660) 2026-04-29 23:18:55 -07:00
test_ctx_halving_fix.py
test_empty_model_fallback.py
test_evidence_store.py
test_get_tool_definitions_cache_isolation.py fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335) 2026-04-30 04:32:06 -07:00
test_hermes_bootstrap.py fix(entry-points): guard hermes_bootstrap import so partial updates don't brick hermes (#22091) 2026-05-08 14:43:13 -07:00
test_hermes_constants.py test(hermes_constants): cover parse_reasoning_effort() 2026-05-07 09:59:07 -07:00
test_hermes_home_profile_warning.py fix(constants): warn once when get_hermes_home() falls back under an active profile (#18746) 2026-05-02 01:49:55 -07:00
test_hermes_logging.py fix(logging): attach gateway log after cli init 2026-04-26 19:01:26 -07:00
test_hermes_state.py fix(session): route OR-combined short CJK tokens to LIKE fallback (#20494) 2026-05-09 17:53:02 -07:00
test_hermes_state_wal_fallback.py fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043) 2026-05-09 02:09:35 -07:00
test_honcho_client_config.py
test_install_sh_pythonpath_sanitization.py fix: harden install.sh against inherited Python env leakage 2026-05-06 04:02:02 -07:00
test_install_sh_setup_wizard_tty_probe.py fix(install): widen /dev/tty open-probe to sibling gates (#16746) 2026-04-28 06:45:55 -07:00
test_install_sh_termux_network_prereqs.py fix: strengthen termux install network prerequisites 2026-05-07 13:04:08 -07:00
test_ipv4_preference.py
test_lazy_session_regressions.py fix: resolve lazy session creation regressions (#18370 fallout) (#20363) 2026-05-06 01:11:49 +05:30
test_lint_config.py lint: enable PLW1514 as a blocking ruff rule 2026-05-08 14:27:40 -07:00
test_live_system_guard_self_test.py test(conftest): plug every gateway-kill leak path (#23486) 2026-05-10 18:55:28 -07:00
test_mcp_serve.py fix(mcp): unwrap platforms key in channels_list 2026-05-07 13:41:16 -07:00
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py test(cli): cover minimax-oauth resolution, refresh, menu wiring 2026-04-29 09:53:42 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py
test_model_tools.py fix(plugins): stop firing pre_tool_call hook twice per tool execution (#17611) 2026-04-29 12:43:39 -07:00
test_model_tools_async_bridge.py fix(model_tools): cancel coroutine on timeout so worker thread exits + log full traceback 2026-04-29 05:00:40 -07:00
test_ollama_num_ctx.py
test_packaging_metadata.py
test_plugin_skills.py fix(skills): support category-qualified local skill names 2026-05-05 10:15:31 -07:00
test_process_loop_event_loop_warning.py fix(cli): replace get_event_loop() with get_running_loop() to silence RuntimeWarning in process_loop thread (#19285) 2026-05-07 06:35:54 -07:00
test_project_metadata.py
test_retry_utils.py
test_sql_injection.py
test_subprocess_home_isolation.py
test_termux_all_extra_compat.py fix: add termux-all install profile and safe fallbacks 2026-05-07 13:04:08 -07:00
test_timezone.py
test_toolset_distributions.py
test_toolsets.py fix: merge plugin tools into builtin toolsets 2026-05-05 10:14:17 -07:00
test_trajectory_compressor.py
test_trajectory_compressor_async.py
test_transform_llm_output_hook.py test+docs: cover transform_llm_output hook + release author map 2026-05-07 05:46:05 -07:00
test_transform_tool_result_hook.py
test_tui_gateway_server.py Merge pull request #20942 from NousResearch/austin/fix/personality 2026-05-07 18:54:29 -04:00
test_utils_truthy_values.py
test_yuanbao_integration.py yuanbao platform (#16298) 2026-04-26 18:50:49 -07:00
test_yuanbao_markdown.py yuanbao platform (#16298) 2026-04-26 18:50:49 -07:00
test_yuanbao_pipeline.py yuanbao platform (#16298) 2026-04-26 18:50:49 -07:00
test_yuanbao_proto.py yuanbao platform (#16298) 2026-04-26 18:50:49 -07:00