docs: add ACP and internal systems implementation guides

- add ACP user and developer docs covering setup, lifecycle, callbacks, permissions, tool rendering, and runtime behavior - add developer guides for agent loop, provider runtime resolution, prompt assembly, context caching/compression, gateway internals, session storage, tools runtime, trajectories, and cron internals - refresh architecture, quickstart, installation, CLI reference, and environments docs to link the new implementation pages and ACP support
2026-04-25 00:51:20 +00:00 · 2026-03-14 00:29:48 -07:00 · 2026-03-14 00:29:48 -07:00 · d87a1615ce
commit d87a1615ce
parent 29176f302e
17 changed files with 1256 additions and 170 deletions
--- a/website/docs/developer-guide/context-compression-and-caching.md
+++ b/website/docs/developer-guide/context-compression-and-caching.md
@ -0,0 +1,72 @@
+---
+sidebar_position: 6
+title: "Context Compression & Prompt Caching"
+description: "How Hermes compresses long conversations and applies provider-side prompt caching"
+---
+
+# Context Compression & Prompt Caching
+
+Hermes manages long conversations with two complementary mechanisms:
+
+- prompt caching
+- context compression
+
+Primary files:
+
+- `agent/prompt_caching.py`
+- `agent/context_compressor.py`
+- `run_agent.py`
+
+## Prompt caching
+
+For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers.
+
+Current strategy:
+
+- cache the system prompt
+- cache the last 3 non-system messages
+- default TTL is 5 minutes unless explicitly extended
+
+This is implemented in `agent/prompt_caching.py`.
+
+## Why prompt stability matters
+
+Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to.
+
+## Compression trigger
+
+Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts.
+
+## Compression algorithm
+
+The compressor protects:
+
+- the first N turns
+- the last N turns
+
+and summarizes the middle section.
+
+It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression.
+
+## Pre-compression memory flush
+
+Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.
+
+## Session lineage after compression
+
+Compression can split the session into a new session ID while preserving parent lineage in the state DB.
+
+This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.
+
+## Re-injected state after compression
+
+After compression, Hermes may re-inject compact operational state such as:
+
+- todo snapshot
+- prior-read-files summary
+
+## Related docs
+
+- [Prompt Assembly](./prompt-assembly.md)
+- [Session Storage](./session-storage.md)
+- [Agent Loop Internals](./agent-loop.md)