docs: clarify compression threshold is derived from the main model's context window (#35099)

The compression threshold is threshold × context_length, where context_length is the MAIN agent model's window, not the auxiliary/summary model's. On a 262,144-token model at the default 0.50, the threshold is 131,072. That matches a common "128K" figure (128 × 1,024) only because of the percentage chosen, and the coincidence has led readers to conclude that the auxiliary model's context limit is the trigger. Add a note that preempts this misreading and points to the separate summary-model-context constraint.
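A minimal Python sketch of the arithmetic. It assumes a hypothetical helper `compression_threshold_tokens` (not a function from the codebase) and assumes truncation to whole tokens. The auxiliary model's window is intentionally absent from the signature, which is the point of the note:

```python
# Hypothetical helper illustrating the formula; the name is illustrative, not the project's API.
def compression_threshold_tokens(main_context_length: int, threshold: float = 0.50) -> int:
    """Return threshold_tokens = threshold x context_length of the MAIN agent model.

    The auxiliary/summary model's context window is deliberately NOT a parameter:
    it never influences when compression fires.
    """
    if main_context_length <= 0:
        raise ValueError("main_context_length must be positive")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    # Truncate to whole tokens (assumption: the real code may round differently).
    return int(main_context_length * threshold)


main_context = 262_144  # main agent model's context window
threshold_tokens = compression_threshold_tokens(main_context)
print(threshold_tokens)                         # 131072
print(threshold_tokens == 128 * 1024)           # True: the "128K" match is arithmetic coincidence
print(compression_threshold_tokens(200_000))    # 100000: the threshold tracks the main model
```

Swapping the summary model changes none of these numbers. Only a different main model (or a different `threshold`) moves the trigger point.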
2026-07-27 17:58:07 +00:00 · 2026-05-29 19:59:04 -07:00 · 2026-05-29 19:59:04 -07:00 · 860cf28dab
commit 860cf28dab
parent fb0ab27649
1 changed file with 11 additions and 0 deletions
--- a/website/docs/developer-guide/context-compression-and-caching.md
+++ b/website/docs/developer-guide/context-compression-and-caching.md
@@ -111,6 +111,17 @@ tail_token_budget    = 100,000 × 0.20 = 20,000
 max_summary_tokens   = min(200,000 × 0.05, 12,000) = 10,000
 ```

+:::note Threshold is derived from the MAIN model's context window
+`threshold_tokens` is always `threshold × context_length`, where `context_length`
+is the **main agent model's** context window, never the auxiliary/summary
+model's. On a 262,144-token model at the default `0.50`, the threshold is
+`262,144 × 0.50 = 131,072`. That this matches a common "128K context" figure
+(128 × 1,024) is a coincidence of the percentage, not a sign that the auxiliary
+model's window is the trigger. The auxiliary model's context window is a separate
+concern: see the "Summary model context length" warning below for how it affects
+whether a summary can be produced, not when compression fires.
+:::
+

 ## Compression Algorithm