docs: clarify compression threshold is derived from the main model's context window (#35099)

The compression threshold is threshold × context_length, where context_length
is the MAIN agent model's window, not the auxiliary/summary model's. On a
262,144-token model at the default 0.50, the threshold is 131,072. That value
is close to the common 128K figure only by coincidence of the percentage, which
has led readers to assume the auxiliary model's context limit is the trigger.
Add a note preempting that misreading and pointing to the separate
summary-model-context constraint.
Teknium 2026-05-29 19:59:04 -07:00 committed by GitHub
parent fb0ab27649
commit 860cf28dab

@@ -111,6 +111,17 @@ tail_token_budget = 100,000 × 0.20 = 20,000
max_summary_tokens = min(200,000 × 0.05, 12,000) = 10,000
```
:::note Threshold is derived from the MAIN model's context window
`threshold_tokens` is always `threshold × context_length`, where `context_length`
is the **main agent model's** context window — never the auxiliary/summary
model's. On a 262,144-token model at the default `0.50`, the threshold is
`262,144 × 0.50 = 131,072`. That number being close to a common "128K context"
is a coincidence of the percentage, not a sign that the auxiliary model's window
is the trigger. The auxiliary model's context window is a separate concern — see
the "Summary model context length" warning below for how it affects whether a
summary can be produced, not when compression fires.
:::
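
The arithmetic in the note can be sketched in a few lines of Python. This is an illustration only: the helper name `compression_threshold_tokens` and the auxiliary window value are hypothetical and are not taken from the project's code.

```python
def compression_threshold_tokens(main_context_length: int,
                                 threshold: float = 0.50) -> int:
    """Hypothetical helper: token count at which compression would fire.

    Only the MAIN agent model's context window is an input; the
    auxiliary/summary model's window does not appear in the formula.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    return int(main_context_length * threshold)


main_window = 262_144   # main agent model's context window, from the note
aux_window = 128_000    # assumed auxiliary window; deliberately unused below

threshold_tokens = compression_threshold_tokens(main_window)
print(threshold_tokens)  # 131072
```

Changing `aux_window` leaves `threshold_tokens` untouched, which is the point of the note: the auxiliary window matters for whether a summary can be produced, not for when compression fires.
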
## Compression Algorithm