fix(agent): trigger preflight compression on few-but-huge sessions (#27405)

The preflight-compression gate only ran the (expensive) token estimate when
the message COUNT exceeded protect_first_n + protect_last_n + 1. A session
with a handful of very large messages never tripped the count condition, so
compression was never attempted and the turn eventually hit a hard
context-overflow error.

Add _should_run_preflight_estimate() with OR semantics: run the estimate when
either the message count exceeds the protected ranges (the historical gate)
OR a cheap char-based estimate already crosses the configured threshold. The
downstream estimate_request_tokens_rough() stays authoritative — this is only
a hint that decides whether to pay for the full estimate.

Salvaged from #27435 by @texhy (authorship preserved). Re-applied on current
main: the preflight gate moved from conversation_loop.py to turn_context.py
since the PR was opened, so the helper + gate are placed there; the test
imports the real MINIMUM_CONTEXT_LENGTH instead of a hardcoded literal.

Closes #27405.
This commit is contained in:
texhy 2026-06-25 01:10:13 +05:30 committed by kshitijk4poor
parent b13e2fd694
commit aacc6bb0a8
3 changed files with 150 additions and 5 deletions

View file

@ -64,6 +64,7 @@ AUTHOR_MAP = {
"rayjun0412@gmail.com": "rayjun", # cron model.default salvage co-author (#43952)
"96944678+sweetcornna@users.noreply.github.com": "sweetcornna", # cron ticker-liveness salvage co-author (#33849)
"izumi0uu@gmail.com": "izumi0uu", # PR #49544 salvage (native rich reply echo; #49534)
"dev@pixlmedia.no": "texhy", # PR #27435 salvage (few-but-huge preflight compression gate; #27405)
"w31rdm4ch1n3z@protonmail.com": "w31rdm4ch1nZ",
"xtpeeps@gmail.com": "x7peeps",
"ahmad@madsgency.com": "ahmadashfq",