perf(compression): defer feasibility check to first compression attempt (#28957)

`AIAgent.__init__` was eagerly calling `_check_compression_model_feasibility()`, which probes the auxiliary provider chain and runs `get_model_context_length()` (potentially network-bound) to decide whether the configured auxiliary model can fit a full compression-threshold window. That cost ~440ms cold on every agent construction.

Most `chat -q` invocations finish in 1-5 seconds and never accumulate enough context to trip the compression threshold, so for them the feasibility check is pure overhead. The result is also only consumed when compression actually fires (the function adjusts the live threshold downward if the aux model can't fit; absent that mutation, the gate in `conversation_loop.py:442` would never fire anyway).

Defer the check to the first `compress_context()` call via the `agent._compression_feasibility_checked` sentinel. It runs at most once per agent lifetime, just before the first compression pass. The warning storage (`_compression_warning`) and gateway replay machinery are unchanged: the warning is still emitted to status_callback on the first turn that actually needs compression.

E2E timing (chat -q 'hi', 3 runs each):

                 BEFORE   AFTER   delta
    median wall  2.03s    1.86s   -8%  (-169ms)
    min wall     1.92s    1.63s   -15% (-293ms)

Real cold-start observation (synthetic 31-turn agent loop): identical behavior, since the feasibility check fires once on first compression and caches. No semantic difference for sessions that DO compress.

UX trade-off: users with a broken auxiliary-provider config no longer see the warning at session start. They see it when compression first fires, which is exactly when it matters. For users with a working config (the vast majority), the warning never fires anyway, so the deferral is invisible.

Tests:
- tests/run_agent/test_compression_feasibility.py: 16/16 pass (the one test that asserted call-at-init was updated to drive the lazy check explicitly via agent._check_compression_model_feasibility())
- Live tmux session: 2-turn conversation + tool call completes clean, zero errors in agent.log
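
The deferral described above is a run-once lazy check guarded by a boolean sentinel. The following is a minimal, self-contained sketch of that pattern, not the project's actual code: the class `LazyFeasibilityAgent`, the injected `probe` callable, the helper `_ensure_feasibility_checked`, and the placeholder compression logic are all hypothetical names introduced for illustration. Only the attribute names `_compression_warning` and `_compression_feasibility_checked` come from the commit.

```python
from typing import Callable, List, Optional


class LazyFeasibilityAgent:
    """Hypothetical stand-in for the agent; only the sentinel pattern is shown."""

    def __init__(self, probe: Callable[[], Optional[str]]) -> None:
        # `probe` (an assumption of this sketch) stands in for the expensive
        # feasibility check. It returns a warning string, or None if all is well.
        self._probe = probe
        self._compression_warning: Optional[str] = None
        # Nothing is probed at construction time; the sentinel records that.
        self._compression_feasibility_checked = False

    def _ensure_feasibility_checked(self) -> None:
        """Run the probe at most once per agent lifetime."""
        if self._compression_feasibility_checked:
            return
        # Flip the sentinel before probing so a raising probe is not retried
        # on every later call (a design choice of this sketch).
        self._compression_feasibility_checked = True
        self._compression_warning = self._probe()

    def compress_context(self, messages: List[str]) -> List[str]:
        # The check fires here, just before the first compression pass.
        self._ensure_feasibility_checked()
        # Placeholder "compression": keep only the two most recent messages.
        return messages[-2:]
```

Constructing the object never calls `probe`, so a short session that never compresses pays nothing. The first `compress_context()` call pays the probe cost once, and later calls skip it.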
2026-05-29 06:31:32 +00:00 · 2026-05-19 17:27:17 -07:00 · 2026-05-19 17:27:17 -07:00 · 6cb9917c73
commit 6cb9917c73
parent 93734c26e5
3 changed files with 37 additions and 3 deletions
--- a/agent/agent_init.py
+++ b/agent/agent_init.py
@@ -1466,7 +1466,13 @@ def init_agent(
    # Gateway status_callback is not yet wired, so any warning is stored
    # in _compression_warning and replayed in the first run_conversation().
    agent._compression_warning = None
-    agent._check_compression_model_feasibility()
+    # Lazy feasibility check: deferred to the first turn that approaches the
+    # compression threshold. Running it eagerly here costs ~400ms cold (network
+    # probe of the auxiliary provider chain + /models lookup) on every agent
+    # init, including short ``chat -q`` runs that never reach the threshold.
+    # ``ensure_compression_feasibility_checked`` (called from
+    # ``run_conversation``'s preflight) runs it at most once per agent.
+    agent._compression_feasibility_checked = False

    # Snapshot primary runtime for per-turn restoration.  When fallback
    # activates during a turn, the next turn restores these values so the