Commit graph

7 commits

Author SHA1 Message Date
Teknium
f92006ce1c
fix(compression): reserve system+tools headroom when aux binds threshold (#15631)
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:

    new_threshold = aux_context

This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.

Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.

This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.

Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
2026-04-25 05:41:56 -07:00
Teknium
4f24db4258
fix(compression): enforce 64k floor on aux model + auto-correct threshold (#12898)
Context compression silently failed when the auxiliary compression model's
context window was smaller than the main model's compression threshold
(e.g. GLM-4.5-air at 131k paired with a 150k threshold).  The feasibility
check warned but the session kept running and compression attempts errored
out mid-conversation.

Two changes in _check_compression_model_feasibility():

1. Hard floor: if detected aux context < MINIMUM_CONTEXT_LENGTH (64k),
   raise ValueError so the session refuses to start.  Mirrors the existing
   main-model rejection at AIAgent.__init__ line 1600.  A compression model
   below 64k cannot summarise a full threshold-sized window.

2. Auto-correct: when aux context is >= 64k but below the computed
   threshold, lower the live compressor's threshold_tokens to aux_context
   (and update threshold_percent to match so later update_model() calls
   stay in sync).  Warning reworded to say what was done and how to
   persist the fix in config.yaml.

Only ValueError re-raises; other exceptions in the check remain swallowed
as non-fatal.
2026-04-20 00:56:04 -07:00
kshitijk4poor
7bd1a3a4b1 test(compression): cover real init feasibility override 2026-04-19 10:40:26 -07:00
kshitijk4poor
045b28733e fix(compression): resolve missing config attribute in feasibility check
Commit 4a9c3565 added a reference to `self.config` in
`_check_compression_model_feasibility()` to pass the user-configured
`auxiliary.compression.context_length` to `get_model_context_length()`.
However, `AIAgent` never stores the loaded config dict as an instance
attribute — the config is loaded into a local variable `_agent_cfg` in
`__init__()` and discarded after init.

This causes an `AttributeError: 'AIAgent' object has no attribute
'config'` on every session start when compression is enabled, caught by
the try/except and logged as a non-fatal DEBUG message.

Fix: store the loaded config as `self._config` in `__init__()` and
update the reference in the feasibility check to use `self._config`.
2026-04-19 10:40:26 -07:00
Teknium
bc4e2744c3 test: add tests for compression config_context_length passthrough
- Test that auxiliary.compression.context_length from config is forwarded
  to get_model_context_length (positive case)
- Test that invalid/non-integer config values are silently ignored
- Fix _make_agent() to set config=None (cherry-picked code reads self.config)
2026-04-12 17:52:34 -07:00
Harish Kukreja
b1f13a8c5f fix(agent): route compression aux through live session runtime 2026-04-12 01:34:52 -07:00
Teknium
dafe443beb
feat: warn at session start when compression model context is too small (#7894)
Two-phase design so the warning fires before the user's first message
on every platform:

Phase 1 (__init__):
  _check_compression_model_feasibility() runs during agent construction.
  Resolves the auxiliary compression model (same chain as call_llm with
  task='compression'), compares its context length to the main model's
  compression threshold. If too small, emits via _emit_status() (prints
  for CLI) and stores the warning in _compression_warning.

Phase 2 (run_conversation, first call):
  _replay_compression_warning() re-sends the stored warning through
  status_callback — which the gateway wires AFTER construction. The
  warning is then cleared so it only fires once.

This ensures:
- CLI users see the warning immediately at startup (right after the
  context limit line)
- Gateway users (Telegram, Discord, Slack, WhatsApp, Signal, Matrix,
  Mattermost, Home Assistant, DingTalk, etc.) receive it via
  status_callback('lifecycle', ...) on their first message
- logger.warning() always hits agent.log regardless of platform

Also warns when no auxiliary LLM provider is configured at all.
Entire check wrapped in try/except — never blocks startup.

11 tests covering: core warning logic, boundary conditions, exception
safety, two-phase store+replay, gateway callback wiring, and
single-delivery guarantee.
2026-04-11 12:01:30 -07:00