Commit graph

9 commits

Author SHA1 Message Date
Teknium
ea01bdcebe
refactor(memory): remove flush_memories entirely (#15696)
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
2026-04-25 08:21:14 -07:00
kshitijk4poor
d635e2df3f fix(compression): pass provider to context length resolver in feasibility check
_check_compression_model_feasibility calls get_model_context_length
without provider=, so Codex OAuth users get 1,050,000 (from models.dev
for 'openai') instead of the actual 272,000 limit. This happens because
_infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'),
skipping the Codex-specific resolution branch entirely.

Result: compression threshold set at 85% of 1.05M = 892K — conversations
never trigger compression, the context grows unbounded, and when gateway
hygiene eventually forces compression, the Codex endpoint drops the
oversized streaming request ('peer closed connection without sending
complete message body').

Fix: forward self.provider to get_model_context_length so provider-
specific resolution branches (Codex OAuth 272K, Copilot live /models,
Nous suffix-match) fire correctly.

Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).
2026-04-25 07:09:47 -07:00
Teknium
f92006ce1c
fix(compression): reserve system+tools headroom when aux binds threshold (#15631)
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:

    new_threshold = aux_context

This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.

Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.

This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.

Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
2026-04-25 05:41:56 -07:00
Teknium
4f24db4258
fix(compression): enforce 64k floor on aux model + auto-correct threshold (#12898)
Context compression silently failed when the auxiliary compression model's
context window was smaller than the main model's compression threshold
(e.g. GLM-4.5-air at 131k paired with a 150k threshold).  The feasibility
check warned but the session kept running and compression attempts errored
out mid-conversation.

Two changes in _check_compression_model_feasibility():

1. Hard floor: if detected aux context < MINIMUM_CONTEXT_LENGTH (64k),
   raise ValueError so the session refuses to start.  Mirrors the existing
   main-model rejection at AIAgent.__init__ line 1600.  A compression model
   below 64k cannot summarise a full threshold-sized window.

2. Auto-correct: when aux context is >= 64k but below the computed
   threshold, lower the live compressor's threshold_tokens to aux_context
   (and update threshold_percent to match so later update_model() calls
   stay in sync).  Warning reworded to say what was done and how to
   persist the fix in config.yaml.

Only ValueError re-raises; other exceptions in the check remain swallowed
as non-fatal.
2026-04-20 00:56:04 -07:00
kshitijk4poor
7bd1a3a4b1 test(compression): cover real init feasibility override 2026-04-19 10:40:26 -07:00
kshitijk4poor
045b28733e fix(compression): resolve missing config attribute in feasibility check
Commit 4a9c3565 added a reference to `self.config` in
`_check_compression_model_feasibility()` to pass the user-configured
`auxiliary.compression.context_length` to `get_model_context_length()`.
However, `AIAgent` never stores the loaded config dict as an instance
attribute — the config is loaded into a local variable `_agent_cfg` in
`__init__()` and discarded after init.

This causes an `AttributeError: 'AIAgent' object has no attribute
'config'` on every session start when compression is enabled, caught by
the try/except and logged as a non-fatal DEBUG message.

Fix: store the loaded config as `self._config` in `__init__()` and
update the reference in the feasibility check to use `self._config`.
2026-04-19 10:40:26 -07:00
Teknium
bc4e2744c3 test: add tests for compression config_context_length passthrough
- Test that auxiliary.compression.context_length from config is forwarded
  to get_model_context_length (positive case)
- Test that invalid/non-integer config values are silently ignored
- Fix _make_agent() to set config=None (cherry-picked code reads self.config)
2026-04-12 17:52:34 -07:00
Harish Kukreja
b1f13a8c5f fix(agent): route compression aux through live session runtime 2026-04-12 01:34:52 -07:00
Teknium
dafe443beb
feat: warn at session start when compression model context is too small (#7894)
Two-phase design so the warning fires before the user's first message
on every platform:

Phase 1 (__init__):
  _check_compression_model_feasibility() runs during agent construction.
  Resolves the auxiliary compression model (same chain as call_llm with
  task='compression'), compares its context length to the main model's
  compression threshold. If too small, emits via _emit_status() (prints
  for CLI) and stores the warning in _compression_warning.

Phase 2 (run_conversation, first call):
  _replay_compression_warning() re-sends the stored warning through
  status_callback — which the gateway wires AFTER construction. The
  warning is then cleared so it only fires once.

This ensures:
- CLI users see the warning immediately at startup (right after the
  context limit line)
- Gateway users (Telegram, Discord, Slack, WhatsApp, Signal, Matrix,
  Mattermost, Home Assistant, DingTalk, etc.) receive it via
  status_callback('lifecycle', ...) on their first message
- logger.warning() always hits agent.log regardless of platform

Also warns when no auxiliary LLM provider is configured at all.
Entire check wrapped in try/except — never blocks startup.

11 tests covering: core warning logic, boundary conditions, exception
safety, two-phase store+replay, gateway callback wiring, and
single-delivery guarantee.
2026-04-11 12:01:30 -07:00