Three state-loss bugs at the compression rotation boundary, fixed together
because they all live in the same ~80-line rotation block:
- #33618: a persistent /goal did not follow the rotation. load_goal does a
flat per-session lookup with no lineage walk, so a goal silently died when
compression minted a fresh child id. Added migrate_goal_to_session() and
call it after the child session is created (move-not-copy: the parent row
is archived as cleared so exactly one active goal row exists).
- #33906/#33907: if the child create_session raised (FK constraint,
contended write), the outer handler only warned and let the agent continue
on the NEW id — which has no row in state.db — producing an orphan session.
Now the rotation rolls agent.session_id back to the still-indexed parent
(reopening it) instead of stranding the conversation on a phantom id.
- #27633: the compaction-boundary on_session_start notification omitted the
platform kwarg, so context-engine plugins saw source=unknown for every
message after the boundary. Forward platform (matching the initial
session-start call in agent_init.py).
Co-authored-by: denisqq <21260182+denisqq@users.noreply.github.com>
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Teknium review: keeping one durable session id must NOT come at the cost of
destroying history. The prior in-place implementation used replace_messages,
which hard-DELETEs the pre-compaction turns (they also drop out of the FTS
index) — same id, but the original conversation is gone with no recovery path
and the summary becomes the only record. Rotation today is non-destructive
(the old session's full transcript survives under the old id); in-place must
match that durability contract, not weaken it.
Fix: compact in place by SOFT-ARCHIVING, reusing the existing messages.active
flag (the /undo soft-delete mechanic), instead of deleting:
- New SessionDB.archive_and_compact(session_id, compacted): in one atomic
write, UPDATE messages SET active=0 on the live turns, then insert the
compacted set as fresh active=1 rows. Nothing is deleted.
- The insert loop is extracted into a shared _insert_message_rows() helper so
archive_and_compact and replace_messages don't duplicate the 60-line
column/encoding block (extend-don't-duplicate).
- Agent in-place branch calls archive_and_compact instead of replace_messages.
Durability outcome (proven by test + E2E across repeated compactions):
- Live context load (get_messages_as_conversation / get_messages) filters
active=1, so a resume reloads ONLY the compacted set — compaction still
shrinks the live session.
- The pre-compaction turns stay on disk at active=0, recoverable via
get_messages(include_inactive=True) / restore_rewound.
- They remain FTS-searchable: the messages_fts* triggers index on INSERT and
remove on DELETE only — they do NOT key on active, and active=0 is a
content-preserving UPDATE. session_search still finds them.
- Verified across TWO successive compactions: the 1st compaction's originals
are still recoverable + searchable after the 2nd (answers the "no recovery
path after the next compaction" concern directly).
message_count now reflects the LIVE (active/compacted) count, matching the
live load. replace_messages keeps its DELETE semantics (still correct for
/retry, /undo) and gains a docstring note pointing compaction at the
non-destructive method.
Tests: test_in_place_keeps_same_session_id strengthened to assert the 8
seeded originals survive at active=0 alongside the 2 compacted rows AND stay
FTS-searchable. Mutation check: swapping archive_and_compact back to a hard
DELETE fails the test, so the non-destructive contract is bound. 285
hermes_state + in-place tests green; rotation/persistence/compress-command/cli
suites green; ruff clean.
Parallel 3-reviewer cleanup of the in-place compaction code. Findings applied:
- perf: in-place mode no longer pre-flushes current-turn messages. The flush
ran INSERTs that the immediately-following replace_messages(compressed)
DELETE+reinsert discarded -- pure wasted writes per compaction. The
current-turn tail survives via the compressor's compressed output
(protect_last_n), not the flush. Verified no data loss; rotation still
pre-flushes (its old session row is preserved, so the flush is real there).
- quality: hoist the two shared post-write steps (update_system_prompt +
_last_flushed_db_idx = 0) below the if/else -- they ran in both branches
against agent.session_id. Removes the easiest divergence bug.
- quality: compute the compaction-boundary locals (_old_sid, _is_boundary,
_boundary_parent) ONCE instead of recomputing locals().get('old_session_id')
and the "_old_sid or agent.session_id or ''" chain three times.
- quality: initialize compacted_in_place up front and assign
agent._last_compaction_in_place directly, dropping the fragile
locals().get('compacted_in_place') reflection.
- reuse: parse the in_place config flag with utils.is_truthy_value (the
project's canonical truthy coerce) instead of a hand-rolled
str().lower() in {...} (agent_init already imports from utils).
Dropped as false positives / out of scope: gateway getattr of agent internals
(established session_id pattern), dual result-dict carry (mirrors history_offset
etc.), stringly-typed "compression" (codebase-wide convention, no constant).
Behavior-preserving: 7 in-place tests (incl. 2 new flush-guard tests) + 26
rotation/boundary/persistence/command tests green; mutation check confirms the
durable-replace guard still binds (removing replace_messages fails the test);
ruff clean. Added test_in_place_skips_redundant_preflush /
test_rotation_still_preflushes to guard the perf change.
Review (Codex + 3-agent parallel) found the first cut of in-place mode was
incomplete: it only updated the system prompt, so the persisted transcript
stayed 'full history + summary' and the next turn/resume reloaded the full
history and immediately re-compacted (a loop), and every downstream layer
that keyed off session-id rotation silently no-op'd. The session_id was
doing double duty as the 'compaction happened' signal. This wires the whole
path so removing rotation is actually complete:
Agent (agent/conversation_compression.py):
- In-place now DURABLY replaces the transcript: replace_messages(session_id,
compressed) on the same row (the canonical store the gateway reloads from),
not just update_system_prompt. Resume reloads the compacted set; no loop.
- Reset flush identity/cursor (_last_flushed_db_idx=0, _flushed_db_message_ids
cleared) so next-turn appends diff against the compacted transcript.
- Expose a rotation-independent signal: agent._last_compaction_in_place, and
in_place=True on the session:compress event.
- Fire the compaction-boundary hooks (context-engine on_session_start, memory
manager on_session_switch, reason='compression') in BOTH modes — in-place
passes the same id as parent so DAG/buffer state still checkpoints. Without
this, memory/context plugins miss every in-place compaction.
Gateway auto-compress (gateway/run.py):
- Read agent._last_compaction_in_place; set history_offset=0 on rotation OR
in-place (both return the compacted set, so slicing past the pre-compaction
length would drop everything). Carry compacted_in_place in the result dict.
- No extra rewrite needed: the agent shares the gateway's SessionDB, so its
replace_messages already updated the canonical store load_transcript reads.
Manual /compress (gateway/slash_commands.py):
- The throwaway /compress agent has no _session_db, so rewrite_transcript is
the durable write. Previously gated behind 'if rotated:' which treated
'id unchanged' as the #44794 data-loss failure case and SKIPPED the rewrite
— making /compress a silent no-op in in-place mode. Now rewrites on rotated
OR in_place; the data-loss guard still fires only for the genuine
no-rotation-AND-not-in-place failure.
Hygiene auto-compress already writes _compressed to the same id
unconditionally (its agent has no _session_db, can't rotate) — correct for
in-place, no change.
Tests (tests/run_agent/test_in_place_compaction.py):
- Assert the DURABLE transcript IS the compacted set after reload
(get_messages_as_conversation == compacted), message_count==2, flush
identity reset, and the rotation-independent signal set on in-place /
unset on rotation. Rotation regression guard unchanged.
Verified: 64 tests green across in-place + rotation/persistence/boundary/
concurrent/failure-sync/command/cli suites; E2E both modes (durable replace,
gateway offset=0, rotation preserves old transcript); ruff clean. Still
default-off.
Context compression today rewrites the message list AND rotates the
session id — it ends the session, forks a parent_session_id child, and
renumbers the title (name -> name #2). That moving identity key is the
root cause of a whole bug cluster: /goal lost (#33618), pending response
lost at the split (#14238), orphan sessions (#33907), TUI sid desync
(#36777), FTS search gaps + duplicate sidebar entries (#45117), null
continuation cwd (#42228), and title-rename dead-ends (#48989). It also
forced a large defensive apparatus (compression lock, contextvar/env/
logging triple-sync, orphan finalization, gateway SessionEntry
re-propagation, tip projection) whose only job is surviving a
mid-conversation id change.
Add a compression.in_place config flag (default False during rollout).
When True, compaction rewrites the transcript and rebuilds the system
prompt but keeps the SAME session_id: no end_session, no child row, no
title renumber, no contextvar/logging re-sync, no memory/context-engine
session-switch. The conversation keeps one durable id for life, like
Claude Code / Codex. Compaction is lossy by design — the pre-compaction
transcript is summarized away, not archived.
The rotation path is unchanged when the flag is off (moved verbatim into
an else branch). Staged rollout: this PR ships the option behind a
default-off flag for live validation; a follow-up flips the default and
deletes the now-redundant rotation machinery, superseding the 14 open
band-aid PRs in this area.
- hermes_cli/config.py: add compression.in_place (default False), documented
- agent/agent_init.py: resolve the flag -> agent.compression_in_place
- agent/conversation_compression.py: branch compress_context() on the flag
- tests/run_agent/test_in_place_compaction.py: in-place invariants +
rotation regression guard + config default
The pre-flush of current-turn messages (#47202) runs in BOTH modes, so no
boundary data loss. Prompt-cache invariant preserved: the system-prompt
rebuild is the same single sanctioned invalidation that already happens
during compaction — no NEW invalidation. Message alternation preserved.
The image-too-large reactive shrink (try_shrink_image_parts_in_messages)
conflated two independent constraints: it always rejected a resize whose
re-encoded bytes were >= the original, even when the shrink was driven by a
PIXEL-DIMENSION cap (Anthropic many-image 2000px) rather than the byte budget.
Downscaled screenshot PNGs routinely re-encode LARGER in bytes, so the
dimension-correct result was discarded and the image left oversized -> the
provider re-rejected on retry and the session wedged forever.
Fix: track which constraint triggered the shrink (bytes vs dimension) and gate
the accept on the SAME axis.
* dimension path: accept the result as long as it is now within max_dimension,
regardless of byte size (verify via Pillow; fall back to the byte gate only
when the re-encode can't be decoded).
* bytes path: still require bytes to shrink, but ALSO re-check the per-side cap
when it's active — _resize_image_for_vision returns a best-effort, possibly
over-cap blob when it exhausts its halving budget on a very-high-aspect
image, so a byte-shrink alone can leave it over the dimension cap and
re-brick on retry.
Extend the unshrinkable-oversized guard to the pixel axis so a partial shrink
doesn't burn the one-shot retry.
Single shared agent path -> fixes CLI, TUI, and gateway alike.
Adds a real-Pillow runnable proof (repro_48013_image_shrink_brick.py) that
reproduces the issue's per-image table (bricks 3/5 before, passes 5/5 after)
plus unit invariants for the dimension and bytes accept/reject paths,
partial-progress accounting, and the bytes-path still-over-cap regression
surfaced by adversarial review.
Closes#48013
compress_context() rotates the session (end_session -> create_session)
mid-turn when auto-compress triggers, but never called
_flush_messages_to_session_db() first. Messages generated during the
current turn that hadn't been persisted to state.db were silently lost.
The same bug existed in cli.py:new_session() (/new command). Both paths
now flush un-persisted messages before ending the old session.
Auto-compression rewrites history mid-turn, which made long threads look
like they reset. Re-tag the gateway lifecycle status as compacting and
surface it in the desktop thread loading indicators.
Parse provider-reported image pixel ceilings so many-image Anthropic requests can recover by shrinking Retina screenshots below the stricter limit instead of retrying the same rejected payload.
When context compaction rotates agent.session_id, it updates the gateway/tools
session context (set_current_session_id -> HERMES_SESSION_ID env + ContextVar)
but never updates the separate logging session context. The [session_id] tag on
log lines comes from hermes_logging._session_context (set once per turn in
conversation_loop.py), so post-compaction log lines in the same turn carry the
STALE old id while the message/DB/gateway state carry the new one — breaking log
correlation exactly at the compaction boundary.
Call hermes_logging.set_session_context(agent.session_id) alongside the existing
set_current_session_id, guarded so a logging failure can't regress the routing
update. Logs-only; no runtime or caching impact.
Refs #34089
Anthropic enforces two independent ceilings per image:
1. 5 MB encoded byte size
2. 8000 px longest side
Hermes only guarded #1. A tall screenshot (e.g. 1200x12000 at 0.06 MB)
passes every byte check but fails the pixel check, returning a
non-retryable HTTP 400 that permanently bricks the conversation thread.
Fixes:
- error_classifier: add 'image dimensions exceed' pattern to
_IMAGE_TOO_LARGE_PATTERNS so the 400 is classified as image_too_large
and triggers the shrink/retry path instead of falling through to
non-retryable error.
- conversation_compression: check pixel dimensions (via Pillow) even
when byte size is under the 4 MB target. If max(dims) > 8000, force
shrink.
- vision_tools._resize_image_for_vision: add optional max_dimension param.
When set, images exceeding the pixel cap are downscaled even if they're
under the byte budget. The resize loop now checks both byte AND pixel
limits before accepting a candidate.
Closes#37677
Salvages 8 distinct fixes from a batch of PRs by @kyssta-exe, reapplied
onto current main (original branches were stale) with a few refinements.
- cron(jobs.py): load_jobs() validates top-level JSON shape — a bare
list auto-repairs into the {"jobs": [...]} dict; scalars/null raise a
clear RuntimeError instead of an uncaught AttributeError that took
down the whole cron subsystem (#37065, closes#36867).
- web(web_server.py): close the per-action log file handle after Popen
so the parent stops leaking one fd per spawned action (#36843).
- web(web_server.py): DELETE /api/env returns 400 for invalid key names
instead of a misleading 500, mirroring PUT /api/env (#36840).
- gateway(gateway.py): read /proc/<pid>/cmdline inside a with-block so
the fd is released immediately instead of relying on GC (#36804).
- web-tools(web_tools.py): include "xai" in check_web_api_key() so a
configured X.AI web backend reports as available (#36802).
- compression(conversation_compression.py): mark the feasibility check
done only after it completes, and default the gate to "not checked"
if the attribute is missing (#36803).
- completion(completion.py): replace `ls` with directory globbing in the
generated bash/zsh/fish profile listers — handles names with spaces
and skips non-directory entries (#36806).
- terminal-tool(terminal_tool.py): drop a duplicate `import threading`
(#36808).
- claw(claw.py): the migrate recommendation now points at the real
`hermes gateway stop` command instead of the non-existent
`hermes stop` (#36795, #36796, closes#36771).
- tests: guard against a leaked HERMES_CRON_SESSION breaking gateway
approval tests — add it to the hermetic conftest unset list (root
cause, protects every test) and pop it in the affected test's
setup_method (#36796).
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Resize vision tool-result images down to a 4 MB embed cap at load time,
not just at the 20 MB hard ceiling. A 5-20 MB image previously sailed
through the native fast path and got baked into conversation history,
where Anthropic's 5 MB per-image base64 limit rejected every subsequent
turn with a 400 — and because history is immutable, retries could never
clear it, permanently wedging the session.
Also harden the reactive shrink-recovery: it now returns False (don't
retry) when any oversized image part can't be brought under target, so
the single retry isn't burned re-sending a payload that will fail
identically. Previously it returned True after shrinking *any* part,
even when the actual oversized culprit survived.
A process running mismatched module versions — conversation_compression.py
re-imported with the post-#34351 lock code while a long-lived
hermes_state.SessionDB stays bound to the pre-#34351 class in memory — has
the try_acquire_compression_lock call site but not the method. The
AttributeError it raises is NOT a sqlite3.Error, so the method's own
fail-open guard never runs; the exception escapes to the outer agent loop,
which prints the error and retries. Compression never succeeds, the token
count never drops, and the loop re-triggers compaction forever (the
'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock'
spin a user hit after an update).
Wrap the lock acquire so any unexpected exception fails OPEN: skip locking
and proceed with compression. Skipping the lock risks a rare
concurrent-compression session fork; an infinite no-progress loop that never
compresses at all is strictly worse. The remediation hint in the log points
at the real fix (restart / hermes update to resync the stale module).
Also guards get_compression_lock_holder against the same skew.
Adds a regression test simulating the version skew (real SessionDB wrapped
so only the lock methods raise AttributeError) — asserts _compress_context
proceeds and rotates instead of raising.
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.
- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
unused in their defining module are kept with explicit # noqa:
F401 (gateway/run.py load_dotenv; run_agent re-exports from
agent.message_sanitization, agent.context_compressor,
agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
selected); this is a one-time cleanup, not a config change
Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
module still resolves
* fix(compression): prevent session-id fork from concurrent compressions
When two AIAgent instances share the same session_id (most commonly the
parent-turn agent and its background-review fork, which inherits
session_id verbatim via background_review.py L451), both can call
compress_context() on overlapping snapshots of the same conversation.
Each ends the parent and creates its own NEW child session in state.db,
both parented to the same old id. The gateway SessionEntry only catches
one rotation; the other becomes an orphan that silently accumulates
writes — Damien's incident shape (parent 20260527_234659_e65f0e → two
children, only one visible).
Adds a state.db-backed per-session compression lock. Acquired before
the rotation in conversation_compression.compress_context(); on
failure, the caller returns messages unchanged so the auto-compress
retry loop stops cleanly. TTL (5min default) reclaims locks abandoned
by crashed compressors. Lock holder identity (pid:tid:agent:nonce) is
preserved for diagnostics via get_compression_lock_holder().
Schema bumped 13 -> 14 to track the new compression_locks table.
Reconciled additively via the existing declarative-column pattern;
no data migration needed for existing DBs.
Regression test reproduces Damien's shape: two threads racing
_compress_context on a shared parent_sid. Without the lock the test
deterministically produces 2 child sessions; with the lock, exactly 1.
Covers all six compression entry points (preflight in conversation_loop,
mid-turn fallback, hygiene compression in gateway, /compact, CLI
/compress, TUI /compress). ACP /compress was already protected by
nulling out _session_db before its compress call.
* ci: trigger rerun (transient GitHub API rate limit on CodeQL workflow)
Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373
into a minimal generic host contract that external context engine plugins
(e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that
duplicated existing infrastructure or had marginal value.
Five concrete changes:
1. `_transition_context_engine_session()` on AIAgent — generic lifecycle
helper that fires on_session_end → on_session_reset → on_session_start
→ optional carry_over_new_session_context. Engines implement only the
hooks they need; missing hooks are skipped. Built-in compressor keeps
its existing reset-only behavior because callers default to no
metadata. `reset_session_state()` now optionally accepts
previous_messages / old_session_id / carry_over_context and delegates
to the transition helper when provided. (#16453)
2. `conversation_id` passed to `on_session_start()` — both the
agent-init call site and the compression-boundary call site now
forward `self._gateway_session_key` so plugin engines have a stable
conversation identity that survives session_id rotation (compression
splits, /new, resume). The key already existed on AIAgent; it just
wasn't reaching engines. (#16453)
3. Canonical cache buckets forwarded to engines — the usage dict passed
to `update_from_response()` now includes input_tokens, output_tokens,
cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of
the legacy prompt/completion/total keys. Engines can make decisions on
cache-hit ratios and reasoning costs instead of only aggregates. ABC
docstring updated. (#17453)
4. Plugin-registered context engines visible in the picker —
`_discover_context_engines()` in plugins_cmd.py now also includes
engines registered via `ctx.register_context_engine()` from plugin
manifests, deduplicating by name so repo-shipped descriptions win on
collision. (#16451)
5. `_EngineCollector.register_command()` — context engines using the
standard `register(ctx)` pattern can now expose slash commands (e.g.
`/lcm`). Routes to the global plugin command registry with the same
conflict-rejection policy regular plugins use (no shadowing built-ins,
no clobbering other plugins). Previously these calls hit a no-op and
the slash commands silently never appeared. (#17600)
Dropped from the original 5 PRs:
- Compression boundary signal (`boundary_reason="compression"`) from
#16453 — already on main at `agent/conversation_compression.py:412-424`,
landed via the bg-review extraction.
- `discover_plugins()` before fallback in run_agent.py from #16451 —
redundant: `get_plugin_context_engine()` already routes through
`_ensure_plugins_discovered()` which is idempotent.
- Runtime identity diagnostics method + helpers from #13373 (+251 LOC) —
operators can already read engine state via `engine.get_status()`;
the diagnostics view added marginal value relative to its surface area.
- The 553-LOC slash-command machinery from #17600 — replaced with a
20-LOC `register_command` method on the collector that reuses the
existing plugin command registry instead of building a parallel one.
Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs
~1,176 LOC across the original 5 PRs.
Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com>
Closes#16453.
Closes#17453.
Closes#16451.
Closes#17600.
Closes#13373.
Related: stephenschoettler/hermes-lcm#68.
`AIAgent.__init__` was eagerly calling
`_check_compression_model_feasibility()` which probes the auxiliary
provider chain and runs `get_model_context_length()` (potentially
network-bound) to decide whether the configured auxiliary model can
fit a full compression-threshold window. That cost ~440ms cold on
every agent construction.
Most `chat -q` invocations finish in 1-5 seconds and never accumulate
enough context to trip the compression threshold, so the feasibility
check is pure overhead. The result is also only consumed when
compression actually fires (the function adjusts the live threshold
downward if the aux model can't fit; absent that mutation, the gate
in `conversation_loop.py:442` would never fire anyway).
Defer to first `compress_context()` call via
`agent._compression_feasibility_checked` sentinel. Runs at most once
per agent lifetime, just before the first compression pass. The
warning storage (`_compression_warning`) and gateway replay
machinery is unchanged — it still emits to status_callback on the
first turn that actually needs compression.
E2E timing (chat -q 'hi', 3 runs each):
BEFORE AFTER delta
median wall 2.03s 1.86s -8% (-169ms)
min wall 1.92s 1.63s -15% (-293ms)
Real cold-start observation (synthetic 31-turn agent loop): identical
behavior since feasibility check fires once on first compression and
caches. No semantic difference for sessions that DO compress.
UX trade-off: users with broken auxiliary-provider config no longer
see the warning at session start. They see it when compression first
fires — which is exactly when it matters. For users with working
config (the vast majority), the warning never fires anyway, so the
deferral is invisible.
Tests:
- tests/run_agent/test_compression_feasibility.py — 16/16 pass
(the one test that asserted call-at-init was updated to drive the
lazy check explicitly via agent._check_compression_model_feasibility())
- Live tmux session: 2-turn conversation + tool call completes clean,
zero errors in agent.log
When auxiliary compression's summary generation returns None (aux model
errored, returned non-JSON, timed out, etc.) the compressor previously
still dropped every middle message between compress_start..compress_end
and replaced them with a static 'Summary generation was unavailable'
placeholder. The session kept going but the user silently lost N turns
of context for nothing.
New behavior: on summary failure, compress() aborts entirely — returns
the input messages unchanged and sets _last_compress_aborted=True. The
existing _summary_failure_cooldown_until gate (30-60s) keeps the aux
model from being burned on every turn. Auto-compress callers detect
the no-op (len(after) == len(before)) and stop looping. The chat is
'frozen' at its current size until the next /compress or /new.
Manual /compress (CLI + gateway) now passes force=True which clears
the cooldown so users can retry immediately after an auto-abort. If
the manual retry also fails, the user gets a visible warning telling
them nothing was dropped and how to retry.
- agent/context_compressor.py: compress() gains force= kwarg; failure
branch sets _last_compress_aborted and returns messages unchanged
instead of inserting placeholder.
- run_agent.py: _compress_context() detects abort, surfaces warning,
skips session-rotation entirely, returns messages unchanged.
- cli.py + gateway/run.py: manual /compress paths pass force=True.
- gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted
and emit the new 'Compression aborted' warning (gateway.compress.aborted)
instead of the old 'N historical messages were removed' message.
- locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
- tests: updated to assert the abort contract (messages preserved,
compression_count not incremented, abort flag set, no placeholder
leaked). New test_force_true_bypasses_failure_cooldown covers the
manual-retry path.
Original commit 97a32afdc by helix4u targeted _check_compression_model_feasibility
in pre-refactor run_agent.py. The function body now lives in
agent/conversation_compression.py — re-applied the configured-but-unavailable
provider message there.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>