feat(compression): make protect_first_n configurable

The number of head messages preserved verbatim across context compactions was previously hardcoded to 3 in AIAgent.__init__. Expose it as `compression.protect_first_n` in config, matching the existing `protect_last_n` pattern. Motivation: users who rely on rolling compaction for long-running sessions had the opening user/assistant exchange pinned as head forever, which doesn't always match how they want the session framed after many compactions. Lowering to 1 preserves the system prompt + first non-system message; lowering to 0 preserves only the system prompt and lets the entire first exchange age out naturally through the summary. Semantics: `protect_first_n` counts non-system head messages protected **in addition to** the system prompt, which is always implicitly protected when present. Same meaning across both code paths: protect_first_n=0 → system prompt only (or nothing if no system message) protect_first_n=2 → system prompt + first 2 non-system messages (default) This unifies the CLI path (which reads messages with the system prompt at position 0) and the gateway path (where the gateway /compress handler strips the system prompt before calling compress() — see gateway/run.py L9150-9154 on the parent fork). Previously these two paths disagreed: CLI path: protect_first_n=1 → protect system prompt only Gateway path: protect_first_n=1 → protect first USER turn forever In practice on long-running gateway sessions the old semantics pinned whatever stale aside happened to be the first user message, reinserting it into every compaction summary indefinitely. Default chosen as 2 (not 3) so that the effective protected head count remains 3 messages in the common case — assuming a system prompt is present, default protection becomes system + 2 non-system = 3 total, matching the pre-feature behaviour where `protect_first_n` was hardcoded to protect 3 messages total. Sessions without a system prompt will see a small behaviour change (2 protected head messages instead of 3), but this is the rare path and the new semantics make the system-prompt-present case the well-defined one. Changes: - agent/context_compressor.py: redefine protect_first_n as the count of non-system head messages protected beyond the implicit system-prompt guarantee; both paths converge. Constructor default updated to 2. - hermes_cli/config.py: add `compression.protect_first_n` default (2), matching the new semantics. `show_config` label tweaked to 'Protect first: N non-system head messages' for clarity. - run_agent.py: read protect_first_n from config; 0 is now valid (system prompt is always implicitly protected). - cli-config.yaml.example: document the new key and rationale. - tests/agent/test_context_compressor.py: cover default, override, the end-to-end `protect_first_n=0` and `protect_first_n=1` behaviour, the no-system-prompt (gateway) path, and the new shared-semantics regression test. Fixes #13751 Tested on Ubuntu 24.04.
2026-05-22 05:22:09 +00:00 · 2026-05-13 20:04:51 -04:00 · 2026-05-13 20:04:51 -04:00 · dee71a31e5
commit dee71a31e5
parent ffbc21100d
5 changed files with 149 additions and 11 deletions
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@ -731,8 +731,13 @@ DEFAULT_CONFIG = {
        "target_ratio": 0.20,         # fraction of threshold to preserve as recent tail
        "protect_last_n": 20,         # minimum recent messages to keep uncompressed
        "hygiene_hard_message_limit": 400,  # gateway session-hygiene force-compress threshold by message count
+        "protect_first_n": 2,         # non-system head messages always preserved beyond the system prompt
+                                      # verbatim, in ADDITION to the system prompt
+                                      # (which is always implicitly protected). Set to
+                                      # 0 for long-running rolling-compaction sessions
+                                      # where you want nothing pinned except the
+                                      # system prompt + rolling summary + recent tail.
    },
-
    # Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
    # cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
    "prompt_caching": {
@ -4862,6 +4867,7 @@ def show_config():
        print(f"  Threshold:    {compression.get('threshold', 0.50) * 100:.0f}%")
        print(f"  Target ratio: {compression.get('target_ratio', 0.20) * 100:.0f}% of threshold preserved")
        print(f"  Protect last: {compression.get('protect_last_n', 20)} messages")
+        print(f"  Protect first: {compression.get('protect_first_n', 2)} non-system head messages")
        _aux_comp = config.get('auxiliary', {}).get('compression', {})
        _sm = _aux_comp.get('model', '') or '(auto)'
        print(f"  Model:        {_sm}")